st: Re: random samples within each of 1,152 categories

From	Maarten Buis <[email protected]>
To	[email protected]
Subject	st: Re: random samples within each of 1,152 categories
Date	Tue, 14 May 2013 10:16:06 +0200

On Wed, May 8, 2013 at 6:41 PM, Olga Gorbachev wrote:
> Thank you very much for your reply and sorry for replying to you
> directly, I couldn't figure out how to reply to list serve, since I
> get digest version.

Follow-up questions must be sent to the Statalist. In this case I
would just post it as a new question. This is not perfect, but much
better than replying to someone privately.

> also, I don't subscribe to stata journal, so I couldn't read the
> article you referenced, sorry.

The article I referred to earlier has passed the moving wall, so you
can read it free of charge even if you don't subscribe, just follow
the link I gave you. (The article in question is: M.L. Buis (2007),
"Stata tip 48: Discrete uses for uniform()", The Stata Journal, 7(3),
pp. 434-435. <http://www.stata-journal.com/article.html?article=pr0032>)

> can your example work with weights? I'd like to be able to match the
> means of the distribution as well, and since it is survey data, sample
> weights are important.

For weights you need to use -collapse- to compute the weighted means,
otherwise everything remains the same. See the example below:

*------------------ begin example ------------------
// data preparation
sysuse nlsw88, clear

// create some "weights"
gen w = 1/(.2 + .6*runiform())

gen byte occat = cond(occupation < 3                 , 1,      ///
                 cond(inlist(occupation, 5, 6, 8, 13), 2, 3))  ///
                 if occupation < .
label variable occat "occupation in categories"
label define occat 1 "high"   ///
                   2 "middle" ///
                   3 "low"
label value occat occat

gen byte edcat = cond(grade <  12, 1,     ///
                 cond(grade == 12, 2, 3)) ///
                 if grade < .
label define edcat 1 "less than high school" ///
                   2 "high school"           ///
                   3 "more than high school"
label value edcat edcat
label variable edcat "education in categories"

// define the sample
gen byte touse = !missing(race, edcat, occat, married)

// create the group indicator
egen group = group(race edcat occat) if touse

tempfile temp
save `temp'

// create the weighted proportion of married women per group,
// stored as p (used below when sampling married_sim)
collapse (mean) p = married [pw=w] if touse , by(group)
merge 1:m group using `temp'
assert _merge == 2 if touse == 0
assert _merge == 3 if touse == 1
drop _merge

// sample a new married variable
gen byte married_sim = runiform() < p if touse
*------------------- end example -------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )
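
If you want to see how closely the simulated variable reproduces the
weighted group means, something along the following lines might help.
This is only a sketch and not part of the example above: it is
self-contained, it uses a simpler grouping (race by collgrad), the
seed is arbitrary, and the weights are again made up. The simulated
means will match p only approximately, and small groups will be
further off than large ones.

```stata
// self-contained sketch with a simpler grouping than the example above
sysuse nlsw88, clear
set seed 12345                  // assumption: arbitrary seed, for reproducibility

// made-up "weights", as in the example above
gen w = 1/(.2 + .6*runiform())

gen byte touse = !missing(race, collgrad, married)
egen group = group(race collgrad) if touse

tempfile temp
save `temp'

// weighted proportion married per group, attached to each observation
collapse (mean) p = married [pw=w] if touse, by(group)
merge 1:m group using `temp', nogenerate

// sample a new married variable
gen byte married_sim = runiform() < p if touse

// compare observed and simulated weighted means per group
collapse (mean) married married_sim p [pw=w] if touse, by(group)
list, noobs
```

In the resulting list, married and p should be identical within each
group, while married_sim differs from them by sampling variation.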

-- Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------