The Stata listserver

Re: st: replacement task


From   Richard Williams <[email protected]>
To   [email protected]
Subject   Re: st: replacement task
Date   Wed, 21 Jan 2004 18:29:43 -0500

At 03:54 PM 1/21/2004 -0600, Dimitriy V. Masterov wrote:
I could not find a command that performs this simple task, and I was hoping
someone might have some advice. I have a variable named emp that looks
like this:

emp     Freq.
0       93,404
1       55,537
2       36,556

I want to replace emp=2 at random with 0 or 1 depending on the relative
number of 1s and 0s in the sample. My outcome should look like this

emp     Freq.
0       116,329
1       69,168

A very similar question came up not too long ago. This odd-looking command will get you close to the split above, though not exactly (and because the assignment is random, your results will differ slightly from the ones I show here):

. replace emp = (uniform() <= 55537/148941) if emp == 2

(36556 real changes made)

. tab emp

        emp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    116,387       62.74       62.74
          1 |     69,110       37.26      100.00
------------+-----------------------------------
      Total |    185,497      100.00


Here is the logic: the uniform() function generates a uniformly distributed random variable that ranges from 0 to 1. Because the distribution is uniform, about 37.29% of the time the draw will be less than (you guessed it) .3729, i.e. 55,537/148,941, which is the share of 1s among the 93,404 + 55,537 = 148,941 cases currently coded 0 or 1. In those cases the expression evaluates to true (1) and emp is assigned 1. The rest of the time it evaluates to false (0) and emp is assigned 0. The qualifier if emp == 2 limits the replacement to cases where emp is 2. Because the assignment is random, you won't get exactly the final outcome you described, but it will be close.
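
If you would rather not type the proportion by hand, the same idea can be sketched as follows. This is only a sketch under assumptions: it assumes emp holds nothing but 0, 1, 2 (or missing); the seed value and the scalar name p1 are arbitrary choices for illustration; and in more recent Stata releases the function is named runiform() rather than uniform().

. * fix the seed so the random assignment can be reproduced
. set seed 12345
. * emp is 0/1 in these cases, so the mean is the share of 1s
. quietly summarize emp if emp < 2
. scalar p1 = r(mean)
. * assign 1 with probability p1, otherwise 0, only where emp is 2
. replace emp = (uniform() <= p1) if emp == 2

With the frequencies in this thread, r(mean) is 55,537/148,941. Setting the seed first means a rerun gives the same assignment, which the one-line version above does not guarantee.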

Now, if you absolutely positively have to get the exact split you describe above, try something like this:

. gen id = _n

. gen x = uniform() if emp==2

(148941 missing values generated)

. sort x

. replace emp = 0 in 1/22925
(22925 real changes made)

. replace emp = 1 in 22926/36556
(13631 real changes made)

. sort id

. tab emp

        emp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    116,329       62.71       62.71
          1 |     69,168       37.29      100.00
------------+-----------------------------------
      Total |    185,497      100.00


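What you are doing is generating a random variable (there is no particular reason it has to have a uniform distribution, by the way), sorting on that variable, and then recoding the first 22,925 cases to 0 on emp and the next 13,631 cases to 1. The sort puts the emp == 2 cases first because x is missing for everyone else, and Stata sorts missing values last. The two counts are simply the differences between your target and current frequencies: 116,329 - 93,404 = 22,925 and 69,168 - 55,537 = 13,631. (The lines involving id are necessary only if you want to restore your original sort order and have no other variable that will let you do so.)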
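
The same steps can be written so that the cutoffs are computed from the data instead of typed in. Treat this as a sketch under assumptions, not a tested program: it assumes emp holds only 0, 1, 2 (or missing), that no variables named id or x already exist, and that the computed cutoff falls strictly between 0 and the number of emp == 2 cases (otherwise one of the in ranges would be invalid). The seed value and the macro names n2 and k0 are arbitrary choices for illustration.

. set seed 12345
. gen id = _n
. * n2 = number of cases to reassign
. quietly count if emp == 2
. local n2 = r(N)
. * k0 = number of those cases that should become 0
. quietly summarize emp if emp < 2
. local k0 = round(`n2' * (1 - r(mean)))
. * random order within the emp == 2 cases; missing x sorts last
. gen x = uniform() if emp == 2
. sort x
. replace emp = 0 in 1/`k0'
. replace emp = 1 in `=`k0' + 1'/`n2'
. * restore the original order and clean up
. sort id
. drop x id

With the frequencies in this thread, n2 is 36,556 and k0 should work out to 22,925, the same cutoff used above.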


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: [email protected]
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc
