

From: "Fernando Rios Avila" <f.rios.a@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: limitations of "generate" with missing data
Date: Mon, 11 Apr 2011 18:09:19 -0400

Hi Michael,

The limitation is not with generate but with the way you are creating your dummy variable. I think this should do the trick:

    set obs 1000
    gen r=runiform()
    replace r=. if runiform()>.5
    gen r2=r>0.7 if r!=.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael Costello
Sent: Monday, April 11, 2011 6:01 PM
To: statalist
Subject: st: limitations of "generate" with missing data

Statalisters,

I recently ran into a problem with the following dataset:

    . tab gread_comp_score_pcnt, m

    gread_comp_ |
     score_pcnt |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        150        7.50        7.50
             .2 |         85        4.25       11.75
             .4 |         97        4.85       16.60
             .6 |         82        4.10       20.70
             .8 |         72        3.60       24.30
              1 |         15        0.75       25.05
              . |      1,499       74.95      100.00
    ------------+-----------------------------------
          Total |      2,000      100.00

The high number of "missing" is by design, a by-product of a horizontally structured dataset that I have yet to rectify. When I run the command:

    gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79)

I am left with

    . tab gread_comp_score_pcnt80, m

    gread_comp_ |
    score_pcnt8 |
              0 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        414       20.70       20.70
              1 |      1,586       79.30      100.00
    ------------+-----------------------------------
          Total |      2,000      100.00

As you can see, the 87 values above .79 were set to 1, but so were all the missing values!! I have toyed with the code a bit, trying variations such as

    . gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79 & gread_comp_score_pcnt!=.)

but that converts all the missing to 0's, which is only marginally better.

So the question is, is there some way to use a single, precise line of code to create eighty-seven 1's, four hundred fourteen 0's and 1499 missing values in one dummy variable? I know I can do it with several lines of code, but I'm looking for something more concise, as it needs to run many hundreds of times.

Thanks for your help,
-Michael

--
Michael Costello
MS Candidate, Statistics 2011
202-246-1627

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
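
For readers who want to reproduce the behaviour discussed in this thread, here is a minimal sketch. It assumes a toy dataset rebuilt from the frequency table in the original question; the names score, freq, score80 and score80b are hypothetical stand-ins, not the poster's actual variables. Stata treats numeric missing values as larger than any number, so the expression (score > .79) evaluates to 1 on missing rows unless they are excluded with an if qualifier or handled explicitly.

```stata
* Rebuild the frequency table from the question as a toy dataset.
* Variable names here are illustrative assumptions only.
clear
input score freq
0   150
.2  85
.4  97
.6  82
.8  72
1   15
.   1499
end
expand freq
drop freq

* Option 1: the if qualifier leaves the new variable missing
* wherever score is missing.
gen byte score80 = score > .79 if !missing(score)

* Option 2: the same result as a single expression, using cond().
gen byte score80b = cond(missing(score), ., score > .79)

tab score80, missing
```

Either line is a single generate statement, which fits the request for something concise enough to run many times. The if qualifier form is probably the more common idiom.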

**Follow-Ups**: **Re: st: RE: limitations of "generate" with missing data** *From:* Nick Cox <njcoxstata@gmail.com>

**References**: **st: limitations of "generate" with missing data** *From:* Michael Costello <michaelavcostello@gmail.com>
