From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: limitations of "generate" with missing data
Date: Mon, 11 Apr 2011 23:19:51 +0100

Add ) at end in #3.

On Mon, Apr 11, 2011 at 11:15 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> The underlying problem can be illustrated by sorting. Suppose we
> -sort- a variable, which contains missings, in numeric order. Where do
> the missings go? We need a decision: either missing is regarded as
> larger than any non-missing, or smaller than any non-missing. Stata
> made the first decision.
>
> Anyway, here are some solutions:
>
> gen myvar1 = (gread_comp_score_pcnt > .79) if gread_comp_score_pcnt < .
>
> gen myvar2 = (gread_comp_score_pcnt > .79) if !missing(gread_comp_score_pcnt)
>
> gen myvar3 = cond(missing(gread_comp_score_pcnt), ., (gread_comp_score_pcnt > .79))
>
> gen myvar4 = (gread_comp_score_pcnt > .79) / (!missing(gread_comp_score_pcnt))
>
> (5. don't throw away information by turning a measure into an indicator!)
>
> Nick
>
> On Mon, Apr 11, 2011 at 11:01 PM, Michael Costello
> <michaelavcostello@gmail.com> wrote:
>> Statalisters,
>>
>> I recently ran into a problem with the following dataset:
>>
>> . tab gread_comp_score_pcnt, m
>>
>> gread_comp_ |
>>  score_pcnt |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>           0 |        150        7.50        7.50
>>          .2 |         85        4.25       11.75
>>          .4 |         97        4.85       16.60
>>          .6 |         82        4.10       20.70
>>          .8 |         72        3.60       24.30
>>           1 |         15        0.75       25.05
>>           . |      1,499       74.95      100.00
>> ------------+-----------------------------------
>>       Total |      2,000      100.00
>>
>> The high number of "missing" is by design, a by-product of a
>> horizontally structured dataset that I have yet to rectify.
>>
>> When I run the command
>>
>> gen gread_comp_score_pcnt80 = (gread_comp_score_pcnt > .79)
>>
>> I am left with
>>
>> . tab gread_comp_score_pcnt80, m
>>
>> gread_comp_ |
>> score_pcnt8 |
>>           0 |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>           0 |        414       20.70       20.70
>>           1 |      1,586       79.30      100.00
>> ------------+-----------------------------------
>>       Total |      2,000      100.00
>>
>> As you can see, the 87 values above .79 were set to 1, but so were all
>> the missing values!!
>>
>> I have toyed with the code a bit, trying variations such as
>>
>> . gen gread_comp_score_pcnt80 = (gread_comp_score_pcnt > .79 & gread_comp_score_pcnt != .)
>>
>> but that converts all the missing values to 0s, which is only marginally
>> better.
>>
>> So the question is: is there some way to use a single, precise line of
>> code to create eighty-seven 1s, four hundred fourteen 0s and 1,499
>> missing values in one dummy variable? I know I can do it with several
>> lines of code, but I'm looking for something more concise, as it needs
>> to run many hundreds of times.
>>
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
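A small self-contained sketch of the point about sorting, using a hypothetical toy variable -score- in place of gread_comp_score_pcnt (the five values are made up for illustration, not taken from Michael's data). Because system missing (.) compares greater than any number, the unconditioned comparison flags the missing row as 1, while the -if !missing()- form (solution 2 above) leaves that row missing.

```stata
* Hypothetical toy data: four scores on the same .2 grid plus one missing
clear
input score
0
.2
.8
1
.
end

* System missing (.) counts as larger than any number, so this shows 1
display (. > .79)

* Unconditioned comparison: the missing row is wrongly flagged as 1
generate flag_wrong = (score > .79)

* Conditioned on nonmissing: rows excluded by -if- are set to missing
generate flag_right = (score > .79) if !missing(score)

list score flag_wrong flag_right
```

In the listing, flag_wrong and flag_right agree on the four nonmissing scores; they differ only on the last row, where flag_wrong is 1 and flag_right is missing. Any of the four one-liners above gives the same result once gread_comp_score_pcnt is substituted for -score-.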

