Re: st: RE: limitations of "generate" with missing data


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: limitations of "generate" with missing data
Date   Mon, 11 Apr 2011 23:17:38 +0100

What is r1?

On Mon, Apr 11, 2011 at 11:09 PM, Fernando Rios Avila
<f.rios.a@gmail.com> wrote:
> Hi Michael,
> The limitation is not with generate, but rather with the way you are creating
> your dummy variable.
> I think this should do the trick:
>
>  set obs 1000
> gen r=runiform()
>  replace  r=. if runiform()>.5
>  gen r2=r>0.7 if r1!=.
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael Costello
> Sent: Monday, April 11, 2011 6:01 PM
> To: statalist
> Subject: st: limitations of "generate" with missing data
>
> Statalisters,
>
> I recently ran into a problem with the following dataset:
>
> . tab  gread_comp_score_pcnt, m
> gread_comp_ |
>  score_pcnt |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          0 |        150        7.50        7.50
>         .2 |         85        4.25       11.75
>         .4 |         97        4.85       16.60
>         .6 |         82        4.10       20.70
>         .8 |         72        3.60       24.30
>          1 |         15        0.75       25.05
>          . |      1,499       74.95      100.00
> ------------+-----------------------------------
>      Total |      2,000      100.00
>
> The high number of "missing" is by design, a by-product of a horizontally
> structured dataset that I have yet to rectify.
>
> When I run the command:
> gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79)
> I am left with
>
> . tab  gread_comp_score_pcnt80, m
> gread_comp_ |
> score_pcnt8 |
>          0 |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          0 |        414       20.70       20.70
>          1 |      1,586       79.30      100.00
> ------------+-----------------------------------
>      Total |      2,000      100.00
>
> As you can see, the 87 values above .79 were set to 1, but so were all the
> missing values!!  I have toyed with the code a bit, trying variations such
> as
> . gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79 & gread_comp_score_pcnt!=.)
> but that converts all the missing to 0's, which is only marginally better.
>
> So the question is, is there some way to use a single, precise line of code
> to create eighty-seven 1's, four hundred fourteen 0's and 1499 missing
> values in one dummy variable?  I know I can do it with several lines of
> code, but I'm looking for something more concise, as it needs to run many
> hundreds of times.
>
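
The behaviour described in the question follows from how Stata orders numeric missing values: "." counts as larger than any number, so (x > .79) evaluates to 1 when x is missing. A minimal sketch of the one-line approach suggested in the thread, using an -if- qualifier so that missing inputs stay missing. The toy data and the names score and score80 are assumptions made for illustration; they are not the poster's dataset.

```stata
* Hypothetical toy data standing in for the poster's variable
* (the names score and score80 are assumed for illustration)
clear
set seed 12345
set obs 10
generate score = runiform()
replace score = . in 1/4

* Numeric missing (.) is treated as larger than any number, so
* (score > .79) alone would be 1 for the missing rows. The -if-
* qualifier skips those rows, leaving score80 missing there.
generate byte score80 = (score > .79) if !missing(score)

* The four missing rows should appear as missing, not as 0 or 1
tabulate score80, missing
```

Rows excluded by the -if- qualifier are never assigned a value, so -generate- leaves them missing rather than 0 or 1.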
