The Stata listserver

RE: st: Egen functions - preserving missing values?


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Egen functions - preserving missing values?
Date   Thu, 6 Oct 2005 22:13:05 +0100

It seems to me a bit more fuss is being made
about this than is warranted. 

First off, you can count non-missing values by 

. egen nonmiss = count(income), by(family) 

Any family with all missing values will have
values that are all 0 on this variable. Any family with at
least one non-missing value will have values that are
positive. So, the division between 0 and positive
is a safe one.
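A minimal Stata sketch of the counting step, run on the eight-observation example (family, income) from Eric Wruck's quoted post. The variable name nonmiss follows the command above; the rest of the do-file is illustrative and not part of the original reply.

```stata
* Rebuild the example data from the quoted post
clear
input family income
1 .
1 .
1 .
2 75000
2 87000
2 .
3 .
3 .
end

* count() gives the number of non-missing values of income within each family
* (0 for families 1 and 3, 2 for family 2)
egen nonmiss = count(income), by(family)
list, sepby(family)
```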

Therefore, you can always go 

... if nonmiss 

and safely exclude all the families with no data from any 
summary or comparison. 
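As a sketch, again assuming the example data from the quoted post, the qualifier can be attached to any summary command; summarize is used here. Note that this works on observations, so each family is counted once per member.

```stata
clear
input family income
1 .
1 .
1 .
2 75000
2 87000
2 .
3 .
3 .
end

egen fam_inc = total(income), by(family)
egen nonmiss = count(income), by(family)

* Without the qualifier, families 1 and 3 contribute spurious zeros
summarize fam_inc
* With it, only families with at least one reported income are used
summarize fam_inc if nonmiss
```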

In addition, if any analysis produces spurious zeros, 
they can be knocked back into missing land by 

replace ... = . if !nonmiss 
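A sketch of the cleanup step on the same example data, assuming the spurious zeros come from egen's total(), which treats missing values as 0:

```stata
clear
input family income
1 .
1 .
1 .
2 75000
2 87000
2 .
3 .
3 .
end

egen fam_inc = total(income), by(family)
egen nonmiss = count(income), by(family)

* fam_inc is 0 for families 1 and 3 only because total() treats missing as 0;
* send those zeros back to missing
replace fam_inc = . if !nonmiss
list, sepby(family)
```

Because nonmiss == 0 already implies both a zero total and at least one missing value in that family, this single condition does the same job as the three-part condition in the quoted post.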

Nick 
[email protected] 

Eric G. Wruck
 
> On September 28th, I posted something quite similar entitled 
> "Collapse & Missing Values".  A few people chimed in with 
> their thoughts & ideas.  Basically, there is not an easy way 
> around this problem.  But here is one that I think works:
> 
> . l
> 
>      +-----------------+
>      | family   income |
>      |-----------------|
>   1. |      1        . |
>   2. |      1        . |
>   3. |      1        . |
>   4. |      2    75000 |
>   5. |      2    87000 |
>      |-----------------|
>   6. |      2        . |
>   7. |      3        . |
>   8. |      3        . |
>      +-----------------+
> 
> . egen fam_inc = total(income), by(family)
> 
> . egen no_miss = total(cond(income==.,1,0)), by(family)
> 
> . egen no_nmiss = total(cond(income~=.,1,0)), by(family)
> 
> . l
> 
>      +------------------------------------------------+
>      | family   income   fam_inc   no_miss   no_nmiss |
>      |------------------------------------------------|
>   1. |      1        .         0         3          0 |
>   2. |      1        .         0         3          0 |
>   3. |      1        .         0         3          0 |
>   4. |      2    75000    162000         1          2 |
>   5. |      2    87000    162000         1          2 |
>      |------------------------------------------------|
>   6. |      2        .    162000         1          2 |
>   7. |      3        .         0         2          0 |
>   8. |      3        .         0         2          0 |
>      +------------------------------------------------+
> 
> 
> . replace fam_inc = . if fam_inc == 0 & no_miss > 0 & no_nmiss == 0
> (5 real changes made, 5 to missing)
> 
> . l
> 
>      +------------------------------------------------+
>      | family   income   fam_inc   no_miss   no_nmiss |
>      |------------------------------------------------|
>   1. |      1        .         .         3          0 |
>   2. |      1        .         .         3          0 |
>   3. |      1        .         .         3          0 |
>   4. |      2    75000    162000         1          2 |
>   5. |      2    87000    162000         1          2 |
>      |------------------------------------------------|
>   6. |      2        .    162000         1          2 |
>   7. |      3        .         .         2          0 |
>   8. |      3        .         .         2          0 |
>      +------------------------------------------------+
> 
> 
> Kind of kludgy, I know.  What I'd really like is to see Stata at 
> least offer an option on collapse & egen that would not do 
> this, but Nick Cox rather dashed my hopes on that front.  But 
> perhaps someone can write a routine that would automate this?
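For readers on later releases: the Stata manual documents a missing option for egen's total() function, which sets the result to missing when every value in the group is missing. Whether your release has it should be checked with help egen; it may not be available in the version discussed in this thread. A sketch on the same example data, assuming the option is present:

```stata
clear
input family income
1 .
1 .
1 .
2 75000
2 87000
2 .
3 .
3 .
end

* missing option (assumed available in your release): all-missing
* families get a missing total instead of 0
egen fam_inc = total(income), by(family) missing
list, sepby(family)
```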
 
Deborah Garvey  

> >I'm using US 2000 Census data (IPUMS version, with my 
> edits).  I've hit upon an issue I don't find much <help> on:  
> how to preserve missing values when these are qualitatively 
> different from zero values when using an <egen> function.
> >
> >I have individual-level income data (inctot2) that I 
> want to aggregate within families (famunt2) in a household (serial):
> >
> >egen ftoty=sum(inctot2), by(serial famunt2)
> >
> >The issue: ftoty is zero, even when all family members have 
> inctot2==. (i.e., not reported, for example, due to age).  In 
> my application (determining family income relative to a 
> poverty threshold) zero family income is very different from 
> nonreported family income.
> >
> >One work-around is to use the !missing(varname) 
> construction, which sets ftoty to missing for any person with 
> missing inctot2:
> >
> > egen ftoty=sum(inctot2) if !missing(inctot2), by(serial famunt2)
> >
> >The drawback to this approach is that I must go back and 
> assign non-missing values of ftoty to individuals for whom 
> ftoty is missing, but who live in a family where other 
> individuals report a valid income value.
> >
> >Is there a better way to approach this problem? 
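Applied to the variables in Deborah Garvey's question, the counting approach avoids the back-filling step. The dataset below is hypothetical (made-up serial, famunt2 and inctot2 values) and only illustrates the logic: family 101/1 reports nothing, 101/2 has one reporter and one nonreporter, and 102/1 reports a genuine zero. The helper variable nrep is likewise a hypothetical name.

```stata
* Hypothetical toy data, for illustration only
clear
input serial famunt2 inctot2
101 1 .
101 1 .
101 2 30000
101 2 .
102 1 0
102 1 .
end

* Family total, with missing treated as 0 by total()
egen ftoty = total(inctot2), by(serial famunt2)
* Number of members reporting an income (hypothetical helper variable)
egen nrep = count(inctot2), by(serial famunt2)
* No reporters at all: family income is unknown, not zero
replace ftoty = . if nrep == 0
list, sepby(serial famunt2)
```

Every member of a family with at least one reported income, nonreporters included, receives the family total; a reported zero stays 0, while a family with no reports becomes missing.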

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC