The Stata listserver

Re: st: Egen functions - preserving missing values?


From   "Eric G. Wruck" <ewruck@econalytics.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Egen functions - preserving missing values?
Date   Thu, 6 Oct 2005 16:32:12 -0400

On September 28th, I posted something quite similar entitled "Collapse & Missing Values".  A few people chimed in with their thoughts & ideas.  Basically, there is not an easy way around this problem.  But here is one that I think works:

. l

     +-----------------+
     | family   income |
     |-----------------|
  1. |      1        . |
  2. |      1        . |
  3. |      1        . |
  4. |      2    75000 |
  5. |      2    87000 |
     |-----------------|
  6. |      2        . |
  7. |      3        . |
  8. |      3        . |
     +-----------------+

. egen fam_inc = total(income), by(family)

. egen no_miss = total(cond(income==.,1,0)), by(family)

. egen no_nmiss = total(cond(income~=.,1,0)), by(family)

. l

     +------------------------------------------------+
     | family   income   fam_inc   no_miss   no_nmiss |
     |------------------------------------------------|
  1. |      1        .         0         3          0 |
  2. |      1        .         0         3          0 |
  3. |      1        .         0         3          0 |
  4. |      2    75000    162000         1          2 |
  5. |      2    87000    162000         1          2 |
     |------------------------------------------------|
  6. |      2        .    162000         1          2 |
  7. |      3        .         0         2          0 |
  8. |      3        .         0         2          0 |
     +------------------------------------------------+


. replace fam_inc = . if fam_inc == 0 & no_miss > 0 & no_nmiss == 0
(5 real changes made, 5 to missing)

. l

     +------------------------------------------------+
     | family   income   fam_inc   no_miss   no_nmiss |
     |------------------------------------------------|
  1. |      1        .         .         3          0 |
  2. |      1        .         .         3          0 |
  3. |      1        .         .         3          0 |
  4. |      2    75000    162000         1          2 |
  5. |      2    87000    162000         1          2 |
     |------------------------------------------------|
  6. |      2        .    162000         1          2 |
  7. |      3        .         .         2          0 |
  8. |      3        .         .         2          0 |
     +------------------------------------------------+


Kind of kludgy, I know.  What I'd really like is for Stata to at least offer an option on collapse & egen that returns missing, rather than zero, when every value in a group is missing, but Nick Cox rather dashed my hopes on that front.  But perhaps someone can write a routine that would automate this?
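
A more compact variant of the same idea, as a rough sketch rather than a tested routine: egen's count() function counts nonmissing values within each group, so a single helper variable can stand in for the two counters above. The sketch assumes it is run from a do-file on data with the variables family and income shown in the listing; n_valid is a hypothetical helper name.

```stata
* Sketch only: assumes variables family and income as in the listing above;
* n_valid is a hypothetical helper name, not part of the original example.
egen fam_inc = total(income), by(family)
egen n_valid = count(income), by(family)   // nonmissing incomes per family
replace fam_inc = . if n_valid == 0        // nothing reported: missing, not 0
drop n_valid
```

For the census variables in the original question, the same pattern would presumably use by(serial famunt2) and inctot2 in place of by(family) and income.  Later releases of Stata also document a missing option on egen's total() function that is meant to return missing when every value in the group is missing, so it is worth checking help egen in your version before relying on a workaround.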

Best of luck,

Eric
 


>Hi, all.
>
>I'm using US 2000 Census data (IPUMS version, with my edits).  I've hit upon an issue I don't find much <help> on:  how to preserve missing values when these are qualitatively different from zero values when using an <egen> function.
>
>I have individual-level income data (inctot2) that I want to aggregate within families (famunt2) in a household (serial):
>
>egen ftoty=sum(inctot2), by(serial famunt2)
>
>The issue: ftoty is zero, even when all family members have inctot2==. (i.e., not reported, for example, due to age).  In my application (determining family income relative to a poverty threshold) zero family income is very different from nonreported family income.
>
>One work-around is to use the !missing(varname) construction, which sets ftoty to missing for any person with missing inctot2:
>
> egen ftoty=sum(inctot2) if !missing(inctot2), by(serial famunt2)
>
>The drawback to this approach is that I must go back and assign non-missing values of ftoty to individuals for whom ftoty is missing, but who live in a family where other individuals report a valid income value.
>
>Is there a better way to approach this problem? 
>
>Best,  Deborah Garvey
>
>******************************
>Deborah Garvey, Ph.D.
>Department of Economics
>Kenna Hall
>Santa Clara University
>Santa Clara, CA  95053
>408/554-5580
>408/554-2331 (FAX)
>dgarvey@scu.edu   
>http://lsb.scu.edu/~dgarvey
>**********************************
>
>
>*
>*   For searches and help try:
>*   http://www.stata.com/support/faqs/res/findit.html
>*   http://www.stata.com/support/statalist/faq
>*   http://www.ats.ucla.edu/stat/stata/


-- 

===================================================

       Eric G. Wruck
       Econalytics
       2535 Sherwood Road
       Columbus, OH  43209

       ph:      614.231.5034
       cell:    614.330.8846
       eFax:    614.573.6639
       eMail:   ewruck@econalytics.com
       website: http://www.econalytics.com

====================================================


