# Re: st: Egen functions - preserving missing values?

From: "Eric G. Wruck"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Egen functions - preserving missing values?
Date: Thu, 6 Oct 2005 16:32:12 -0400

```On September 28th, I posted something quite similar entitled "Collapse & Missing Values".  A few people chimed in with their thoughts & ideas.  Basically, there is not an easy way around this problem.  But here is one that I think works:

. l

     +-----------------+
     | family   income |
     |-----------------|
  1. |      1        . |
  2. |      1        . |
  3. |      1        . |
  4. |      2    75000 |
  5. |      2    87000 |
     |-----------------|
  6. |      2        . |
  7. |      3        . |
  8. |      3        . |
     +-----------------+

. egen fam_inc = total(income), by(family)

. egen no_miss = total(cond(income==.,1,0)), by(family)

. egen no_nmiss = total(cond(income~=.,1,0)), by(family)

. l

     +------------------------------------------------+
     | family   income   fam_inc   no_miss   no_nmiss |
     |------------------------------------------------|
  1. |      1        .         0         3          0 |
  2. |      1        .         0         3          0 |
  3. |      1        .         0         3          0 |
  4. |      2    75000    162000         1          2 |
  5. |      2    87000    162000         1          2 |
     |------------------------------------------------|
  6. |      2        .    162000         1          2 |
  7. |      3        .         0         2          0 |
  8. |      3        .         0         2          0 |
     +------------------------------------------------+

. replace fam_inc = . if fam_inc == 0 & no_miss > 0 & no_nmiss == 0
(5 real changes made, 5 to missing)

. l

     +------------------------------------------------+
     | family   income   fam_inc   no_miss   no_nmiss |
     |------------------------------------------------|
  1. |      1        .         .         3          0 |
  2. |      1        .         .         3          0 |
  3. |      1        .         .         3          0 |
  4. |      2    75000    162000         1          2 |
  5. |      2    87000    162000         1          2 |
     |------------------------------------------------|
  6. |      2        .    162000         1          2 |
  7. |      3        .         .         2          0 |
  8. |      3        .         .         2          0 |
     +------------------------------------------------+

Kind of kludgy, I know.  What I'd really like to see is Stata at least offering an option on collapse & egen that would not turn all-missing groups into zeros, but Nick Cox rather dashed my hopes on that front.  But perhaps someone can write a routine that would automate this?

Best of luck,

Eric

>Hi, all.
>
>I'm using US 2000 Census data (IPUMS version, with my edits).  I've hit upon an issue I don't find much <help> on:  how to preserve missing values when these are qualitatively different from zero values when using an <egen> function.
>
>I have individual-level income data (inctot2) that I want to aggregate within families (famunt2) in a household (serial):
>
>egen ftoty=sum(inctot2), by(serial famunt2)
>
>The issue: ftoty is zero, even when all family members have inctot2==. (i.e., not reported, for example, due to age).  In my application (determining family income relative to a poverty threshold) zero family income is very different from nonreported family income.
>
>One work-around is to use the !missing(varname) construction, which sets ftoty to missing for any person with missing inctot2:
>
> egen ftoty=sum(inctot2) if !missing(inctot2), by(serial famunt2)
>
>The drawback to this approach is that I must go back and assign non-missing values of ftoty to individuals for whom ftoty is missing, but who live in a family where other individuals report a valid income value.
>
>Is there a better way to approach this problem?
>
>Best,  Deborah Garvey
>
>******************************
>Deborah Garvey, Ph.D.
>Department of Economics
>Kenna Hall
>Santa Clara University
>Santa Clara, CA  95053
>408/554-5580
>408/554-2331 (FAX)
>dgarvey@scu.edu
>http://lsb.scu.edu/~dgarvey
>**********************************
>
>
>*
>*   For searches and help try:
>*   http://www.stata.com/support/faqs/res/findit.html
>*   http://www.stata.com/support/statalist/faq
>*   http://www.ats.ucla.edu/stat/stata/

--

===================================================

Eric G. Wruck
Econalytics
Columbus, OH  43209

ph:      614.231.5034
cell:    614.330.8846
eFax:    614.573.6639
eMail:   ewruck@econalytics.com
website: http://www.econalytics.com

====================================================
```
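
The workaround in the post can probably be shortened. What follows is a minimal sketch, not part of the original message: it rebuilds the eight-row example, uses egen's count() to tally non-missing incomes per family, and resets the total to missing where that count is zero. The variable names n_reported and fam_inc2 are made up for illustration. The last egen call assumes a Stata release in which total() accepts the missing option; that option is documented in later versions but was not available when this thread was written, so drop that line and the assert that follows it on older installations.

```stata
* Rebuild the eight-row example from the post.
clear
input family income
1 .
1 .
1 .
2 75000
2 87000
2 .
3 .
3 .
end

* count() tallies the non-missing values of income within each family.
egen n_reported = count(income), by(family)

* total() treats missing as zero, so all-missing families come back as 0.
egen fam_inc = total(income), by(family)

* Restore missing where nobody in the family reported an income.
replace fam_inc = . if n_reported == 0

* Assumption: this Stata release supports the missing option of total(),
* which returns missing when every value in the by-group is missing.
egen fam_inc2 = total(income), by(family) missing

* Both routes should agree, including on which rows are missing.
assert fam_inc == fam_inc2

list, sepby(family)
```

For the original question, the same pattern would presumably apply with by(serial famunt2) and inctot2 in place of family and income. Testing the count of reported values, rather than fam_inc == 0, also avoids blanking a family whose reported incomes genuinely sum to zero.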