st: RE: RE: RE: collapse missing values into zeroes when using (sum)

 From "Nick Cox" To Subject st: RE: RE: RE: collapse missing values into zeroes when using (sum) Date Wed, 28 Jul 2004 01:41:04 +0100

```OK. Consider this toy dataset:

. l

+------------------+
| response   group |
|------------------|
1. |        1       1 |
2. |        2       1 |
3. |        3       2 |
4. |        .       2 |
5. |        0       3 |
|------------------|
6. |        .       3 |
7. |        .       4 |
8. |        .       4 |
+------------------+

I'd regard 0 as a sensible sum for group
3 but not for group 4, but as you say

. collapse (sum) response, by(group)

. l

+------------------+
| group   response |
|------------------|
1. |     1          3 |
2. |     2          3 |
3. |     3          0 |
4. |     4          0 |
+------------------+

One way round this is to tag groups
for which _all_ values are missing.
One way to do that is

. bysort group (response) : gen allmissing = mi(response[1])

That is, within groups, sort on response. If the first
value within each group is missing, then all of them
in that group must be.

The principle here is close to one explained at
http://www.stata.com/support/faqs/data/diff.html
You might also like to look at
http://www.stata.com/support/faqs/data/anyall.html

. l

+-----------------------------+
| response   group   allmis~g |
|-----------------------------|
1. |        1       1          0 |
2. |        2       1          0 |
3. |        3       2          0 |
4. |        .       2          0 |
5. |        0       3          0 |
|-----------------------------|
6. |        .       3          0 |
7. |        .       4          1 |
8. |        .       4          1 |
+-----------------------------+

Now we make sure the -collapse- carries along
the important information:

. collapse (sum) response (min) allmissing, by(group)

. l

+-----------------------------+
| group   response   allmis~g |
|-----------------------------|
1. |     1          3          0 |
2. |     2          3          0 |
3. |     3          0          0 |
4. |     4          0          1 |
+-----------------------------+

and convert back:

. replace response = . if allmissing
(1 real change made, 1 to missing)

. l

+-----------------------------+
| group   response   allmis~g |
|-----------------------------|
1. |     1          3          0 |
2. |     2          3          0 |
3. |     3          0          0 |
4. |     4          .          1 |
+-----------------------------+

Nick
n.j.cox@durham.ac.uk

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Monica L.
> Sent: 28 July 2004 01:24
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: RE: collapse missing values into zeroes when using
> (sum)
>
>
> I would like to keep them as missing because I need to be able to
> distinguish between actual 0s and missing values.
> Thanks,
> Monica L.
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Tuesday, July 27, 2004 7:51 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: collapse missing values into zeroes when using (sum)
>
>
> You say that you don't want missings to be
> included in the -collapse-, as they are
> regarded as summing to 0.
>
> What do you want Stata to do instead?
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> > I have information at cities and state level but want to
> > aggregate it at the
> > state level using weighted averages (in a very manual way).
> > For one of the
> > steps I am using the command "collapse (sum) variables, by
> > (state year)" but
> > then I am getting zeroes where there were missing values
> > because of the sum
> > function (it does not happen with the other collapse
> > functions like sd,
> > mean, etc).  I tried to use the option "cw" but since the
> > missing values are
> > not for the same observation among the different variables I
> > end up having
> > no observations at all.
> > So I would like to know if there is a way I could tell Stata
> > not to collapse
> > missing values into zeroes when using (sum).
>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```