# Re: Re: st: preserving missing values in collapse (sum)

From: n j cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: Re: st: preserving missing values in collapse (sum)
Date: Tue, 23 Oct 2007 20:12:09 +0100

This response variable y has not so far appeared in the drama.
Where does it come from? The same dataset?

Either way, I think you might make progress by
checking out -reshape-.
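For what it is worth, here is a rough sketch of the long-to-wide step in plain Python rather than Stata. It assumes the durations have already been summed to one row per youthid-group combination that actually occurs, with None standing in for Stata's missing value where every duration in that cell was unknown. The y values 0 and 1 are copied from the example table in the quoted post below; `reshape_wide` and the other names are hypothetical.

```python
# Long form after a per-(youthid, group) summation: one row per combination
# that actually occurs. None marks a cell whose durations were all unknown.
long_rows = [
    (11, 1, 15), (11, 2, 41), (11, 5, None),
    (12, 2, 13), (12, 4, 42), (12, 6, 55),
]
# Illustrative per-youth outcome y, taken from the example in the thread.
outcome = {11: 0, 12: 1}


def reshape_wide(long_rows, outcome, groups=range(1, 7)):
    """Turn (youthid, group, total) rows into one row per youth:
    [youthid, y, x1, ..., x6]."""
    cells = {(youth, group): total for youth, group, total in long_rows}
    wide = []
    for youth in sorted(outcome):
        # No row at all for a group means no placement there: 0 days.
        # A row holding None means placements of unknown length: stays None.
        xs = [cells.get((youth, g), 0) for g in groups]
        wide.append([youth, outcome[youth], *xs])
    return wide


for row in reshape_wide(long_rows, outcome):
    print(row)
# [11, 0, 15, 41, 0, 0, None, 0]
# [12, 1, 0, 13, 0, 42, 0, 55]
```

The point of keeping absent cells and None cells apart is that Stata's -reshape wide- fills youthid-group combinations with no observation with missing, so a group never entered and a group entered only with unknown durations would both come out as missing unless they are distinguished before reshaping.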

Melonie Sullivan wrote:

Okay, so far so good, thanks. But now how do I get
that information into a matrix of this form, with one
line for each youth:

youthid  y  x1  x2  x3  x4  x5  x6
11       0  15  41   0   0   .   0
12       1   0  13   0  42   0  55

where y = dependent variable, x1 = duration if group==1,
x2 = duration if group==2, etc. If I take your solution,
then generate x1, x2, ..., and do a -list-, I still get
a 6x6 matrix of x for each youth that looks like this:

youthid  x1  x2  x3  x4  x5  x6
11       15   .   .   .   .   .
11        .  41   .   .   .   .
11        .   .   .   .   .   .
11        .   .   .   .   .   .
11        .   .   .   .   .   .
11        .   .   .   .   .   .

<intermediate posts>

> >> I have data on history of placements into different
> >> -group-s by -youthid-: there are multiple placement
> >> records for each youth. I need to create a variable
> >> equal to the sum of -duration- of all placements into
> >> each -group- for each youth. -collapse (sum)- seems to
> >> be the appropriate procedure, but it treats missing
> >> values as zeroes. This causes a problem if a youth has
> >> placements in a given group but all of them have
> >> unknown duration. Example:
> >>
> >> youthid         group        duration
> >> 11                 1            15
> >> 11                 1             .
> >> 11                 2            31
> >> 11                 2            10
> >> 11                 5             .
> >> 12                 2             5
> >> 12                 2             8
> >> 12                 4            42
> >> 12                 6            55
> >>
> >> I create a duration variable for each group
> >> (-generate grp1dur = duration if group==1-, etc.),
> >> then -collapse (sum)- by -youthid-. I want to get this:
> >>
> >> youthid   11   12
> >> grp1dur   15    0
> >> grp2dur   41   13
> >> grp3dur    0    0
> >> grp4dur    0   42
> >> grp5dur    .    0
> >> grp6dur    0   55
> >>
> >> But -collapse- gives me a zero for grp5dur for youth
> >> #11, though youth #11 had a placement in that group,
> >> albeit of unknown duration. The other zeroes are
> >> correct: the youth had zero days in that placement
> >> group.
> >>
> >> The problem has been addressed here before, best in
> >> the following post by Nick Cox:
> >>
> >> http://www.stata.com/statalist/archive/2004-07/msg00783.html
> >>
> >> However, this does not solve my particular problem,
> >> because my data essentially look like a big stack of
> >> Nick's "toy datasets", one for each of the 1800 youth
> >> in my data. So collapsing by -youthid- gives the same
> >> value of Nick's allmissing for each youth, since
> >> allmissing tags missing durations for groups within
> >> youths.
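The rule being asked for can be stated per youthid-group cell: sum the known durations; if the youth has placements in the group but every duration is unknown, the result is missing; if the youth has no placements in the group, the result is 0. Here is a minimal sketch of that rule in plain Python (not Stata), using the example records from the original post, with None standing in for "."; the function names are hypothetical.

```python
from collections import defaultdict

# Example placement records from the post: (youthid, group, duration),
# with None standing in for Stata's missing value ".".
records = [
    (11, 1, 15), (11, 1, None), (11, 2, 31), (11, 2, 10), (11, 5, None),
    (12, 2, 5), (12, 2, 8), (12, 4, 42), (12, 6, 55),
]


def sum_keep_missing(values):
    """Sum the nonmissing values; return None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def group_durations(records, groups=range(1, 7)):
    """Return {youthid: {group: total}}: 0 for groups never entered,
    None for groups entered only with unknown durations."""
    by_cell = defaultdict(list)
    for youth, group, duration in records:
        by_cell[(youth, group)].append(duration)
    totals = {}
    for youth in sorted({y for y, _, _ in records}):
        # The membership test runs first, so absent cells are not created.
        totals[youth] = {
            g: sum_keep_missing(by_cell[(youth, g)]) if (youth, g) in by_cell else 0
            for g in groups
        }
    return totals


totals = group_durations(records)
print(totals[11])  # {1: 15, 2: 41, 3: 0, 4: 0, 5: None, 6: 0}
print(totals[12])  # {1: 0, 2: 13, 3: 0, 4: 42, 5: 0, 6: 55}
```

This is the distinction that -collapse (sum)- does not make on its own, since it returns 0 for a cell whose durations are all missing. One common Stata workaround is to collapse both (sum) and (count) of duration by youthid and group, because (count) counts only nonmissing values, and then to set the sum to missing wherever that count is zero.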

