n j cox <n.j.cox@durham.ac.uk>

statalist@hsphsun2.harvard.edu

Re: Re: st: preserving missing values in collapse (sum)

Tue, 23 Oct 2007 20:12:09 +0100

This response y has not so far appeared in the drama.
Where does it come from? The same dataset?

Either way, I think you might make progress by
checking out -reshape-.

Melonie Sullivan

Okay, so far so good, thanks. But now how do I get that
information into a matrix of this form - one line for each
youth:

youthid   y  x1  x2  x3  x4  x5  x6
     11   0  15  41   0   0   .   0
     12   1   0  13   0  42   0  55

where y=dependent variable, x1=duration if group=1,
x2=duration if group=2, etc.

If I take your solution, then generate x1, x2...., and do a
-list- I still get a 6x6 matrix of x for each youth that looks
like this:

youthid  x1  x2  x3  x4  x5  x6
     11  15   .   .   .   .   .
     11   .  41   .   .   .   .
     11   .   .   .   .   .   .
     11   .   .   .   .   .   .
     11   .   .   .   .   .   .
     11   .   .   .   .   .   .

< intermediate posts>

> >> I have data on history of placements into different
> >> -group-s by -youthid-: there are multiple placement
> >> records for each youth. I need to create a variable
> >> equal to the sum of -duration- of all placements into
> >> each -group- for each youth. -collapse (sum)- seems to be
> >> the appropriate procedure, but it treats missing
> >> values as zeroes. This causes a problem if a youth has
> >> only one placement in a given group with unknown
> >> duration. Example:
> >>
> >> youthid  group  duration
> >>      11      1        15
> >>      11      1         .
> >>      11      2        31
> >>      11      2        10
> >>      11      5         .
> >>      12      2         5
> >>      12      2         8
> >>      12      4        42
> >>      12      6        55
> >>
> >> I create a duration variable for each group (-generate
> >> grp1dur = duration if group==1-, etc.) and -collapse
> >> (sum)- by -youthid- and I want to get this:
> >>
> >> youthid    11  12
> >> grp1dur    15   0
> >> grp2dur    41  13
> >> grp3ddur    0   0
> >> grp4dur     0  42
> >> grp5dur     .   0
> >> grp6dur     0  55
> >>
> >> But collapse gives me a zero on grp5dur for youth #11,
> >> though youth #11 had placement in that group, albeit
> >> of an unknown duration. The other zeroes are correct;
> >> the youth had zero days in that placement group.
> >>
> >> The problem has been addressed here before, best in
> >> the following post by Nick Cox:
> >>
> >> http://www.stata.com/statalist/archive/2004-07/msg00783.html
> >>
> >> However, this is not solving my particular problem,
> >> because my data essentially looks like a big stack of
> >> Nick's "toy datasets" -- one for each of 1800 youth in
> >> my data. So collapsing by (youthid) gives the same
> >> value of Nick's allmissing for each youth, since the
> >> allmissing tags missing durations for groups within
> >> youths.
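
For readers following the thread, here is a minimal sketch of one way the two steps discussed above might fit together: summing -duration- within youth and group without turning an all-missing group into zero, then using -reshape- to get one line per youth. It is not code posted in the thread. The names x and n, the stub x1 to x6 and the assumption that groups run from 1 to 6 are illustrative choices made to match the target layout in Melonie's message. The data are the nine observations quoted above. The response y is left out because its source is not stated.

```stata
* Toy data copied from the original question
clear
input youthid group duration
11 1 15
11 1 .
11 2 31
11 2 10
11 5 .
12 2 5
12 2 8
12 4 42
12 6 55
end

* Per youth and group: total duration and number of nonmissing durations.
* -collapse (sum)- returns 0 when every duration is missing, so the
* count is what separates an unknown total from a true zero.
collapse (sum) x=duration (count) n=duration, by(youthid group)
replace x = . if n == 0
drop n

* Add the youth-group pairs that never occur; those are true zeros.
fillin youthid group
replace x = 0 if _fillin
drop _fillin

* One line per youth: x1 = total duration in group 1, and so on.
reshape wide x, i(youthid) j(group)

* Assumption: groups run from 1 to 6. A group absent from the whole
* dataset (group 3 in the toy data) gets a column of zeros.
forvalues g = 1/6 {
    capture confirm variable x`g'
    if _rc {
        generate x`g' = 0
    }
}
order youthid x1 x2 x3 x4 x5 x6
list, noobs
```

The intent is that youth 11 keeps a missing value in x5, where the only placement had unknown duration, while groups with no placement at all show zero. A group that mixes known and unknown durations (group 1 for youth 11) is summed over the known values only, which is what the desired output in the original question shows. Whether that is the right treatment for real data is a judgment call the thread does not settle.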

