[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
Melonie Sullivan <meloniebeth@yahoo.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: preserving missing values in collapse (sum) |

Date |
Mon, 22 Oct 2007 07:59:29 -0700 (PDT) |

Okay, -collapse(sum) by (youthid group) results in something like the matrix below for each youth. But I need a single record for each youth with sum of durations for each group. If I understand your solution, I then type a series of -replace x=. if allmissing- where x=grp1dur, grp2dur, etc. But then I need to collapse again to get a single sum for each group for each youth, and I am back where I started. youthid 11 11 11 11 11 11 grp1dur 15 0 0 0 0 0 grp2dur 0 41 0 0 0 0 grp3ddur 0 0 0 0 0 0 grp4dur 0 0 0 0 0 0 grp5dur 0 0 0 0 0 0 grp6dur 0 0 0 0 0 0 allmissing 0 0 0 0 1 0 --- n j cox <n.j.cox@durham.ac.uk> wrote: > I don't think you mean what you say. Your output > requires a -collapse, by(youthid group)-. Sorry > that I didn't spell that out. > > Nick > n.j.cox@durham.ac.uk > > Melonie wrote: > > Thank you, Nick. I had considered that tweak. > However, > I need to collapse by (youthid), which then gives me > a > value of allmissing for each youth, but does not tag > the missing group for replacement: > > youthid 11 12 > grp1dur 15 0 > grp2dur 41 13 > grp3ddur 0 0 > grp4dur 0 42 > grp5dur 0 0 > grp6dur 0 55 > allmissing 1 0 > > n j cox wrote: > > > Below my signature is the key part of the posting > referred to. > > > > You can tweak that approach by > > > > bysort youthid group (duration) : gen allmissing = > missing(duration[1]) > > > > and then -collapse- as before. > > > > Nick > > n.j.cox@durham.ac.uk > > > > OK. Consider this toy dataset: > > > > . l > > > > +------------------+ > > | response group | > > |------------------| > > 1. | 1 1 | > > 2. | 2 1 | > > 3. | 3 2 | > > 4. | . 2 | > > 5. | 0 3 | > > |------------------| > > 6. | . 3 | > > 7. | . 4 | > > 8. | . 4 | > > +------------------+ > > > > I'd regard 0 as a sensible sum for group > > 3 but not for group 4, but as you say > > > > . collapse (sum) response, by(group) > > > > . l > > > > +------------------+ > > | group response | > > |------------------| > > 1. | 1 3 | > > 2. | 2 3 | > > 3. | 3 0 | > > 4. | 4 0 | > > +------------------+ > > > > One way round this is to tag groups > > for which _all_ values are missing. > > One way to do that is > > > > . bysort group (response) : gen allmissing = > mi(response[1]) > > > > That is, within groups, sort on response. If the > first > > value within each group is missing, then all of > them > > in that group must be. > > > > The principle here is close to one explained at > > http://www.stata.com/support/faqs/data/diff.html > > You might also like to look at > > http://www.stata.com/support/faqs/data/anyall.html > > > > . l > > > > +-----------------------------+ > > | response group allmis~g | > > |-----------------------------| > > 1. | 1 1 0 | > > 2. | 2 1 0 | > > 3. | 3 2 0 | > > 4. | . 2 0 | > > 5. | 0 3 0 | > > |-----------------------------| > > 6. | . 3 0 | > > 7. | . 4 1 | > > 8. | . 4 1 | > > +-----------------------------+ > > > > Now we make sure the -collapse- carries along > > the important information: > > > > . collapse (sum) response (min) allmissing, > by(group) > > > > . l > > > > +-----------------------------+ > > | group response allmis~g | > > |-----------------------------| > > 1. | 1 3 0 | > > 2. | 2 3 0 | > > 3. | 3 0 0 | > > 4. | 4 0 1 | > > +-----------------------------+ > > > > and convert back: > > > > . replace response = . if allmissing > > (1 real change made, 1 to missing) > > > > . l > > > > +-----------------------------+ > > | group response allmis~g | > > |-----------------------------| > > 1. | 1 3 0 | > > 2. | 2 3 0 | > > 3. | 3 0 0 | > > 4. | 4 . 1 | > > +-----------------------------+ > > > > Melonie Sullivan > > > > I have data on history of placements into > different > > -group-s by -youthid-: there are multiple > placement > > records for each youth. I need to create a > variable > > equal to the sum of -duration- of all placements > into > > each -group- for each youth. -collapse (sum)- > seems to be > > the appropriate procedure, but it treats missing > > values as zeroes. This causes a problem if a youth > has > > only one placement in a given group with unknown > > duration. Example: > > > > > > youthid group duration > > 11 1 15 > > 11 1 . > > 11 2 31 > > 11 2 10 > > 11 5 . > > 12 2 5 > > 12 2 8 > > 12 4 42 > > 12 6 55 > > > > I create a duration variable for each group > (-generate > > grp1dur = duration if group==1-, etc.) and > -collapse > > (sum)- by -youthid- and I want to get this: > > > > youthid 11 12 > > grp1dur 15 0 > > grp2dur 41 13 > > grp3ddur 0 0 > > grp4dur 0 42 > > grp5dur . 0 > > grp6dur 0 55 > > > > But collapse gives me a zero on grp5dur for youth > #11, > > though youth #11 had placement in that group, > albeit > > of an unknown duration. The other zeroes are > correct; > > the youth had zero days in that placement group. > > > > The problem has been addressed here before, best > in > > the following post by Nick Cox: > > > > > http://www.stata.com/statalist/archive/2004-07/msg00783.html > > > > However, this is not solving my particular > problem, > > because my data essentially looks like a big stack > of > > Nick's "toy datasets" -- one for each of 1800 > youth in > > my data. So collapsing by (youthid) gives the same > > value of Nick's allmissing for each youth, since > the > > allmissing tags missing durations for groups > within > > youths. > > > > > > > * > * For searches and help try: > * > http://www.stata.com/support/faqs/res/findit.html > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**Re: st: preserving missing values in collapse (sum)***From:*n j cox <n.j.cox@durham.ac.uk>

- Prev by Date:
**R: st: How to perform a productory** - Next by Date:
**Re: st: preserving missing values in collapse (sum)** - Previous by thread:
**Re: st: preserving missing values in collapse (sum)** - Next by thread:
**Re: st: preserving missing values in collapse (sum)** - Index(es):

© Copyright 1996–2016 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |