Statalist



st: preserving missing values in collapse (sum)


From   Melonie Sullivan <meloniebeth@yahoo.com>
To   stata listserve <statalist@hsphsun2.harvard.edu>
Subject   st: preserving missing values in collapse (sum)
Date   Mon, 22 Oct 2007 05:51:37 -0700 (PDT)

Hi, I'm new to the list. Thanks for your attention. 

I have data on the history of placements into different
groups, identified by youthid, with multiple placement
records for each youth. I need to create a variable
equal to the sum of the durations of all placements into
each group for each youth. collapse (sum) seems to be
the appropriate procedure, but it treats missing
values as zeroes. This causes a problem if a youth has
only one placement in a given group and its duration
is unknown. Example:
         
         
youthid         group        duration      
11                 1            15             
11                 1             .               
11                 2            31             
11                 2            10             
11                 5             .               
12                 2             5              
12                 2             8             
12                 4            42            
12                 6            55             
         
I create a duration variable for each group (generate
grp1dur = duration if group==1, etc.), then collapse
(sum) by youthid. I want to get this:
        
youthid   11    12
grp1dur   15     0
grp2dur   41    13
grp3dur    0     0
grp4dur    0    42
grp5dur    .     0
grp6dur    0    55
         
But collapse gives me a zero on grp5dur for youth #11,
even though youth #11 had a placement in that group,
albeit of unknown duration. The other zeroes are correct;
the youth had zero days in that placement group.
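
In case it helps, here is a minimal sketch that reproduces the problem
with the example data above. The variable names are the ones from my
example; the forvalues loop is only shorthand for the six generate
statements, so treat it as an illustration rather than my exact code.

```stata
* Enter the example placement records
clear
input youthid group duration
11 1 15
11 1  .
11 2 31
11 2 10
11 5  .
12 2  5
12 2  8
12 4 42
12 6 55
end

* One duration variable per placement group (missing outside that group)
forvalues g = 1/6 {
    generate grp`g'dur = duration if group == `g'
}

* collapse (sum) treats missing as zero, so grp5dur for youth 11
* comes out as 0 rather than missing
collapse (sum) grp1dur-grp6dur, by(youthid)
list
```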

The problem has been addressed here before, best in
the following post by Nick Cox:
         
http://www.stata.com/statalist/archive/2004-07/msg00783.html
        
However, this does not solve my particular problem,
because my data essentially look like a big stack of
Nick's "toy datasets", one for each of the 1800 youth in
my data. Collapsing by youthid therefore gives the same
value of Nick's allmissing for each youth, since
allmissing tags missing durations for groups within
youths.

I hope this is clear, and thanks in advance for any
assistance.
      
Melonie Sullivan
Director of Research and Program Evaluation
Institute for Family Centered Services, Inc.


