The Stata listserver

RE: st: Collapse & Missing Values


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Collapse & Missing Values
Date   Thu, 29 Sep 2005 01:54:08 +0100

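* count the nonmissing values of myvar in each group
bysort group : egen nonmiss = total(myvar < .)
* group total; egen's total() ignores missings, so all-missing groups get 0
by group : egen total = total(myvar)
* reset the total to missing where nothing was observed
replace total = . if nonmiss == 0
* tag one observation per group and correlate on those only
egen tag = tag(group)
corr <whatever> if tag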
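
If you would rather stay with -collapse-, the same idea can be sketched
as below. This is only a sketch under assumptions: group and myvar are
the placeholder names used above, and the small dataset typed in with
-input- is made up purely for illustration. The (count) statistic
counts nonmissing values, so it can flag the groups whose sum of 0
really means "nothing observed".

```stata
* illustrative data only: group 1 is all missing, group 2 partly missing
clear
input group myvar
1 .
1 .
2 .
2 42
3 5
3 7
end

* (sum) treats missings as zero; (count) counts nonmissing values
collapse (sum) total = myvar (count) nonmiss = myvar, by(group)

* a group with no nonmissing values gets a missing total, not 0
replace total = . if nonmiss == 0

list, clean
```

Here group 1 should end up with a missing total, while groups 2 and 3
keep the sums of their nonmissing values. Keeping nonmiss in the
collapsed dataset also preserves the information on how many values
contributed to each total.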

Nick 
n.j.cox@durham.ac.uk 

Eric G. Wruck
 
> Thank you Nick for your valiant effort to characterize the 
> treatment of missings as a feature.  And thank you, 
> Friedrich, for your work-around (& again to you Nick for your 
> help on that too).
> 
> Let me just try to explain why this wasn't a feature for me 
> today.  Using the collapse statement, I was aggregating 
> various amount fields by day.  There could be multiple (and 
> usually were) transactions per day.  Once I had the 
> aggregated amounts, I was interested in their correlations, 
> especially the correlation of one amount with the lagged 
> amount of another.  When I start introducing erroneous zero 
> amounts, my correlations will not be unbiased, & certainly 
> not correct.  In fact, the way I discovered this is that one 
> colleague was computing the same correlations in SAS.  For 
> some reason, I had more observations than he.  I now know 
> that my "extra" observations were the result of collapse's 
> treatment of missing values.  I was able to get the same 
> correlations as my colleague by deleting the observations 
> with missing amounts but then I also lose the information on 
> the number of transactions on those days (albeit with 
> incomplete data).  So yes, I emphatically agree with your 
> diagnosis:
> 
> >I guess what Eric would in effect like Stata to do
> >is to keep track of all the occurrences of
> >missing so that -sum()- would produce say
> >
> >. + . + . + . + . + . + 42 = 42
> >
> >but
> >
> >. + . + . + . + . + . + . = .
> >
> >Thus, at the end of a set that were all missing,
> >-sum()- would be morally compelled to say,
> >"No, that initial guess of 0 doesn't apply here.
> >These values are all missing, so the sum must
> >be missing. I changed my mind!"   
> 
> 
> Failing such a radical change to collapse, perhaps there 
> could be an "allmiss" parameter that would make the sum of 
> totally missing values equal to missing.
