The Stata listserver

RE: st: Collapse & Missing Values


From   "Eric G. Wruck" <ewruck@econalytics.com>
To   statalist@hsphsun2.harvard.edu
Subject   RE: st: Collapse & Missing Values
Date   Wed, 28 Sep 2005 23:01:14 -0400

Thanks, Nick, for your solution below.  Will head off to bed & maybe read a bit on egen tag before nodding off.

Eric


>bysort group : egen nonmiss = total(myvar < .)
>by group: egen total = total(myvar)
>replace total = . if nonmiss == 0
>egen tag = tag(group)
>corr <whatever> if tag
>
>Nick
>n.j.cox@durham.ac.uk
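
For readers following along, here is a minimal sketch of how the commands above behave on a toy dataset. The data values are invented for illustration; only the variable names group and myvar come from the post. Group 1 is entirely missing, group 2 is partly missing, and group 3 is complete.

```stata
clear
input group myvar
1 .
1 .
2 .
2 42
3 10
3 5
end

* (myvar < .) is 1 for nonmissing values, so this counts them per group
bysort group: egen nonmiss = total(myvar < .)

* egen's total() treats missing as 0, so the all-missing group gets 0 here
by group: egen total = total(myvar)

* put missing back where the group had nothing to add up
replace total = . if nonmiss == 0

* flag one observation per group so each group is used only once
egen tag = tag(group)
list group total nonmiss if tag
```

Under these assumptions the listing should show a missing total for group 1, 42 for group 2, and 15 for group 3. Later Stata releases also document a missing option on egen's total() with a similar effect; check -help egen- in your version before relying on it.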
>
>Eric G. Wruck
> 
>> Thank you Nick for your valiant effort to characterize the
>> treatment of missings as a feature.  And thank you,
>> Friedrich, for your work-around (& again to you Nick for your
>> help on that too).
>>
>> Let me just try to explain why this wasn't a feature for me
>> today.  Using the collapse statement, I was aggregating
>> various amount fields by day.  There could be multiple (and
>> usually were) transactions per day.  Once I had the
>> aggregated amounts, I was interested in their correlations,
>> especially the correlation of one amount with the lagged
>> amount of another.  When I start introducing erroneous zero
>> amounts, my correlations will not be unbiased, & certainly
>> not correct.  In fact, the way I discovered this is that one
>> colleague was computing the same correlations in SAS.  For
>> some reason, I had more observations than he.  I now know
>> that my "extra" observations were the result of collapse's
>> treatment of missing values.  I was able to get the same
>> correlations as my colleague by deleting the observations
>> with missing amounts but then I also lose the information on
>> the number of transactions on those days (albeit with
>> incomplete data).  So yes, I emphatically agree with your
>> diagnosis:
>>
>> >I guess what Eric would in effect like Stata to do
>> >is to keep track of all the occurrences of
>> >missing so that -sum()- would produce say
>> >
>> >. + . + . + . + . + . + 42 = 42
>> >
>> >but
>> >
>> >. + . + . + . + . + . + . = .
>> >
>> >Thus, at the end of a set that were all missing,
>> >-sum()- would be morally compelled to say,
>> >"No, that initial guess of 0 doesn't apply here.
>> >These values are all missing, so the sum must
>> >be missing. I changed my mind!"  
>>
>>
>> Failing such a radical change to collapse, perhaps there
>> could be an "allmiss" parameter that would make the sum of
>> totally missing values equal to missing.
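
No "allmiss" option is shown in this thread, but the same effect can be sketched with collapse itself by carrying a count of nonmissing values alongside the sum. This is only an illustration: the variable names day, amount, and one, and the data values, are hypothetical and not from Eric's data.

```stata
clear
input day amount
1 .
1 .
2 .
2 42
3 10
3 5
end

* a constant to sum, so the number of transactions per day survives
gen byte one = 1

* (sum) returns 0 for an all-missing day; (count) counts nonmissing amounts
collapse (sum) total = amount (count) nonmiss = amount (sum) ntrans = one, by(day)

* turn the spurious zero back into missing
replace total = . if nonmiss == 0
list
```

With this approach, day 1 keeps its transaction count (ntrans is 2) while its total is missing, so it drops out of a later -corr- without the observation being deleted.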
>
>*
>*   For searches and help try:
>*   http://www.stata.com/support/faqs/res/findit.html
>*   http://www.stata.com/support/statalist/faq
>*   http://www.ats.ucla.edu/stat/stata/




