

- From: Austin Nichols <austinnichols@gmail.com>
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Overlapping averages
- Date: Tue, 28 Jun 2011 11:02:50 -0400

Matthew Bombyk <bomby001@umn.edu>:

You have missing data (hours of use of a personal medical device per day) for the people whose intervals do not overlap in exactly the right way to identify a single day's usage. However, each interval (or day) has a feasible range for average daily usage that is consistent with the interval data. Why not use -mi- or -ice- (Royston 2004, 2007, etc.) to impute a feasible value for each day?

Are there cases where a single day's usage is identifiable? Those would give you a sense of the true distribution of daily usage conditional on other variables.

What are you going to do with these average daily values if you compute them as weighted means? Are they going into a daily panel-data model? In that case, imputing the mean will at the very least understate the variance.

Whatever you decide to do, I recommend running a small simulation to see how your final model performs on data you have constructed yourself (so you know the truth) and then artificially censored to reproduce your data structure.

Royston, P. 2004. Multiple imputation of missing values. *Stata Journal* 4(3): 227–241.

Royston, P. 2007. Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. *Stata Journal* 7: 445–464.

On Mon, Jun 27, 2011 at 11:16 PM, Matthew Bombyk <bomby001@umn.edu> wrote:

> Dear Statalist,
>
> I have a pretty long and complicated question:
>
> My research group has data on average treatment compliance per day (average hours of use of a personal medical device) for a number of subjects over various time intervals. For example, we might have a subject who was observed from day 3 to day 10 and had an average daily usage of 5 hours over that period. We don't have the actual usage data for any given day. We have other panel data on these subjects and want to assign an average treatment compliance number to each day.
>
> Many subjects have two time intervals that overlap. Say one starts on day 3 and ends on day 10, and the other starts on day 7 and ends on day 20. We have average usage over both of these periods. I've done some algebra and found that it is impossible to calculate the average exactly for the overlapping time interval, days 7 to 10 in our example. But I am wondering if there is a "best way" to incorporate the information I have into a better estimate than just assigning one of the periods' values to the overlap, or taking a simple average.
>
> I have a few criteria that I thought would be useful:
>
> 1) If the two intervals have the same start or end date, we CAN calculate the overlap exactly (namely, as the shorter interval). So the weighted average should collapse to this when appropriate.
> 2) If the intervals are the same length, we should just get a simple average.
> 3) If interval 1 is shorter than interval 2, interval 1 should get more weight.
>
> I thought about using w1 = t22/(t11 + t22) and w2 = t11/(t11 + t22). Here w1 is the weight on the first average, t11 and t22 are the lengths of the nonoverlapping portions of the first and second time intervals, respectively, and t12 is the length of the overlap. That is, Period 1 consists of t11 days followed by the t12-day overlap, and Period 2 consists of the t12-day overlap followed by t22 days.
>
> The problem with this is that when the overlap is very large compared to the nonoverlapping portions, we can put far too much weight on the shorter interval. This happens whenever the nonoverlapping portions have a high ratio, even if they differ by only a small amount in absolute terms.
>
> We also thought about using w1 = t12/(t12 + t11), but this fails on criterion #1 above.
>
> Any comments appreciated. Thanks!
>
> Matt Bombyk
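The question's criteria can be checked mechanically for the two candidate weights. The following is a minimal Python sketch; the thread is about Stata, and Python is used here only for illustration. The function names are hypothetical, and interval lengths are assumed to be counted in whole days, with t11, t12 and t22 defined as in the question.

```python
def proposed_weight(t11, t22):
    """First proposal: w1 = t22 / (t11 + t22); w2 = 1 - w1."""
    if t11 + t22 == 0:
        # Identical intervals: both averages describe the same days,
        # so any weight gives the same answer; use 0.5.
        return 0.5
    return t22 / (t11 + t22)


def alternative_weight(t11, t12):
    """Second proposal: w1 = t12 / (t12 + t11)."""
    return t12 / (t12 + t11)


def overlap_estimate(a1, a2, w1):
    """Weighted average of the two interval means, used for the overlap."""
    return w1 * a1 + (1 - w1) * a2


if __name__ == "__main__":
    # Criterion 1: same start (t11 = 0) -> all weight on period 1;
    # same end (t22 = 0) -> all weight on period 2.
    print(proposed_weight(0, 10), proposed_weight(4, 0))  # 1.0 0.0
    # The alternative still gives period 1 weight 0.5 when t22 = 0,
    # which is where it breaks criterion 1.
    print(alternative_weight(4, 4))  # 0.5
    # Criterion 2: equal nonoverlapping lengths -> simple average.
    print(proposed_weight(3, 3))  # 0.5
    # Criterion 3 holds (t11 < t22 gives w1 > 0.5), but this is also the
    # concern raised: with a 100-day overlap, t11 = 1 and t22 = 3 still
    # give period 1 three times the weight of period 2.
    print(proposed_weight(1, 3))  # 0.75
```

For the question's own example (days 3 to 10 and 7 to 20, counted inclusively, so t11 = 4, t12 = 4, t22 = 10), `proposed_weight(4, 10)` gives w1 = 10/14, about 0.71. Note that t12 never enters the first formula. That is exactly why a long overlap does not dampen the weight.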
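The reply's point that each interval or day has a feasible range consistent with the interval data can be made concrete for two overlapping intervals. With totals S1 = a1(t11 + t12) and S2 = a2(t12 + t22), the overlap total S12 must satisfy max(0, S1 - 24 t11, S2 - 24 t22) <= S12 <= min(24 t12, S1, S2). This assumes daily use lies between 0 and 24 hours. Below is a minimal Python sketch of that bound. The function name and the 24-hour ceiling are assumptions for illustration, not something from the thread.

```python
def overlap_bounds(a1, a2, t11, t12, t22, max_daily=24.0):
    """Feasible range of the average daily use over the overlap.

    a1: reported average over period 1 (t11 + t12 days).
    a2: reported average over period 2 (t12 + t22 days).
    max_daily: assumed ceiling on hours of use in one day.
    """
    if t12 <= 0:
        raise ValueError("the intervals must overlap by at least one day")
    s1 = a1 * (t11 + t12)  # total hours over period 1
    s2 = a2 * (t12 + t22)  # total hours over period 2
    # The overlap total s12 must lie in [0, max_daily * t12] and leave each
    # nonoverlapping part a total between 0 and max_daily * (its length).
    lo = max(0.0, s1 - max_daily * t11, s2 - max_daily * t22)
    hi = min(max_daily * t12, s1, s2)
    if lo > hi:
        raise ValueError("the two averages are mutually inconsistent")
    return lo / t12, hi / t12


if __name__ == "__main__":
    # Days 3-10 and 7-20 counted inclusively: t11 = 4, t12 = 4, t22 = 10.
    # a1 = 5 comes from the question; a2 = 6 is made up for illustration.
    print(overlap_bounds(5, 6, 4, 4, 10))  # (0.0, 10.0)
    # Same start date (t11 = 0): the overlap average is pinned at a1.
    print(overlap_bounds(5, 6, 0, 4, 10))  # (5.0, 5.0)
```

When the intervals share a start or end date, the bounds collapse to a single value, which is the question's criterion 1. Otherwise the range can be wide: 0 to 10 hours a day in the first call. This is one way to see how much uncertainty a single weighted mean hides, and it gives the kind of range an imputation would draw from.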
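The reply recommends testing any approach on data you construct yourself and then censor to match the real data structure. Below is a minimal Python sketch of that idea (Python 3.8+ for `statistics.fmean`). Several choices are assumptions made only for illustration: the data-generating process (a subject-specific mean plus Gaussian noise, clipped to 0 to 24 hours), the random interval layout, and the function name. The two estimators compared are the simple average and the question's first proposed weight. The sketch reports each estimator's root-mean-square error against the true overlap average.

```python
import random
import statistics


def simulate(n_subjects=2000, n_days=40, seed=1, max_daily=24.0):
    """RMSE of two overlap estimators on synthetic, artificially censored data.

    Assumed process (illustration only): subject mean ~ Uniform(2, 10) hours,
    daily use = mean + Normal(0, 2), clipped to [0, max_daily].
    """
    rng = random.Random(seed)
    errors = {"simple_average": [], "proposed_weight": []}
    for _ in range(n_subjects):
        mu = rng.uniform(2, 10)
        days = [min(max_daily, max(0.0, rng.gauss(mu, 2))) for _ in range(n_days)]
        # Two overlapping intervals [s1, e1] and [s2, e2], s1 <= s2 <= e1 <= e2.
        s1 = rng.randint(0, n_days // 4)
        s2 = rng.randint(s1, s1 + n_days // 4)
        e1 = rng.randint(s2, s2 + n_days // 4)
        e2 = rng.randint(e1, n_days - 1)
        # Censor: keep only the two interval averages, as in the real data.
        a1 = statistics.fmean(days[s1:e1 + 1])
        a2 = statistics.fmean(days[s2:e2 + 1])
        truth = statistics.fmean(days[s2:e1 + 1])  # known only because simulated
        t11, t22 = s2 - s1, e2 - e1
        w1 = 0.5 if t11 + t22 == 0 else t22 / (t11 + t22)
        errors["simple_average"].append((a1 + a2) / 2 - truth)
        errors["proposed_weight"].append(w1 * a1 + (1 - w1) * a2 - truth)
    # Root-mean-square error of each estimator against the known truth.
    return {name: statistics.fmean(e * e for e in errs) ** 0.5
            for name, errs in errors.items()}


if __name__ == "__main__":
    for name, rmse in simulate().items():
        print(f"{name}: RMSE = {rmse:.3f} hours")
```

The useful part of the exercise is replacing the assumed process with one that mimics the real data, for example day-to-day correlation or the observed interval patterns. The numbers this particular setup prints say nothing about actual compliance data.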

**References**: **st: Overlapping averages**, *From:* Matthew Bombyk <bomby001@umn.edu>
