Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: Overlapping averages

From	"Mak, Timothy" <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	st: RE: Overlapping averages
Date	Tue, 28 Jun 2011 14:42:28 +0000

Hi, 

I don't offer to solve your problem, but here are a few suggestions: 

1. Throughout, you seem to assume that the measurements are without error. That may be true, but even so, it might be more helpful to think of them as imperfect measurements of something underlying. For example, you want to know the average hours of use from day 7 to day 10. But is that a measure of some underlying 'compliance' quality? Can we think of it as an imperfect measurement of that quality? The point is that if your measurements contain error, then your criterion 1 is not necessarily sensible, and you might not have to worry about violating it. 

2. You may have to think about how to explain your choice of weights to your audience. As far as I can see, none of the weights you've proposed below has any theoretical (mathematical) justification, although your second option appears to have some heuristic justification: relevance is measured by the amount of overlap time over total measurement time, and you're basically weighting by relevance. Perhaps that's the easiest option to explain to your audience. Perhaps. 

3. If you really want a weighting system with theoretical justification, I fear you might have to set up a model of some kind in terms of stochastic processes, and this might be overkill for your purpose... 

Hope that helps. 

Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Matthew Bombyk
Sent: 28 June 2011 04:17
To: [email protected]
Subject: st: Overlapping averages

Dear Statalist,

I have a pretty long and complicated question:

My research group has data on the level of average treatment
compliance per day (average hours of use of a personal medical device)
for a number of subjects over various time intervals. So we might have
a subject that was observed from day 3 to 10 and had an average daily
usage of 5 hours over that period. We don't have the actual usage data
on a given day. We have other panel data on these subjects and want to
assign an average treatment compliance number to each day.
Many subjects have two time intervals that overlap, say one starts on
day 3 and ends day 10, the other starts day 7 and ends day 20. We have
average usage over both of these periods. I've done some algebra and
found that it is impossible to calculate the average exactly for the
overlapping time interval, 7 to 10 in our example. But, I am wondering
if there is a "best way" to incorporate the information I have into a
better estimate than just assigning one of the periods' values to the
overlap, or doing a simple average?
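
One way to write out that algebra (a sketch; the symbols a1, a2, x11, x12 and x22 are labels introduced here for illustration): let t11, t12 and t22 be the lengths of the part covered only by interval 1, the overlap, and the part covered only by interval 2. Let a1 and a2 be the two observed averages, and let x11, x12 and x22 be the unobserved average usage on each of the three pieces. Then

```latex
\begin{align*}
(t_{11} + t_{12})\, a_1 &= t_{11}\, x_{11} + t_{12}\, x_{12} \\
(t_{12} + t_{22})\, a_2 &= t_{12}\, x_{12} + t_{22}\, x_{22}
\end{align*}
```

These are two equations in three unknowns, so x12 is not pinned down in general. If the intervals share a start date, t11 = 0 and the first equation gives x12 = a1 directly; a shared end date (t22 = 0) gives x12 = a2 in the same way.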

I have a few criteria that I thought would be useful:
1) If the two intervals have the same start or end date, we CAN
calculate the overlap exactly (namely the shorter interval). So the
weighted average should collapse to this when appropriate.

2) If the intervals are the same length, we should just get a simple average.

3) If interval 1 is shorter than interval 2, interval 1 should get more weight.

I thought about using w1=t22/(t11+t22) and w2=t11/(t11+t22), where w1
and w2 are the weights on the first and second averages, t11 and t22
are the lengths of the nonoverlapping portions of the first and second
time intervals, respectively, and t12 is the length of the overlap.
Period 1  |--- t11 ---|--- t12 ---|
Period 2              |--- t12 ---|---------- t22 ----------|
(These might turn out poorly, if so sorry, just ignore them!)

The problem with this is that if the overlap is really large compared
to the non-overlapping portions, we can put way too much weight on the
shorter interval if the nonoverlapping portions have a very high
ratio, even if they only differ absolutely by a small amount.
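
A minimal Python sketch of this first weighting, to make the three criteria and the problem case concrete. The function names and the numeric inputs are hypothetical, chosen only for illustration, and the fallback for identical intervals is an assumption rather than something stated in the post.

```python
def complement_weights(t11, t22):
    """Weights (w1, w2) with w1 = t22/(t11+t22) and w2 = t11/(t11+t22).

    t11, t22: lengths of the non-overlapping parts of intervals 1 and 2.
    """
    total = t11 + t22
    if total == 0:
        # Identical intervals make the formula 0/0. Falling back to a
        # simple average is an assumption made for this sketch.
        return 0.5, 0.5
    return t22 / total, t11 / total


def overlap_estimate(a1, a2, t11, t22):
    """Weighted average of the two interval means, used as the overlap estimate."""
    w1, w2 = complement_weights(t11, t22)
    return w1 * a1 + w2 * a2


# Criterion 1: same start date (t11 = 0) puts all weight on interval 1.
print(complement_weights(0, 10))
# Criterion 2: equal-length intervals (t11 == t22) give a simple average.
print(complement_weights(4, 4))
# Criterion 3: a shorter interval 1 (t11 < t22) gets more weight.
print(complement_weights(4, 10))
# Problem case: with an overlap of, say, 100 days and non-overlapping parts
# of 1 and 3 days, the intervals are 101 and 103 days long, yet interval 1
# still receives three quarters of the weight.
print(complement_weights(1, 3))
```

Note that t12 never enters the weights, which is why the scheme cannot tell a 2-day difference between nearly identical long intervals from a 2-day difference between short ones.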

We also thought about using w1=t12/(t12+t11) but this fails on
criterion #1 above.
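
The failure on criterion #1 can be checked with a small Python sketch. Only w1 is given above, so two things here are assumptions: that w2 is defined symmetrically as t12/(t12+t22), and that the two shares are rescaled to sum to one. The function name and the inputs are hypothetical.

```python
def overlap_share_weights(t11, t12, t22):
    """Weights based on each interval's share of time spent in the overlap.

    r1 = t12/(t12+t11) is the formula from the post; r2 = t12/(t12+t22)
    and the rescaling to sum to one are assumptions made for this sketch.
    Requires t12 > 0.
    """
    r1 = t12 / (t12 + t11)
    r2 = t12 / (t12 + t22)
    total = r1 + r2
    return r1 / total, r2 / total


# Same start date: interval 1 is exactly the overlap, so criterion #1 asks
# for w1 = 1 and w2 = 0. Interval 2 keeps a positive weight instead.
w1, w2 = overlap_share_weights(t11=0, t12=4, t22=10)
print(w1, w2)

# Equal-length intervals still give a simple average (criterion #2).
print(overlap_share_weights(t11=3, t12=5, t22=3))
```

Without the rescaling the conclusion is the same: w2 = t12/(t12+t22) is positive whenever there is any overlap, so the weighted average cannot collapse to the shorter interval's value.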

Any comments appreciated. Thanks!
Matt Bombyk
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

