RE: st: Reshape-Like Question

From   "Nick Cox" <>
To   <>
Subject   RE: st: Reshape-Like Question
Date   Wed, 4 Mar 2009 17:38:04 -0000

I see. I'll stick to my stance and discuss how to get measures working from your long structure. 

The two most obviously tricky details in your problem are 

1. The stipulation of non-consecutive days. (This presumably arises because consecutive days are thought likely to be dependent.) 

2. The use of 30 day periods when data are likely to be at least a little irregular in time. 

I'll focus on period means, each a mean of daily means for blood glucose. You want something else, but the "something else" is not your major problem; the two features singled out above are. 

First, I get those daily means and count how many measurements they are based on 

bysort pid datestamp : gen mean = sum(bglevel) 
by pid datestamp : gen N = sum(bglevel < .) 
by pid datestamp : replace mean = mean[_N] / N[_N] 

Now keep one observation for each day. We keep the _last_ for each day as that contains not just the mean -mean- but also the number of measurements -N-. 

by pid datestamp : keep if _n == _N 
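
As a possible alternative to the running-sum steps above (a replacement for them, not an addition), the same daily summaries could be sketched with -egen-. This is an illustrative sketch, not the code from the post; the toy values are invented, and only the variable names pid, datestamp and bglevel come from the thread.

```stata
* toy data (hypothetical values), same variable names as in the thread
clear
input pid datestamp bglevel
1 1 100
1 1 140
1 1 .
1 2 90
2 1 120
end

* daily mean and number of non-missing measurements per patient-day
egen mean = mean(bglevel), by(pid datestamp)
egen N = count(bglevel), by(pid datestamp)

* every observation in a patient-day now carries the same mean and N,
* so it does not matter which one is kept
bysort pid datestamp : keep if _n == 1
list, sepby(pid)
```

With -egen- every observation in a group holds the final values, so keeping the first or the last observation of each day gives the same result.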

Now drop consecutive days, interpreted as any day following another day: 

by pid : drop if datestamp == datestamp[_n-1] + 1 
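
A small made-up example may help show what that rule does with runs of several consecutive days. As far as I understand -drop if-, the condition is evaluated on the data as they stand before anything is dropped, so in a run of days 1, 2, 3 both day 2 and day 3 go and only day 1 survives. The values below are hypothetical; only the variable names come from the thread.

```stata
* toy data (hypothetical values): one observation per patient-day
clear
input pid datestamp
1 1
1 2
1 3
1 5
1 6
2 1
2 3
end

* sort within patient by date, then drop any day that directly follows another day
bysort pid (datestamp) : drop if datestamp == datestamp[_n-1] + 1

* remaining days: 1 and 5 for patient 1; 1 and 3 for patient 2
list, sepby(pid)
```

If the intention were instead to keep every other day within a run, a different rule would be needed; the line above implements only the interpretation stated in the post.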

Here is a brute force way of averaging over the previous 30 days. We keep track of how many days each average is based on and how many of those days included 3 or more measurements. 
The technique is written up within 

SJ-7-3  pr0033  . . . . . . . . . . . . . .  Stata tip 51: Events in intervals
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q3/07   SJ 7(3):440--443                                 (no commands)
        tip for counting or summarizing irregularly spaced
        events in intervals

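gen mean30 = . 
gen Ndays = . 
gen Nge3days = . 

qui forval i = 1/`=_N' { 
	* mean of daily means over the previous 30 days, 
	* using only days with 3 or more measurements 
	su mean if N >= 3 & inrange(datestamp[`i'] - datestamp, 1, 30) ///
		& pid == pid[`i'] 
	replace mean30 = r(mean) in `i' 

	* number of days with data in the previous 30 days 
	count if inrange(datestamp[`i'] - datestamp, 1, 30) & pid == pid[`i'] 
	replace Ndays = r(N) in `i' 

	* number of those days with 3 or more measurements 
	count if N >= 3 & inrange(datestamp[`i'] - datestamp, 1, 30) & pid == pid[`i'] 
	replace Nge3days = r(N) in `i' 
} 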

At its broadest, the idea is mundane: 

Initialise variables to be calculated 

Loop over observations { 
	-count- or calculate whatever is of interest 
		for observations in the same panel 
		within a specified time interval 

	-replace- variables with results obtained 
} 
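
Once counts like -Nge3days- exist, the eligibility rule quoted later in this thread (at least 14 days with three or more readings in a 30 day period) might be flagged along the following lines. This is only a sketch under assumptions: the names eligible and anyeligible are hypothetical, the toy values are invented, and the window is the previous 30 days excluding the current day, as in the loop above.

```stata
* toy data (hypothetical values) standing in for the loop's results
clear
input pid datestamp Nge3days
1 40 13
1 42 14
2 40 5
end

* flag observations whose preceding 30 day window has 14 or more days
* with 3 or more measurements; the < . guard keeps missings out
gen byte eligible = Nge3days >= 14 & Nge3days < .

* flag patients with at least one such window
egen byte anyeligible = max(eligible), by(pid)

* eligible is 0, 1, 0 and anyeligible is 1, 1, 0 for the three observations
list, sepby(pid)
```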


Alan Neustadtl

Thanks to Scott and Nick for pointing out the crucial fact of
creating a unique identifier.  I tried that, but incorrectly specified
the reshape, then dropped the id and incorrectly specified the
reshape.  I had all the pieces to the puzzle but couldn't put them
together until I was pointed in the correct direction.

As for Nick's other comments, he is probably right that it may be
possible to work column wise on the data and my limits might be in
seeing the big picture.

What I am trying to do is create a symmetrical measure of blood
glucose variability called the "average daily risk range" (ADRR).  The
measure requires that each participant has a minimum of three blood
glucose readings for at least 14 nonconsecutive days of readings in a
30 day period.

Using rowwise egen commands gave me some leverage on identifying the
relevant patients.  I tried using -by- and -collapse- to come to the
same place (one risk range measure per patient/day) but eventually
became lost in the details and worked this into a -reshape- problem.
I am open to learning new things so please if you have time I would
appreciate other attacks on my problem.

On Wed, Mar 4, 2009 at 8:51 AM, Nick Cox <> wrote:
> Scott's example code underlines that this is indeed a -reshape- problem and that you just need the one trick of creating an identifier that will be used for the columns.
> Other trickery in this territory is detailed at
> FAQ     . . . . . . . . . . . . . . . . . . . . . . . .  Problems with reshape
>        12/03   I am having problems with the reshape command. Can
>                you give further guidance?
> But a bigger question is why you want to do this. On the whole you are better off with your existing data structure. Although working rowwise is possible and often natural, as will be explored in some detail in Stata Journal 9(1) 2009, it is difficult to think of anything easier with your new data structure.
