Statalist



RE: st: RE: coding problem: looping through a list of ID's


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: coding problem: looping through a list of ID's
Date   Tue, 25 Nov 2008 15:59:43 -0000

This has been overtaken somewhat by later postings, but a few comments: 

Somewhat like Jeph, I think I understood Leny's previous posting. Unlike
Jeph, I think my solution does what Leny asked for. Whether that's best
is another matter. 

Jeph's solution appeals as making the most use of the data. In practice,
growth rates calculated from closely spaced measurements will be noisy, and I'm
not sure a mean of growth rates is going to be the best bet. In
principle a harmonic mean is preferable, although that will choke on any
growth rates measured as zero. Or you can average simply by measuring growth
over longer spans, as Leny initially signalled.
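
A minimal sketch of both alternatives, assuming Jeph's variable names
(ID, dov, height) and his per-day variable rate from the code quoted
below. The names inv_mean, hmean and span_rate are made up for
illustration, and restricting the harmonic mean to strictly positive
rates is an assumption, not something settled in this thread.

```stata
* harmonic mean of the per-day rates, using positive rates only
* (1/rate is missing when rate is 0, so zero rates cannot contribute)
bysort ID : egen inv_mean = mean(1/rate) if rate > 0 & rate < .

* copy the result to every visit of the subject and scale to a year
* (missing values sort last, so inv_mean[1] is nonmissing if any value is)
bysort ID (inv_mean) : gen hmean = 365.25 / inv_mean[1]

* growth over the whole span: total change / total time, per year
bysort ID (dov) : gen span_rate = 365.25 * (height[_N] - height[1]) / (dov[_N] - dov[1])
```

The whole-span rate is the same as a mean of the interval rates weighted
by interval length, so short, noisy intervals count for less than they
do in an unweighted mean.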

You might find the two solutions complementary, rather than
contradictory. 

Nick 
n.j.cox@durham.ac.uk 

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jeph Herrin
Sent: 24 November 2008 22:02
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: coding problem: looping through a list of ID's

I think my solution does what you want, but better in that
the best approximation to the increase in height per 365 days
is the average increase/day between visits, multiplied by
365.

Stata stores the dates of visit as integers counting the number of
days since 1 January 1960. Thus, the difference in height between visits
divided by the difference in dates will be the change in height
per day between those visits. Do this for each visit, and then
average over visits. It only needs two lines of code, per my
solution, though to get a rate per year you will want to multiply by
the number of days in a year.

   bysort ID (dov) : gen rate=(height-height[_n-1])/(dov-dov[_n-1])
   bysort ID       : egen average=mean(rate*365.25)
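
The division by dov-dov[_n-1] presumes that dov is a numeric Stata
daily date. If the visit dates arrive as strings such as "24-May-01",
a conversion along these lines would be needed first. The string
variable name dov_str is hypothetical, and the 2050 cutoff for
two-digit years is an assumption that suits visits dated 1988 to 2004.

```stata
* convert strings like "24-May-01" to a daily date (days since 01jan1960);
* the third argument maps a two-digit year to the latest year not after 2050
gen dov = date(dov_str, "DMY", 2050)
format dov %td

* sanity check on one interval: 11-Sep-01 to 11-Dec-01 is 91 days
display date("11-Dec-01", "DMY", 2050) - date("11-Sep-01", "DMY", 2050)
```

Note that rate is missing at each subject's first visit, because
height[_n-1] does not exist there; egen's mean() ignores those missings.
In the data quoted below, that 91-day interval has a 4 cm gain, which
annualizes to roughly 16 cm per year, an illustration of how noisy
short intervals can be.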


If you truly want to ignore the information in the visits that
are less than a year apart, and only look at the growth between
visits that are "closest" to one year from the first visit, try
this:

  * put date in years
  gen year=dov/365.25

  * get gap to nearest whole years from first visit
  bysort ID (dov) : gen gap = abs(round(year-year[1])-(year-year[1]))

  * use the visit if closer than neighbor visits to one year
  bysort ID (dov) : gen use = gap<gap[_n-1]&gap<=gap[_n+1]

where I've made the last inequality soft to deal with potential ties.
Now just keep the ones you're using, and apply solution above, only
now the denominator is assumed to be 1, though of course that's just
a rough approximation:

  keep if use
  * growth since the previous retained visit, i.e. over roughly one year
  bysort ID (dov) : gen rate = height-height[_n-1]
  bysort ID       : egen average=mean(rate)

where the last line gives you the average for the subject over
all of their years.
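
Leny mentions below that the analysis would probably use only the most
recent one-year increase. A minimal sketch of that case, to be run on
the full data (before any keep), assuming the same variable names (ID,
dov, height) and a numeric daily date. The names last_ht, back and
recent are illustrative, and taking "one year" to mean 365 days before
the last visit is an assumption.

```stata
* height at each subject's last visit
bysort ID (dov) : gen last_ht = height[_N]

* distance of each visit from the point 365 days before the last visit
bysort ID (dov) : gen back = abs(dov[_N] - dov - 365)

* sort so the closest visit comes first, then difference from the last height
bysort ID (back dov) : gen recent = last_ht - height[1]
```

For the patient listed below, the last visit is 11-Nov-04 (158) and the
visit closest to a year earlier is 18-Nov-03 (154.8), so recent is about
3.2. A subject with a single visit gets 0, and a subject followed for
much less than a year gets whatever span is available, so both cases
would need screening.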

hope this helps,
Jeph


Leny Mathew wrote:
> Thanks Nick & Jeph for your suggestions. I think that I was perhaps
> not clear enough about what I was trying to do. The following is the
> data for one of the patients:
> 
> pt_id  dov        pt_ht  age
> 2      24-May-01  141    12.92519
> 2      31-May-01  141    12.94441
> 2      11-Sep-01  141    13.22718
> 2      11-Dec-01  145    13.47701
> 2      21-Feb-02  146    13.67467
> 2      2-May-02   147    13.86685
> 2      11-Jul-02  149    14.05903
> 2      21-Nov-02  152    14.42416
> 2      21-Jan-03  152.3  14.59163
> 2      10-Apr-03  153.7  14.80851
> 2      1-May-03   153.4  14.86616
> 2      1-Jul-03   153.8  15.03363
> 2      9-Sep-03   154.8  15.22581
> 2      18-Nov-03  154.8  15.41798
> 2      20-Jan-04  156    15.59094
> 2      18-Mar-04  157    15.75017
> 2      20-May-04  156    15.92313
> 2      15-Jul-04  157    16.07687
> 2      16-Sep-04  157    16.24983
> 2      11-Nov-04  158    16.40357
> 
> I'm trying to find out how much this patient grew in a year. So the
> way I thought of it is to find the interval of time that
> approximates a year and then compute the difference in height between
> those two points. So, taking the difference between _n and _n-1 will
> not suffice; it has to be between _n and _n-i, where i goes from 1 to _N
> within patient.
> For example, the first time point is 24 May 01 and the time that
> approximates a year later is 2 May 02. So, the height increase would be
> (147-141).
> Then the next one would be from 2 May 02 to 1 May 03 and so on.
> Patient number 3 has 105 visits from 1988 to 2000, so the number
> of intervals would be larger than for ID 2. I would most probably end
> up using the most recent one-year height increase for analysis
> purposes, but that doesn't make this any easier.
> 
> I hope this clarifies things better.
