Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Paul McCabe <mcpstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: Fwd: Minimum in 24 hours
Date: Fri, 17 Feb 2012 15:48:46 +0000

Nick,

That's perfect - thank you

Best wishes

Paul

On 17 February 2012 15:45, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> It's also true that -egen- under -by:- interprets subscript references
> with respect to the entire dataset, and not with reference to groups
> defined by -by:-. This could be interpreted as a bug or misfeature, but
> the help does warn you against using explicit subscripting.
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: 17 February 2012 15:24
> To: 'statalist@hsphsun2.harvard.edu'
> Subject: st: RE: Fwd: Minimum in 24 hours
>
> Setting aside other details, the Stata function -min(varname)- used in
> -gen newvar = min(varname)- might be expected just to return values of
> -varname-, as you fed it one argument only. As it happens, -min(arg)-
> is illegal in Stata, as two or more arguments are required.
>
> Note that the effect of also specifying any -if- qualifier would be to
> restrict evaluation to the observations selected, and not at all to
> extend comparisons to include other observations, which was your
> intention. Values for observations not selected by the -if- would just
> be returned as missing. But, as said, using -min()- with one argument
> is illegal.
>
> The -egen- function -min()-, despite having the same name, works quite
> differently. -egen newname = min(varname)- would put the minimum of
> -varname- over all observations into a new variable, as a constant for
> all observations; you can also restrict comparisons to subsets.
>
> Using subscripts with -egen- is dangerous, as -egen- can (temporarily)
> change the -sort- order and thus subvert your intentions.
>
> What you want can be done in various ways. Here is one. I separate out
> calculation of the first time for each identifier.
>
> local oneday = 60000 * 60 * 24
> * double keeps the millisecond precision of the datetime values
> bysort id (time) : gen double first = time[1]
> egen minMV = min(MV / inrange(time, first, first + `oneday')), by(id)
>
> The trickery used here is easier than may appear at first sight.
>
> 1. It is crucial that -egen-'s -min()- function will accept expressions
> and is not restricted to working with a varname as argument.
>
> 2. The expression MV / inrange(time, first, first + `oneday') is
> evaluated as follows.
>
> inrange(time, first, first + `oneday') will evaluate as 1 if true and
> 0 if not true.
>
> Then MV / (0 or 1) evaluates as missing if the denominator is 0 and as
> MV otherwise.
>
> The effect is to return missings for observations you want ignored,
> those after the first day, and -min()- does ignore those missings in
> calculating the minimum.
>
> There is much more discussion in
>
> SJ-11-2 dm0055  . . . . . . . . . . . . . .  Speaking Stata: Compared with ...
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q2/11   SJ 11(2):305--314                                (no commands)
>         reviews techniques for relating values to values in other
>         observations
>
> Nick
> n.j.cox@durham.ac.uk
>
> Paul McCabe
>
> I have data relating to admissions to intensive care units with data a
> little like this
>
>
>      |      id                 time     MV |
>      |-------------------------------------|
>   1. | 1128809   26oct2005 04:30:00     11 |
>   2. | 1128809   26oct2005 05:30:00      7 |
>   3. | 1128809   26oct2005 06:30:00    8.3 |
>   4. | 1128809   26oct2005 07:30:00    8.4 |
>   5. | 1128809   26oct2005 08:30:00    8.4 |
>      |-------------------------------------|
>   6. | 1128809   26oct2005 09:30:00    8.1 |
>   7. | 1128809   26oct2005 10:30:00    8.8 |
>   8. | 1128809   26oct2005 11:30:00    7.8 |
>   9. | 1128809   26oct2005 17:30:00    7.6 |
>  10. | 1128809   26oct2005 18:30:00      8 |
>      |-------------------------------------|
>  11. | 1128809   26oct2005 19:30:00      8 |
>  12. | 1128809   26oct2005 20:30:00    8.1 |
>  13. | 1128809   26oct2005 21:30:00    9.2 |
>  14. | 1128809   26oct2005 22:30:00    8.9 |
>  15. | 1128809   26oct2005 23:30:00   10.4 |
>      |-------------------------------------|
>  16. | 1128809   27oct2005 00:30:00    9.6 |
>  17. | 1128809   27oct2005 01:30:00    7.8 |
>  18. | 1128809   27oct2005 02:30:00    7.8 |
>  19. | 1128809   27oct2005 03:30:00    7.7 |
>
> I want the minimum value within the first 24 hours of admission.
> However the function min() returns:
>
> gen minMV=min(MV) if time>=time[1] & time<(time[1]+8.64e7) & id==1128809
> invalid syntax
> r(198);
>
> And while the egen function works with explicit ids, it does not allow
> explicit subscripting when I try to combine it with by:
>
> sort id time
> by id:egen minMV=min(MV) if time>=time[1] & time<(time[1]+8.64e7)
> (8469 missing values generated)
> duplicates examples id minMV
> Duplicates in terms of id minMV
> +--------------------------------------------+
> | group:      #   e.g. obs        id   minMV |
> |--------------------------------------------|
> |      1     19          1   1128809       7 |
> |      2    593         20   1128809       . |
> |      3     22        633   1128817      10 |
> |      4    437        613   1128817       . |
> |      5     18       1072   1128827       . |
> |--------------------------------------------|
> |      6    474       1090   1128828       . |
> |      7     73       1564   1275019       . |
> |      8    118       1637   1613911       . |
> |      9    282       1755   1621603       . |
> |     10    246       2037   1630155       . |
> |--------------------------------------------|
> |     11    465       2283   1633930       . |
> |     12    356       2748   1660876       . |
> |     13     64       3104   1667289       . |
> |     14     52       3168   1687505       . |
> |     15    754       3220   2109887       . |
> |--------------------------------------------|
> |     16     46       3974   2124610       . |
> |     17     47       4020   2129680       . |
> |     18     56       4067   2141764       . |
> |     19    149       4123   2147829       . |
> |     20    402       4272   2150550       . |
> |--------------------------------------------|
> |     21    676       4674   2189948       . |
> |     22    304       5350   2201604       . |
> |     23    955       5654   2202648       . |
> |     24     32       6609   2208416       . |
> |     25    641       6641   2212494       . |
> |--------------------------------------------|
> |     26   1229       7282   2217835       . |
> +--------------------------------------------+
> (ID 1128817 was admitted at the same time as ID 1128809, which is why
> minMV is created for them)
>
> Can anyone give advice on how to combine a function, by, and the time
> selection?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
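
For anyone wanting to try the division trick from Nick's reply on something small, here is a minimal sketch. The toy data, the string variable stime, and the expected minima are made up for illustration and are not from the thread. It assumes that time is a %tc datetime in milliseconds held as a double, and it also stores first as a double, on the grounds that the default float type may round millisecond values.

```stata
* Hypothetical toy data: two patients, a few readings each
clear
input long id str18 stime double MV
1 "26oct2005 04:30:00" 11
1 "26oct2005 05:30:00" 7
1 "27oct2005 03:30:00" 7.7
1 "27oct2005 05:30:00" 5
2 "01nov2005 10:00:00" 9
2 "01nov2005 12:00:00" 8.5
2 "03nov2005 12:00:00" 6
end

* Convert the string to a millisecond datetime, stored as double
gen double time = clock(stime, "DMYhms")
format time %tc

* One day in milliseconds
local oneday = 60000 * 60 * 24

* First (admission) time within each id
bysort id (time) : gen double first = time[1]

* inrange() is 1 inside the first day and 0 outside; MV / 0 is missing,
* and egen's min() ignores missings
egen minMV = min(MV / inrange(time, first, first + `oneday')), by(id)

* The later, lower readings (5 and 6) fall outside the window
assert minMV == 7   if id == 1
assert minMV == 8.5 if id == 2
list id time MV minMV, sepby(id)
```

If the division reads as too cryptic, cond(inrange(time, first, first + `oneday'), MV, .) inside -min()- should give the same result while spelling out the intent.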