
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: RE: Fwd: Minimum in 24 hours


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: RE: Fwd: Minimum in 24 hours
Date   Fri, 17 Feb 2012 15:45:45 +0000

It's also true that -egen- under -by:- interprets subscript references with respect to the entire dataset, not with reference to the groups defined by -by:-. So in your command -time[1]- meant the first observation of the whole dataset, not the first observation for each -id-. This could be regarded as a bug or a misfeature, but the help does warn you against using explicit subscripting. 
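To see the contrast from the -generate- side, here is a small sketch on made-up data (the variables id, t and x and their values are hypothetical). Under -by:-, -generate- treats x[1] as the first observation of each group; without -by:- it is the first observation of the dataset. Working out such values with -by: generate- first, and only then passing plain variables to -egen-, keeps explicit subscripts away from -egen- altogether.

```stata
clear
input id t x
1 10 5
1 20 3
2 15 9
2 25 4
end

* without -by:-, x[1] is the first observation of the whole dataset
gen allfirst = x[1]

* under -by:- with -generate-, x[1] is the first observation of each group;
* sorting on id and then t makes "first" mean "earliest t" within each id
bysort id (t) : gen byfirst = x[1]

list, sepby(id)
```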

Nick 
[email protected] 


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 17 February 2012 15:24
To: '[email protected]'
Subject: st: RE: Fwd: Minimum in 24 hours

Setting aside other details, the Stata function -min(varname)- used in -gen newvar = min(varname)- might be expected simply to return the values of -varname-, as you fed it only one argument. As it happens, -min(arg)- is illegal in Stata, as two or more arguments are required. 
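For contrast, a minimal sketch of what the function -min()- is for, on made-up variables a, b and c (hypothetical): it compares its arguments within each observation, never across observations.

```stata
clear
input a b c
4 7 2
8 1 6
end

* min() with two or more arguments: compared within each observation
gen lo = min(a, b, c)
list
```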

Note that the effect of also specifying an -if- qualifier would be to restrict evaluation to the observations selected; it would not at all extend comparisons to include other observations, which was your intention. Values for observations not selected by the -if- would just be returned as missing. But, as said, using -min()- with one argument is illegal. 
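A small sketch of that -if- behaviour, using a legal two-argument -min()- on a hypothetical variable x: observations not selected simply get missing, and no other observations enter the calculation.

```stata
clear
input x
1
5
3
end

* -if- only selects which observations get a value; the rest are missing.
* Each selected value is still computed from its own observation only.
gen y = min(x, 4) if x > 2
list
```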

The -egen- function -min()-, despite having the same name, works quite differently. -egen newname = min(varname)- would put the minimum of -varname- over all observations into a new variable, constant across all observations; alternatively, you can restrict comparisons to subsets, using -if- or the -by()- option. 
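A minimal sketch of the -egen- function on made-up data (the group variable g and values are hypothetical): the minimum is taken over observations, missings are ignored, and the result is copied to every observation in scope.

```stata
clear
input g x
1 5
1 3
2 9
2 4
2 .
end

* minimum over all observations (missing x ignored), same value everywhere
egen overall = min(x)

* the same, but computed separately within each group defined by -by()-
egen grpmin = min(x), by(g)

* or restrict which observations enter the comparison with -if-;
* observations not selected get missing
egen lowonly = min(x) if x < 6
list, sepby(g)
```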

Using subscripts with -egen- is dangerous, as -egen- can (temporarily) change the sort order and thus subvert your intentions. 

What you want can be done in various ways. Here is one. I separate out calculation of the first time for each identifier. 

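* one day in milliseconds (60000 ms = 1 minute); Stata %tc date-times count ms
local oneday = 60000 * 60 * 24 
* first time for each id; -double- is needed to hold %tc values exactly
bysort id (time) : gen double first = time[1] 
* minimum of MV over observations from -first- to -first- + 24 hours, per id
egen minMV = min(MV / (inrange(time, first, first + `oneday'))) , by(id) 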
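Here is a self-contained run of the same approach on made-up data loosely modelled on the listing in the question (the readings and the string variable stamp are hypothetical). Two details matter: %tc date-times must be stored as -double-, since a -float- cannot hold values of that size to the millisecond, and the -egen- call needs all three closing parentheses.

```stata
clear
input long id str18 stamp MV
1128809 "26oct2005 04:30:00" 11
1128809 "26oct2005 05:30:00" 7
1128809 "27oct2005 03:30:00" 7.7
1128809 "27oct2005 05:30:00" 2
1128817 "26oct2005 04:30:00" 10
1128817 "28oct2005 09:00:00" 1
end

* %tc date-times are milliseconds since 01jan1960; store them as double
gen double time = clock(stamp, "DMYhms")
format time %tc

local oneday = 60000 * 60 * 24
bysort id (time) : gen double first = time[1]
egen minMV = min(MV / (inrange(time, first, first + `oneday'))), by(id)
list id time MV minMV, sepby(id)
```

The low readings of 2 and 1 fall more than 24 hours after each admission, so they are excluded and minMV is 7 and 10 respectively. Note that -inrange()- includes both endpoints, so a reading exactly 24 hours after the first is counted; the original attempt in the question used a strict -<-, which could be mimicked with -time < first + `oneday'- in place of -inrange()-.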

The trickery used here is simpler than it may appear at first sight. 

1. It is crucial that -egen-'s -min()- function accepts expressions and is not restricted to working with a single varname as argument. 

2. The expression MV / (inrange(time, first, first + `oneday')) is evaluated as follows. 

inrange(time, first, first + `oneday') will evaluate as 1 if true and 0 if not true. 

Then MV / (0 or 1) evaluates as missing if the denominator is 0, and as MV otherwise. 

The effect is to return missing values for the observations you want ignored, those after the first day, and -egen-'s -min()- ignores missing values when calculating the minimum. 
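The masking step can be checked in isolation. In this sketch the hypothetical 0/1 variable inwindow stands in for the result of -inrange()-. The second line shows an equivalent spelling with -cond()-, which is a swapped-in alternative, not the method used above, that some may find more explicit.

```stata
clear
input MV inwindow
11 1
7 1
2 0
end

* division by 1 returns MV; division by 0 returns missing (.)
gen masked = MV / inwindow

* equivalent alternative: MV where inwindow is true, missing otherwise
gen masked2 = cond(inwindow, MV, .)

* -egen-'s min() ignores missings, so observation 3 drops out
egen lo = min(masked)
list
```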

There is much more discussion in

SJ-11-2 dm0055. N. J. Cox. Speaking Stata: Compared with ...
        Stata Journal 11(2): 305-314 (Q2/11; no commands).
        Reviews techniques for relating values to values in other
        observations.

Nick 
[email protected] 

Paul McCabe

I have data relating to admissions to intensive care units, looking a
little like this:


     |      id                 time     MV |
     |-------------------------------------|
  1. | 1128809   26oct2005 04:30:00     11 |
  2. | 1128809   26oct2005 05:30:00      7 |
  3. | 1128809   26oct2005 06:30:00    8.3 |
  4. | 1128809   26oct2005 07:30:00    8.4 |
  5. | 1128809   26oct2005 08:30:00    8.4 |
     |-------------------------------------|
  6. | 1128809   26oct2005 09:30:00    8.1 |
  7. | 1128809   26oct2005 10:30:00    8.8 |
  8. | 1128809   26oct2005 11:30:00    7.8 |
  9. | 1128809   26oct2005 17:30:00    7.6 |
 10. | 1128809   26oct2005 18:30:00      8 |
     |-------------------------------------|
 11. | 1128809   26oct2005 19:30:00      8 |
 12. | 1128809   26oct2005 20:30:00    8.1 |
 13. | 1128809   26oct2005 21:30:00    9.2 |
 14. | 1128809   26oct2005 22:30:00    8.9 |
 15. | 1128809   26oct2005 23:30:00   10.4 |
     |-------------------------------------|
 16. | 1128809   27oct2005 00:30:00    9.6 |
 17. | 1128809   27oct2005 01:30:00    7.8 |
 18. | 1128809   27oct2005 02:30:00    7.8 |
 19. | 1128809   27oct2005 03:30:00    7.7 |

I want the minimum value within the first 24 hours of admission.
However, the function min() returns:

gen minMV=min(MV) if time>=time[1] & time<(time[1]+8.64e7) & id==1128809
invalid syntax
r(198);
And while the egen function works with explicit ids, it does not allow
explicit subscripting when I try to combine it with by:

sort id time
by id:egen minMV=min(MV) if time>=time[1] & time<(time[1]+8.64e7)
(8469 missing values generated)
duplicates examples id minMV
Duplicates in terms of id minMV
  +--------------------------------------------+
  | group:      #   e.g. obs        id   minMV |
  |--------------------------------------------|
  |      1     19          1   1128809       7 |
  |      2    593         20   1128809       . |
  |      3     22        633   1128817      10 |
  |      4    437        613   1128817       . |
  |      5     18       1072   1128827       . |
  |--------------------------------------------|
  |      6    474       1090   1128828       . |
  |      7     73       1564   1275019       . |
  |      8    118       1637   1613911       . |
  |      9    282       1755   1621603       . |
  |     10    246       2037   1630155       . |
  |--------------------------------------------|
  |     11    465       2283   1633930       . |
  |     12    356       2748   1660876       . |
  |     13     64       3104   1667289       . |
  |     14     52       3168   1687505       . |
  |     15    754       3220   2109887       . |
  |--------------------------------------------|
  |     16     46       3974   2124610       . |
  |     17     47       4020   2129680       . |
  |     18     56       4067   2141764       . |
  |     19    149       4123   2147829       . |
  |     20    402       4272   2150550       . |
  |--------------------------------------------|
  |     21    676       4674   2189948       . |
  |     22    304       5350   2201604       . |
  |     23    955       5654   2202648       . |
  |     24     32       6609   2208416       . |
  |     25    641       6641   2212494       . |
  |--------------------------------------------|
  |     26   1229       7282   2217835       . |
  +--------------------------------------------+
(ID 1128817 was admitted at the same time as ID 1128809, which is why
minMV is created for both.)

Can anyone give advice on how to combine a function, -by:-, and the time selection?



© Copyright 1996–2018 StataCorp LLC