The Stata listserver

st: RE: finding the last date among multiple date variables

From   "Nick Cox" <>
To   <>
Subject   st: RE: finding the last date among multiple date variables
Date   Fri, 14 Jun 2002 09:31:53 +0100

Jonathan Kaplan
> Sorry for this novice question, but I have been searching with findit,
> ssc and Statalist without luck (using Stata 7 SE).
> I have data on HIV-related non-Hodgkin lymphoma patients with multiple
> date variables. I would like to generate a new date variable which
> holds, for each patient, the latest date recorded across these variables.
> There are several missing date values within each variable.
> E.g., the time variables which could hold the last follow-up are
> deaddat, opddat, hospdat, hivrnadat, statlymdat, lostdat.

I assume that your dates are held as Stata dates, 
not e.g. as strings. 

The fact that you are dealing with dates doesn't, 
for once, complicate this question. The last date is 
simply the maximum date. You can rely on Stata's maximum 
functions to do the smart thing about missings: 
even though in Stata numeric missing is treated as higher 
than any other numeric value, the maximum is reported 
as missing if and only if all values are missing. 
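For illustration, a quick check at the Stata prompt shows the rule in action; the numbers are arbitrary, and the same rule applies to -min()-. A missing argument is ignored unless every argument is missing.

```stata
* max() ignores missing arguments unless all of them are missing
display max(., 3, 5)    // displays 5
display max(., .)       // displays .
* min() follows the same rule
display min(., 3, 5)    // displays 3
```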

To get a row-wise maximum, for each observation across
variables, use 

egen lastdate = rmax(<date_variables>)

or

gen lastdate = max(<date_variables separated by commas>) 
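As a sketch, here is the -max()- version applied to three of the date variables named in the question. The three patients and their values are invented for illustration; the numbers are Stata daily dates (days since 1 January 1960).

```stata
* Invented example data: three patients, three follow-up date variables
clear
input id deaddat opddat hospdat
1 15000 .     15200
2 .     .     .
3 .     14900 .
end
* Row-wise maximum: missing only when every date in the row is missing
gen lastdate = max(deaddat, opddat, hospdat)
format lastdate %d
list
```

Patient 2, with no dates at all, gets a missing -lastdate-; the others get their latest non-missing date.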

To get a maximum across groups of observations, use

bysort <identifier>: egen lastdate = max(<date_variable>)
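A minimal sketch of the group-wise version, assuming the data are in long form with one row per patient and date; the variable names -id- and -date- and the values are invented for illustration. -bysort- sorts on the identifier before -by- is applied, so the data need not be sorted beforehand.

```stata
* Invented long-format data: one row per (id, date)
clear
input id date
1 15000
1 15200
1 .
2 .
2 .
3 14900
end
* Maximum date within each id; missing only if all of that id's dates are missing
bysort id: egen lastdate = max(date)
format date lastdate %d
list
```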

However, you will have to -format- this new variable 
yourself, e.g. -format lastdate %d-: it doesn't inherit 
the display format of the variable(s) from which it is 
calculated. 
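To see this, one can compare the display format of a newly generated variable before and after -format-. The one-observation dataset here is invented, and the check relies on the -: format- extended macro function.

```stata
* Invented one-observation example with two date variables formatted as dates
clear
input id deaddat opddat
1 15000 15200
end
format deaddat opddat %d
gen lastdate = max(deaddat, opddat)
* The new variable gets the default numeric format, not %d
local before : format lastdate
display "`before'"
format lastdate %d
local after : format lastdate
display "`after'"
```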

The first date is equally easy -- in fact easier, as
there is, in addition to the -egen- way with -min()- or -rmin()-, 
the purist way from first principles, e.g. 

bysort id (date) : gen firstdate = date[1] 
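As a sketch with invented long-format data, sorting on -date- within -id- puts the earliest non-missing date first, because missings sort last. -date[1]- is therefore missing only when an id has no dates at all.

```stata
* Invented long-format data; id 2 has only a missing date
clear
input id date
1 15200
1 15000
1 .
2 .
3 14900
end
* Within each id, observations are ordered by date with missings last
bysort id (date): gen firstdate = date[1]
format date firstdate %d
list
```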

Note that the equivalent 

bysort id (date) : gen lastdate = date[_N] 

will not do what you want: missings sort after all 
non-missing values, so any missing date becomes the 
last date for its id. That can be fixed, but I would 
expect most users to find the -egen- way more congenial. 
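One possible fix, sketched here with invented data, is to sort on the negated date. This is a common trick and not necessarily the fix meant above. Negation reverses the order of the non-missing dates, while a missing date stays missing and so still sorts last. The helper variable -negdate- is hypothetical and is dropped afterwards.

```stata
* Invented long-format data; id 1 has a missing date, id 2 has only missings
clear
input id date
1 15200
1 15000
1 .
2 .
3 14900
end
* Naive version: missing sorts last, so id 1 picks up . instead of 15200
bysort id (date): gen naive = date[_N]
* Fix: with negated dates, the latest non-missing date comes first
gen negdate = -date
bysort id (negdate): gen lastdate = date[1]
drop negdate
format date naive lastdate %d
list
```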
