st: RE: finding the last date among multiple date variables
> Sorry for this novice question ,but been searching by findit, ssc and
> statalist without luck(using Stata7SE):
> I have data on HIV-related Non Hodgkin lymphoma patients with
> multiple date
> variables: I would like to generate a new datevariable which
> keeps the date
> of the time-variable, which has the last date recorded for each patient.
> There are several missing date values within each variable.
> e.g:the timevariables which could be the last followup are deaddat,
I assume that your dates are held as Stata dates,
not e.g. as strings.
The fact that you are dealing with dates doesn't,
for once, complicate this question. The last date is
simply the maximum date. You can rely on Stata's maximum
functions to do the smart thing about missings:
even though in Stata numeric missing is treated as higher
than any other numeric value, the maximum is reported
as missing if and only if all values are missing.
To get a row-wise maximum, for each observation across
variables, use either
egen lastdate = rmax(<date_variables>)
or
gen lastdate = max(<date_variables separated by commas>)
To get a maximum across groups of observations, use
by <identifier>: egen lastdate = max(<date_variable>)
However, you will have to -format- this new last date
variable yourself. It doesn't inherit the format of the
variable(s) from which it is calculated.
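
As a rough illustration of the row-wise case, here is a small sketch
on made-up data. Only deaddat comes from the question; lastseen and
labdat are hypothetical variable names, and the dates are invented.
In Stata 9 and later the -egen- function is called rowmax(); under
Stata 7 you would type rmax() instead.

```stata
* Toy data: three patients, three date variables with gaps.
* lastseen and labdat are hypothetical names for illustration only.
clear
set obs 3
gen id = _n
gen deaddat  = mdy(3, 1, 2001)  if id == 1
gen lastseen = mdy(6, 15, 2001) if id <= 2
gen labdat   = mdy(1, 10, 2001) if id == 2
* Patient 3 has every date missing.

* Row-wise maximum, ignoring missings unless all are missing
* (use rmax() in place of rowmax() under Stata 7).
egen lastdate = rowmax(deaddat lastseen labdat)

* The same thing with the max() function.
gen lastdate2 = max(deaddat, lastseen, labdat)

* New variables do not inherit a date format, so assign one.
format deaddat lastseen labdat lastdate lastdate2 %d
list
```

Both new variables should agree, and both should be missing only for
the patient with no dates at all.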
The first date is equally easy -- in fact easier, as
there is, in addition to the -egen- way with -min()- or -rmin()-,
the purist way from first principles, e.g.
bysort id (date) : gen firstdate = date[1]
Note that the equivalent
bysort id (date) : gen lastdate = date[_N]
will not be what you want because the missings
will end up as the last date for each id
whenever they occur. That can be fixed, but most
users find the -egen- way more congenial, I would
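
For what it is worth, one possible fix for the -bysort- version is
sketched below on invented data. It is only one way to do it: the
indicator variable present is a hypothetical name, and the idea is
simply to sort the missings to the top of each group so that the last
observation holds the largest non-missing date.

```stata
* Toy data: dates are given as raw daily date numbers for brevity.
clear
input id date
1 15000
1 .
1 15100
2 .
2 .
3 14900
end

* Indicator: 1 if the date is present, 0 if missing (hypothetical name).
gen byte present = date < .

* Missings (present == 0) now sort first within id, so date[_N]
* is the largest non-missing date, or missing if none exists.
bysort id (present date) : gen lastdate = date[_N]

* The naive version, for comparison: missings sort last.
bysort id (date) : gen naive = date[_N]

* The -egen- way gives the same answer as the fixed version.
by id : egen lastdate2 = max(date)

format date lastdate naive lastdate2 %d
list
```

Here naive is missing for id 1 even though two dates are recorded,
which is the problem described above.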