Is there any way to make "collapse" turn a file with multiple
observations per individual and year into a panel?
If you have:
ID YEAR HOSPITALIZATION
1 1995 1
1 1997 1
1 1997 1
1 1998 1
2 1995 1
That is, an individual sometimes has more than one observation in a
year, and sometimes none. More than one is what "collapse" handles.
But what about none?
And you want a panel with years nested within individuals, so that:
ID YEAR HOSPITALIZATION
1 1995 1
1 1996 0
1 1997 2
1 1998 1
2 1995 1
2 1996 0
2 1997 0
2 1998 0
I could not find a way to make collapse do this. When I used collapse,
it omitted the years in which an individual had no observations. I did
not specify cw, which I believe concerns observations with missing
values rather than observations that are absent altogether.
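One possible approach, sketched here on the assumption that every
individual should get a row for each year from the earliest to the
latest year seen anywhere in the data: collapse to one count per ID and
YEAR, declare the panel with xtset, and let tsfill with the full option
create the absent years, which can then be set to zero. Note that
fillin ID YEAR alone would not be enough for this example, because 1996
occurs for no individual. The input block only recreates the example
data.

```stata
clear
input ID YEAR HOSPITALIZATION
1 1995 1
1 1997 1
1 1997 1
1 1998 1
2 1995 1
end

* one row per ID-YEAR combination that occurs, with the total count
collapse (sum) HOSPITALIZATION, by(ID YEAR)

* declare the panel structure, then add the years that are absent;
* full fills every ID from the earliest to the latest YEAR overall
xtset ID YEAR
tsfill, full

* rows created by tsfill have missing counts: they mean zero events
replace HOSPITALIZATION = 0 if missing(HOSPITALIZATION)

list, sepby(ID)
```

If some years should exist beyond the observed range (say, a follow-up
period with no hospitalizations for anyone), they would have to be
added explicitly, which this sketch does not do.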
On 08-09-2010 09:52, Morten Hesse wrote:
Best regards
Morten
On 07-09-2010 22:36, Nick Cox wrote:
Excellent. He needs the dates as well, which is easy too.
Nick
n.j.cox@durham.ac.uk
David Bell
Or maybe simpler:
collapse (firstnm) sbp dm glu, by(patient)
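To carry the date along as well, something along these lines may do. It
is only a sketch, assuming visitdate is a numeric Stata date and the
data are sorted by patient and visitdate first. In collapse, (first)
takes the first value even when it is missing, whereas (firstnm) takes
the first nonmissing value, which is what "first available lab data"
calls for. The input block and the string variable visitstr are there
only to recreate the example.

```stata
clear
input patient str8 visitstr sbp dm glu
1 "1/1/09"  140 . .
1 "1/4/09"  128 . 202
1 "2/1/09"  131 1 .
2 "4/1/09"  160 . 341
2 "4/4/09"  144 . 180
2 "5/1/09"  170 1 .
3 "6/1/09"  119 . .
3 "6/4/09"  107 . .
3 "7/1/09"  124 . 96
4 "9/1/09"  104 1 110
4 "9/4/09"  155 . .
4 "10/1/09" .   . .
end

* convert the m/d/yy strings to numeric daily dates
* (2050 is the latest year a two-digit year may be read as)
gen visitdate = date(visitstr, "MDY", 2050)
format visitdate %td
drop visitstr

* first and firstnm depend on the order of the observations
sort patient visitdate

* earliest date, plus the first nonmissing value of each measurement
collapse (first) visitdate (firstnm) sbp dm glu, by(patient)

list
```

On the example data this should give one row per patient with the
earliest visit date; dm stays missing for patient 3, who has no
nonmissing dm at any visit.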
On Sep 7, 2010, at 4:07 PM, Nick Cox wrote:
SJ-2-1  pr0004  . . . . . . . . . .  Speaking Stata: How to move step by: step
        Q1/02   SJ 2(1):86--102                                  (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N
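As a small illustration of by: with _n and _N, here is a sketch on
made-up data. The variable names visitday, visitno and nvisits are
hypothetical and not part of the dataset discussed in this thread.

```stata
clear
input patient visitday
1 1
1 4
1 32
2 91
2 94
end

* within each patient, in order of visitday, _n is the running
* observation number: 1 for the earliest visit, 2 for the next, ...
bysort patient (visitday) : gen visitno = _n

* under by:, _N is the number of observations in the current group
by patient : gen nvisits = _N

list, sepby(patient)
```

Under by:, subscripts such as [_n-1] also refer to positions within the
group, so the first observation of a group never picks up a value from
the previous group.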
To copy lab results backward in time, try this
gen negvisitdate = - visitdate
bysort patient (negvisitdate) : replace sbp = sbp[_n-1] if missing(sbp)
bysort patient (negvisitdate) : replace dm = dm[_n-1] if missing(dm)
bysort patient (negvisitdate) : replace glu = glu[_n-1] if missing(glu)
Now
drop negvisitdate
bysort patient (visitdate) : keep if _n == 1
Alternatively, David Kantor has a utility -carryforward- at SSC. But
something like the code here should suffice if your example is
indicative.
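The three replace statements differ only in the variable name, so they
could also be written as a loop. A sketch on two of the example
patients, assuming as above that visitdate is a numeric date (the
string variable visitstr is only used to build it here):

```stata
clear
input patient str8 visitstr sbp dm glu
1 "1/1/09"  140 . .
1 "1/4/09"  128 . 202
1 "2/1/09"  131 1 .
4 "9/1/09"  104 1 110
4 "9/4/09"  155 . .
4 "10/1/09" .   . .
end
gen visitdate = date(visitstr, "MDY", 2050)
format visitdate %td
drop visitstr

* sort each patient's visits from latest to earliest, then fill each
* missing value from the observation before it, i.e. the next visit
* in time; replace works down the data, so values cascade backward
gen negvisitdate = -visitdate
foreach v of varlist sbp dm glu {
    bysort patient (negvisitdate) : replace `v' = `v'[_n-1] if missing(`v')
}
drop negvisitdate

* keep the earliest visit, which now carries the first available labs
bysort patient (visitdate) : keep if _n == 1

list
```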
Nick
n.j.cox@durham.ac.uk
Michael Eisenberg
I have a problem I hope you can help with that involves cleaning a
dataset for analysis.
I have a dataset with about 10K men. Several men had multiple clinic
visits, so that there are about 12K observations. There is also some
lab data that I'll need that was not obtained until follow-up visits.
I would like to analyze only the data from the earliest visit, together
with the first available lab data.
Can Stata do this?
Turn this
patient  visitdate  sbp  dm  glu
1        1/1/09     140  .   .
1        1/4/09     128  .   202
1        2/1/09     131  1   .
2        4/1/09     160  .   341
2        4/4/09     144  .   180
2        5/1/09     170  1   .
3        6/1/09     119  .   .
3        6/4/09     107  .   .
3        7/1/09     124  .   96
4        9/1/09     104  1   110
4        9/4/09     155  .   .
4        10/1/09    .    .   .
Into this
patient  visitdate  sbp  dm  glu
1        1/1/09     140  1   202
2        4/1/09     160  1   341
3        6/1/09     119  .   96
4        9/1/09     104  1   110