RE: st: Changing spells to person-years
I would agree with Nick that a reshape is needed. But that might be
only part of the solution.
What Nick presented is for one attribute. But there are two separate
attributes in Justin's sample data: region and rural/urban (what you
might call urbanicity, which I will refer to as urb). And there may
be more attributes in the real data.
Each attribute seems to have its own separate start and end
dates. So I would treat them separately. (Maybe there is a way to
do this all at once, but I don't see it yet.) So I would suppose
that the variables are...
id region1 region_start1 region_end1 region2 region_start2
region_end2 region3 region_start3 region_end3
urb1 urb_start1 urb_end1 urb2 urb_start2 urb_end2 urb3 urb_start3
urb_end3 urb4 urb_start4 urb_end4 urb5 urb_start5 urb_end5
-- and so on for other possible attributes.
So first, for region:
keep id region*
reshape long region region_start region_end, i(id)
drop if missing(region) & missing(region_start) & missing(region_end)
ren region_start start
ren region_end end
-- and more will be done with this, but I will describe that later.
Separately, starting again from the original data, for urb:
keep id urb*
reshape long urb urb_start urb_end, i(id)
drop if missing(urb) & missing(urb_start) & missing(urb_end)
ren urb_start start
ren urb_end end
-- again, more will be done with this.
You do this for each of the attributes.
But the other part of this task is to convert to a person-year
basis. Continuing where we left the region portion of the data, we
have the variables
id region start end
(plus _j, the spell number that -reshape- created). Then:
gen byte numyears = end-start +1
expand numyears
bysort id start: gen byte year = start + _n -1
/* or maybe use the name "age" instead of "year". */
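To see what the person-year step does, here is a small sketch you can run on its own. It types in the region spells from Justin's two sample rows, already in long form. The variable name stop and the -assert- and -isid- checks are my additions for illustration, not part of the original data.

```stata
* Sketch only: region spells from the two sample rows, typed in long form.
clear
* The end age is entered as "stop" (hypothetical name) and renamed afterwards,
* to keep it visually distinct from the end-of-data marker of -input-.
input id str8 region start stop
1 "central"  0 10
1 "accra"   11 20
1 "ashanti" 21 27
2 "western"  0 30
2 "central" 31 32
end
rename stop end

* Number of years covered by each spell, both endpoints included.
gen int numyears = end - start + 1

* One observation per year of each spell.
expand numyears

* Within a spell, count upward from the start age.
bysort id start: gen int year = start + _n - 1

* Person 1 covers ages 0-27 (28 obs), person 2 ages 0-32 (33 obs).
assert _N == 61
* Fails if any person has two observations for the same year.
isid id year

sort id year
list id year region if id == 1 & inrange(year, 9, 12), noobs
```

The -by- group has to be the spell (here id and start), not id alone; otherwise _n keeps counting across a person's spells and the years come out wrong.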
Do that on each attribute, and save them as tempfiles.
Finally, merge all these attribute files together, with id and year
as the match variables.
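Putting the pieces together, the whole plan might look something like the sketch below. It assumes the wide data are saved as spells_wide.dta (a made-up file name), that the variables are named as in the layout I supposed above, and that a person's spells for any one attribute do not overlap. The name spell for the reshape index is also my choice.

```stata
* Sketch only: spells_wide.dta is a hypothetical file name.
tempfile region_py urb_py

foreach attr in region urb {
    use spells_wide, clear
    keep id `attr'*
    reshape long `attr' `attr'_start `attr'_end, i(id) j(spell)
    drop if missing(`attr') & missing(`attr'_start) & missing(`attr'_end)
    rename `attr'_start start
    rename `attr'_end end

    * Convert spells to person-years.
    gen int numyears = end - start + 1
    expand numyears
    bysort id spell: gen int year = start + _n - 1

    * Keep only what the merge needs; the nested macro resolves to
    * `region_py' or `urb_py'.
    keep id year `attr'
    save "``attr'_py'"
}

* Combine the attribute files on person and year.
use "`region_py'", clear
merge 1:1 id year using "`urb_py'"
tab _merge
```

The -merge 1:1- syntax is the one used from Stata 11 onward; in older versions you would sort both files on id and year and type -merge id year using- instead. Any observations with _merge not equal to 3 are years covered by one attribute but not the other, and are worth a look.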
P.S., maybe you can do the reshape all in one step -- say, using the
variable names in Nick's reply (where1 start1 end1 where2 ... where10
start10 end10). But after the reshape, you will need to separate
attribute types (of -where-) by the content. Values such as "accra",
"central", "ashanti" are for region; values such as "rural" and
"urban" are for urbanicity. So you would be faced with the issue of
separating the data by attribute anyway, and doing it by content is not the
best route to take. (You need to know all the values you can expect
-- and the values that correspond to various attributes must be
non-overlapping.) So, I think the plan I outlined above is better.
Another matter is that this plan puts the data into a person-year
basis, which is what I believe Justin wants. There is another way to
organize this: spells of time (for each person), where a spell begins
whenever any one of the attributes changes. This is what I hinted at
in my first reply to Justin's original question on 11-20-2006.
I hope this helps.
At 07:39 AM 11/22/2006, Nick wrote:
I can't comment on STATA, but some comments on Stata may help.
From your example the variables could be called
id where1 start1 end1 where2 ... where10 start10 end10
reshape long where start end, i(id)
drop if missing(where) & missing(start) & missing(end)
would then be what you want.
In short, this is a standard -reshape- problem, so that
detailed study of the help and manual entry for -reshape-
should give you enough background. The most likely
small problem may be unsuitable variable names, in which
case some prior use of -rename- is in order. Some
common problems are discussed at
FAQ . . . . . . . . . . . . . . . . . . . . . . . . Problems with reshape
12/03 I am having problems with the reshape command. Can
you give further guidance?
> Below are two rows of data, each representing an observation
> in a STATA dataset (NOT raw data). The first number is the
> variable person ID, the first word is a variable which
> represents where that person lived at age 0 (thus, where they
> were born). Person 1 was born in central region, person 2
> born in western region. The number after represents the start
> date of the spell, and then the next number is the end date
> of the spell. So person 1 lived in central region until she
> was 10 years old, person 2 lived in western region until she
> was 30 years old. The next word is where that person moved,
> if ever. Person 1 moved to Accra at age 11 and stayed there
> until she was 20, and then moved to Ashanti region when she
> was 21 and stayed there until she was 27, which is her
> current age at time of survey.
> Each of these columns represents a variable and follows a
> series (word, number, number). Since person 2 moved no more
> than twice, she will be missing in future variable series
> (thus why you see nothing for person 2 while person 1 shows
> her Ashanti move) until we come upon the next attribute
> (rural/urban). A person could have moved up to 10 times, so
> if she moved less than that number of times, she will be
> missing on subsequent variables and series until the next
> attribute. So person 1 is missing on the series for 4th move,
> 5th move, etc, until we come to the ten-series of rural/urban
> life. Person 1 lived in a rural area from 0-3, person 2 lived
> in an urban area from 0-30, a rural area from 31-32, and so forth.
> 1 central 0 10 accra 11 20 ashanti 21 27 rural 0 3
> urban 4 6 rural 7 15 urban 16 20 rural 21 27
> 2 western 0 30 central 31 32 urban 0 30
> rural 31 32
> The words are string variables and the number numeric. In
> case there is a problem reading in e-mail, the line breaks
> are after person ID, the word, the start date, the end date,
> etc (each column represents a new variable). Again, if we get
> past the first variable, which is person ID, the sequence of
> variables is: place of residence, age started at that place,
> age ended at that place, 2nd place of residence, age started
> at that place, age ended at that place, 3rd place of residence, etc.
> The big question is: How can STATA take this spell file and
> change it into a person-year file?