Statalist The Stata Listserver



RE: st: Changing spells to person-years


From   David Kantor <kantor.d@att.net>
To   statalist@hsphsun2.harvard.edu
Subject   RE: st: Changing spells to person-years
Date   Wed, 22 Nov 2006 13:03:33 -0500

I would agree with Nick that a reshape is needed. But that might be only part of the solution.

What Nick presented is for one attribute. But there are two separate attributes in Justin's sample data: region and rural/urban (what you might call urbanicity, which I will refer to as urb). And there may be more attributes in the real data.

Each attribute seems to have its own separate start and end dates. So I would treat them separately. (Maybe there is a way to do this all at once, but I don't see it yet.) So I would suppose that the variables are...

id region1 region_start1 region_end1 region2 region_start2 region_end2 region3 region_start3 region_end3
urb1 urb_start1 urb_end1 urb2 urb_start2 urb_end2 urb3 urb_start3 urb_end3 urb4 urb_start4 urb_end4 urb5 urb_start5 urb_end5
-- and so on for other possible attributes.

So first, for region:
keep id region*
reshape long region region_start region_end, i(id)
drop if missing(region) & missing(region_start) & missing(region_end)
ren region_start start
ren region_end end

-- and more will be done with this, but I will describe that later.

Separately, for urb (starting again from the original data, since the -keep- above dropped the urb variables):
keep id urb*
reshape long urb urb_start urb_end, i(id)
drop if missing(urb) & missing(urb_start) & missing(urb_end)
ren urb_start start
ren urb_end end

-- again, more will be done with this.

You do this for each of the attributes.

But the other part of this task is to convert to a person-year basis. Continuing where we left the region portion of the data, we have variables
id region start end
then...
gen byte numyears = end-start +1
drop end
expand numyears
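/* -expand- appends the copies at the end, and _n must restart within each spell, so sort and group by id and start */
bysort id start: gen byte year = start + _n -1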
drop start

/* or maybe use the name "age" instead of "year". */
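
As a rough check of the expand step, here is a minimal sketch using only person 1's region spells from Justin's example, typed in by hand in long form (the variable names are the ones assumed above). Note that _n has to count within each spell, not within each person, which is why the grouping is on id and start.

```stata
* Person 1's region spells from Justin's example, already in long form
clear
input id str8 region start end
1 "central"  0 10
1 "accra"   11 20
1 "ashanti" 21 27
end

* one observation per year lived in each spell
gen byte numyears = end - start + 1
expand numyears

* _n restarts within each (id, start) group, so year runs from start to end
bysort id start: gen byte year = start + _n - 1

drop start end numyears
sort id year
list, sepby(region)
```

Each spell should contribute end - start + 1 rows, and year should run without gaps from the first spell's start to the last spell's end.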

Do that on each attribute, and save them as tempfiles.
Finally, merge all these attribute files together, with id and year as the match variables.
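
The whole plan could be sketched roughly as follows. This is only an illustration under several assumptions: the wide variable names are the ones I supposed above (three region spells and five urb spells, not Justin's real names or counts), the two rows are typed in from his example, j(spell) is my own choice of index name, and -merge 1:1- is the syntax of Stata 11 and later (older versions need the files sorted and the old -merge id year using- form).

```stata
* Justin's two example people in wide form (variable names are assumed)
clear
input id ///
    str8 region1 region_start1 region_end1 ///
    str8 region2 region_start2 region_end2 ///
    str8 region3 region_start3 region_end3 ///
    str8 urb1 urb_start1 urb_end1 ///
    str8 urb2 urb_start2 urb_end2 ///
    str8 urb3 urb_start3 urb_end3 ///
    str8 urb4 urb_start4 urb_end4 ///
    str8 urb5 urb_start5 urb_end5
1 "central" 0 10 "accra" 11 20 "ashanti" 21 27 "rural" 0 3 "urban" 4 6 "rural" 7 15 "urban" 16 20 "rural" 21 27
2 "western" 0 30 "central" 31 32 "" . . "urban" 0 30 "rural" 31 32 "" . . "" . . "" . .
end
tempfile wide
save `wide'

* one person-year tempfile per attribute
local attrs region urb
foreach a of local attrs {
    use `wide', clear
    keep id `a'*
    reshape long `a' `a'_start `a'_end, i(id) j(spell)
    drop if missing(`a') & missing(`a'_start) & missing(`a'_end)
    gen int numyears = `a'_end - `a'_start + 1
    expand numyears
    * _n restarts within each spell
    bysort id spell: gen int year = `a'_start + _n - 1
    keep id year `a'
    tempfile py_`a'
    save `py_`a''
}

* combine the attribute files on person and year
use `py_region', clear
merge 1:1 id year using `py_urb', nogenerate
sort id year
list, sepby(id)
```

If the spells of some attribute overlap or leave gaps, id and year will not uniquely identify observations (or some years will be unmatched), so it is worth inspecting the merge results before dropping them with -nogenerate-.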

----

P.S., maybe you can do the reshape all in one step -- say, using the variable names in Nick's reply (where1 start1 end1 where2 ... where10 start10 end10). But after the reshape, you will need to separate attribute types (of -where-) by the content. Values such as "accra", "central", "ashanti" are for region; values such as "rural" and "urban" are for urbanicity. So you would be faced with separating the data by attribute anyway, and doing it by content is not the best route to take. (You need to know all the values you can expect -- and the values that correspond to various attributes must be non-overlapping.) So, I think the plan I outlined above is better.

Another matter is that this plan puts the data into a person-year basis, which is what I believe Justin wants. There is another way to organize this: spells of time (for each person), where a spell begins whenever any one of the attributes changes. This is what I hinted at in my first reply to Justin's original question on 11-20-2006.
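
For what it is worth, one possible way to get from a person-year file to that spell organization is sketched below. It is not something worked out in this thread: the person-year data for person 1 are rebuilt by hand from Justin's example, and the names newspell and spell are made up for the illustration.

```stata
* Person 1 as a person-year file (values taken from Justin's example)
clear
set obs 28
gen id = 1
gen int year = _n - 1
gen str8 region = cond(year <= 10, "central", cond(year <= 20, "accra", "ashanti"))
gen str8 urb = cond(inrange(year, 4, 6) | inrange(year, 16, 20), "urban", "rural")

* a new spell starts at the first year and whenever any attribute changes
bysort id (year): gen byte newspell = _n == 1 | region != region[_n-1] | urb != urb[_n-1]

* running count of spell starts gives a spell number within person
by id: gen int spell = sum(newspell)

* one observation per spell, with its first and last year
collapse (min) start = year (max) end = year, by(id spell region urb)
list, sepby(id)
```

With more attributes, each one adds another comparison to the newspell expression and another variable to by().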

I hope this helps.
--David

At 07:39 AM 11/22/2006, Nick wrote:

I can't comment on STATA, but some comments on Stata may help.

From your example the variables could be called

id where1 start1 end1 where2 ... where10 start10 end10

so that

reshape long where start end, i(id)
drop if missing(where) & missing(start) & missing(end)

would then be what you want.

In short, this is a standard -reshape- problem, so that
detailed study of the help and manual entry for -reshape-
should give you enough background. The most likely
small problem may be unsuitable variable names, in which
case some prior use of -rename- is in order. Some
common problems are discussed at

FAQ     . . . . . . . . . . . . . . . . . . . . . . . .  Problems with reshape
        12/03   I am having problems with the reshape command. Can
                you give further guidance?
                http://www.stata.com/support/faqs/data/reshape3.html

Nick
n.j.cox@durham.ac.uk

Buszin, Justin

> Below are two rows of data, each representing an observation
> in a STATA dataset (NOT raw data). The first number is the
> variable person ID, the first word is a variable which
> represents where that person lived at age 0 (thus, where they
> were born). Person 1 was born in central region, person 2
> born in western region. The number after represents the start
> date of the spell, and then the next number is the end date
> of the spell. So person 1 lived in central region until she
> was 10 years old, person 2 lived in western region until she
> was 30 years old. The next word is where that person moved,
> if ever. Person 1 moved to Accra at age 11 and stayed there
> until she was 20, and then moved to Ashanti region when she
> was 21 and stayed there until she was 27, which is her
> current age at time of survey.
>
> Each of these columns represents a variable and follows a
> series (word, number, number). Since person 2 moved no more
> than twice, she will be missing in future variable series
> (thus why you see nothing for person 2 while person 1 shows
> her Ashanti move) until we come upon the next attribute
> (rural/urban). A person could have moved up to 10 times, so
> if she moved less than that number of times, she will be
> missing on subsequent variables and series until the next
> attribute. So person 1 is missing on the series for 4th move,
> 5th move, etc, until we come to the ten-series of rural/urban
> life. Person 1 lived in a rural area from 0-3, person 2 lived
> in an urban area from 0-30, a rural area from 31-32, and so forth.
>
> 1 central 0 10 accra 11 20 ashanti 21 27 rural 0 3 urban 4 6 rural 7 15 urban 16 20 rural 21 27
> 2 western 0 30 central 31 32                     urban 0 30 rural 31 32
>
> The words are string variables and the number numeric. In
> case there is a problem reading in e-mail, the line breaks
> are after person ID, the word, the start date, the end date,
> etc (each column represents a new variable). Again, if we get
> past the first variable, which is person ID, the sequence of
> variables is: place of residence, age started at that place,
> age ended at that place, 2nd place of residence, age started
> at that place, age ended at that place, 3rd place of residence, etc.
>
> The big question is: How can STATA take this spell file and
> change it into a person-year file?