Re: st: data manipulation

From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: data manipulation
Date   Wed, 26 Jan 2005 10:39:33 -0500

At 03:21 PM 1/26/2005 +0000, Simon Moore wrote:
I am to work with some data that has been inputted in SPSS in a rather unusual format. Transferring from SPSS into Stata is not the problem - rearranging the data into a format I am comfortable with is. At the moment it looks something like this:

id date time person_id Gender Injury

1 21/1/2005 23:10 1 M Face
. . . 2 M Head
. . . 3 F Legs
2 23/1/2005 04:15 1 M Arms
. . . 2 F Feet
3 23/1/2006 05:10 1 F Face

The data refers to violent incidents in a particular area. For each incident more than one person (the maximum is somewhere around 6 but could go higher as new data arrives) may have been involved each sustaining different injuries.

I would like to rearrange these data into a form something like this:

id date time gender1 injury1 gender2 injury2 ...

As far as I can see -collapse- will not help me much. So, has anyone had experience with this type of problem and could you point me in the right direction?

Many thanks

First, you need to "cascade" the date and time to fill them in where they are missing. This assumes that the given order is correct, in that the leading record in each incident has that information and the following records have it missing.

assert mi(date) == mi(time)
replace date = date[_n-1] if mi(date)
replace time = time[_n-1] if mi(time)
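
The cascade only works while the rows are still in the order in which they were entered. A small sketch of a guard, assuming nothing has been sorted yet; obs is a variable name made up here for illustration. The first command would go before the replace commands and the rest after them.

```stata
* Before cascading: record the entry order (obs is a hypothetical name)
gen long obs = _n

* After cascading: no record should still lack a date or a time
assert !mi(date) & !mi(time)

* If the data get sorted later, this restores the entry order
sort obs
```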

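Then, if you want an id, you need to create one. The incidents seem to be identified by date & time, so those (together) can serve as the id, or you can create one that counts up each time the date or the time differs from the previous record. (If a variable named id already exists, drop it first or pick another name; -generate- will not overwrite it.)

gen long id = sum(date~=date[_n-1] | time~=time[_n-1])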

Finally you want to reshape wide. See help reshape.
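
As a rough sketch of the whole sequence on the six example records: this assumes date, time, Gender and Injury arrive as strings and that the existing id is filled in only on the first record of each incident, so it is cascaded rather than rebuilt. The str widths are guesses for illustration.

```stata
clear
input id str10 date str5 time person_id str1 Gender str5 Injury
1 "21/1/2005" "23:10" 1 "M" "Face"
. ""          ""      2 "M" "Head"
. ""          ""      3 "F" "Legs"
2 "23/1/2005" "04:15" 1 "M" "Arms"
. ""          ""      2 "F" "Feet"
3 "23/1/2006" "05:10" 1 "F" "Face"
end

* Cascade the incident-level fields down from each leading record
replace id   = id[_n-1]   if mi(id)
replace date = date[_n-1] if mi(date)
replace time = time[_n-1] if mi(time)

* One row per incident; person_id supplies the numeric suffix
reshape wide Gender Injury, i(id) j(person_id)
list, noobs
```

date and time are not named in the reshape command because they are constant within id after the cascade, so they are carried along unchanged. The wide variables come out as Gender1, Injury1, Gender2, Injury2 and so on; incidents with fewer people simply have empty values in the higher-numbered variables.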

I hope this helps.
-- David

David Kantor
Institute for Policy Studies
Johns Hopkins University
[email protected]
