st: RE: SV: RE: RE: Data transformation
"Nick Cox" <firstname.lastname@example.org>
Mon, 12 Nov 2007 18:00:37 -0000
If I understand correctly, you are saying that you have some multiple-vehicle (> 2) accidents. For anything in that territory, you might
find this FAQ of use or interest:
How do I create variables summarizing for each individual properties of the other members of a group?
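For accidents with three or more vehicles, one possible device is the sum-minus-own idea from the code quoted further down in this thread: take the group total and subtract each vehicle's own value. The following is only a sketch under assumptions, not necessarily what the FAQ does: the toy data are made up, and the names totweight, nvehicles and othersmeanweight are hypothetical.

```stata
* Made-up toy data: one accident involving three vehicles
clear
input Accidentnr vehiclenr weight
1 0 1000
1 1 1500
1 2 1200
end

* Group total and group size, repeated on every observation of the accident
bysort Accidentnr : egen totweight = total(weight)
bysort Accidentnr : gen nvehicles = _N

* Summary of the others = group total minus own value
gen othersweight = totweight - weight

* Mean weight of the other vehicles, defined only if there are others
gen othersmeanweight = othersweight / (nvehicles - 1) if nvehicles > 1

list, sepby(Accidentnr)
```

Note that -egen, total()- ignores missing values, so this sketch is only safe as it stands when weight is never missing.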
Whenever you have missing values in one observation, they
will get carried across in the way you would want. In the
simplest kind of case,
obs 1: 2000
obs 2: .
. gen otherfoo = foo[3 - _n]
Consider obs 1. The other value of -foo- is -foo[3 - 1]- or
-foo[2]-, which is missing. All the code being used
is out in the open.
In any case, you can try this out for yourself. You
need not depend on an answer from the list.
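A minimal sketch for trying it out, assuming a made-up two-observation dataset like the one above (the variable names Accidentnr and foo are just for illustration):

```stata
* Made-up toy data: one accident, two vehicles, second value of foo missing
clear
input Accidentnr foo
1 2000
1 .
end

* Within each accident, observation 1 looks at observation 2 and vice versa
bysort Accidentnr : gen otherfoo = foo[3 - _n] if _N == 2

list, sepby(Accidentnr)
```

The observation with foo equal to 2000 should show otherfoo as missing, and the observation with foo missing should show otherfoo as 2000.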
This is all in contrast to -egen, total()- which ignores
missings. Often that is a feature, but in your case
the direct approach has benefits. The condition -if _N == 2-
makes explicit that the trick is for two-vehicle accidents.
bys Accidentnr : gen othersweight = weight[3 - _n] if _N == 2
bys Accidentnr : gen otherscost = cost[3 - _n] if _N == 2
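To see the contrast with -egen, total()- concretely, here is a small sketch on made-up data in which one of the two weights is missing. The names sumweight, other_bysum and other_direct are hypothetical, chosen only for this comparison.

```stata
* Made-up toy data: a two-vehicle accident with one weight missing
clear
input Accidentnr weight
1 1000
1 .
end

* Sum-and-subtract: total() ignores the missing weight, so the sum is 1000
bysort Accidentnr : egen sumweight = total(weight)
gen other_bysum = sumweight - weight

* Direct approach: pick up the other observation's value as it stands
bysort Accidentnr : gen other_direct = weight[3 - _n] if _N == 2

list, sepby(Accidentnr)
```

For the vehicle weighing 1000, other_bysum is 0 although the other weight is really unknown, while other_direct is missing. For the vehicle with missing weight, other_bysum is missing although the other weight is known, while other_direct is 1000.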
For your curiosity I have arranged the data in a way that this subset only consists of two-vehicle accidents.
Another question for my curiosity: will your way of doing things take care of the missing-values problem that I face using Keith's code? Because of missing values I had to add a command after generating the "other" variables, namely:
replace othervar=. if othervar==var
This only works when I have variables that never take the value zero, so not for dummies without further transformations (which I have done). The above command also makes the "other" variable missing for cases where we actually have data on the "other" variable but not on the "own" variable. That is not a problem for me, since I drop these observations anyway in the regression, but it would be nice to know how to solve this problem, just for my curiosity.
> -----Original message-----
> From: email@example.com
> [mailto:firstname.lastname@example.org] On behalf of Nick Cox
> Sent: 12 November 2007 13:29
> To: email@example.com
> Subject: st: RE: RE: Data transformation
> I jumped to an incorrect reading of your problem through not
> reading it carefully enough.
> Point of curiosity: What about accidents involving three or
> more vehicles? Just not in the data?
> Point of technique: Keith's code was this
> bys Accidentnr: egen othersweight=sum(weight)
> bys Accidentnr: egen otherscost=sum(cost)
> replace othersweight=othersweight-weight
> replace otherscost=otherscost-cost
> rename weight ownweight
> rename cost owncost
> namely: other value = sum of two values - this value
> Given that there are two, and only two, cars for each
> accident, you can get there in this way too:
> bys Accidentnr : gen othersweight = weight[3 - _n]
> bys Accidentnr : gen otherscost = cost[3 - _n]
> The trick is simply a flip or reflection, exploiting the fact
> that under -by:- the subscript _n is determined within groups.
> Thus if _n is 1, 3 - _n is 2, and vice versa.
> Actually, if there is just one car in any accident, a call to
> observation 3 - _n will yield missing values, which is
> appropriate too.
> Lina Jonsson
> I have a dataset concerning accidents involving two vehicles
> that I have in two formats, wide and long like this:
> Accidentnr vehiclenr weight cost
> 1 0 1000 35000
> 1 1 1500 150000
> 2 0 1200 150000
> 2 1 1700 750000
> Accidentnr weight0 weight1 cost0 cost1
> 1 1000 1500 35000 150000
> 2 1200 1700 150000 750000
> Now I would like to transform the data to a long format but
> with information also concerning the other vehicle involved in
> each accident like this:
> Accidentnr vehiclenr ownweight othersweight owncost otherscost
> 1 0 1000 1500 35000 150000
> 1 1 1500 1000 150000 35000
> 2 0 1200 1700 150000 750000
> 2 1 1700 1200 750000 150000