I submitted this question on Friday but haven't seen it come through, so
I think it was eaten by the plain-text filter. I'm trying again; my
apologies if I'm inadvertently deluging you with email...
Hello listers,
I've done a fair share of analysis in Stata but not much data
management (I've been using SAS for that). However, I've run across a
situation where I have a hunch that Stata will solve the problem much
more easily.
I'm not asking for exact code, just an idea of where I should be
looking (which commands or functions are most promising). I've done the
usual searches (Statalist archives, Google, -search-, etc.), but I
suspect I just don't know the right name for what I'm trying to do, so
I can't find the right resources.
So here goes...
I have a dataset of road segments across a state. Where two segments
are adjacent (end_mlpst of one equals beg_mlpst of the next, with the
same road_id) and share the same traits, I would like to combine them
into one record. For example (this is not the real data, just a
simplified sample, but it does contain both numbers and strings;
t1, t2, t3... are traits and ct1, ct2, ct3... are counts):
Seg_id  Road_id  beg_mlpst  end_mlpst  t1  t2  t3  ...  tn    ct1  ct2  ct3  ...
1       145      1.45       2.43       8   Y   2        1.22  3    1    3
2       145      2.43       2.47       8   Y   2        1.22  2    0    1
3       145      5.33       6.00       8   Y   1        3     1    1    1
4       145      6.00       6.10       7   Y   1        3     0    1    1
9       145      6.10       6.73       7   Y   1        3     0    0    7

The end result would be something like the following (the first seg_id
and beg_mlpst of each combined run are kept, the last end_mlpst is
taken, and the counts are summed):

Seg_id  Road_id  beg_mlpst  end_mlpst  t1  t2  t3  ...  tn    ct1  ct2  ct3  ...
1       145      1.45       2.47       8   Y   2        1.22  5    1    4
3       145      5.33       6.00       8   Y   1        3     1    1    1
4       145      6.00       6.73       7   Y   1        3     0    1    8
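To pin down the rule, here is a minimal sketch of the merge logic in Python (the logic only, not Stata syntax). It assumes each record is a dict and that the trait and count columns are just t1, t2, t3, tn and ct1, ct2, ct3 as in the simplified sample. The helper names merge_segments and seg are hypothetical. It sorts by road and milepost, then either extends the open record or starts a new one.

```python
import math

# Assumed column split for the simplified sample; the real data has more traits/counts.
TRAITS = ["t1", "t2", "t3", "tn"]
COUNTS = ["ct1", "ct2", "ct3"]


def merge_segments(rows):
    """Collapse adjacent segments of the same road that share all traits.

    Keeps the first seg_id and beg_mlpst of each run, takes the last
    end_mlpst, and sums the count columns.
    """
    ordered = sorted(rows, key=lambda r: (r["road_id"], r["beg_mlpst"]))
    merged = []
    for row in ordered:
        last = merged[-1] if merged else None
        continues_run = (
            last is not None
            and last["road_id"] == row["road_id"]
            and math.isclose(last["end_mlpst"], row["beg_mlpst"], abs_tol=1e-9)
            and all(last[t] == row[t] for t in TRAITS)
        )
        if continues_run:
            # Extend the open record instead of starting a new one.
            last["end_mlpst"] = row["end_mlpst"]
            for c in COUNTS:
                last[c] += row[c]
        else:
            # Copy so the caller's rows are left unchanged.
            merged.append(dict(row))
    return merged


def seg(seg_id, beg, end, t1, t2, t3, tn, ct1, ct2, ct3, road_id=145):
    """Build one record of the simplified sample (hypothetical helper)."""
    return {"seg_id": seg_id, "road_id": road_id, "beg_mlpst": beg,
            "end_mlpst": end, "t1": t1, "t2": t2, "t3": t3, "tn": tn,
            "ct1": ct1, "ct2": ct2, "ct3": ct3}


sample = [
    seg(1, 1.45, 2.43, 8, "Y", 2, 1.22, 3, 1, 3),
    seg(2, 2.43, 2.47, 8, "Y", 2, 1.22, 2, 0, 1),
    seg(3, 5.33, 6.00, 8, "Y", 1, 3, 1, 1, 1),
    seg(4, 6.00, 6.10, 7, "Y", 1, 3, 0, 1, 1),
    seg(9, 6.10, 6.73, 7, "Y", 1, 3, 0, 0, 7),
]

for r in merge_segments(sample):
    print(r["seg_id"], r["beg_mlpst"], r["end_mlpst"], r["ct1"], r["ct2"], r["ct3"])
```

On the sample this yields seg_ids 1, 3 and 4 with the mileposts and summed counts of the desired result. Segment 3 stays separate from 4 because t1 differs, even though their mileposts touch.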
Bonus question:
Preferably, I'd like to add another condition so that segments are only
combined if segment_length (i.e., end_mlpst - beg_mlpst) is less than
some value (say 0.10 miles), though I assume that once I have everything
else, adding this extra condition is fairly trivial.
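The length condition can be sketched as one extra test in the merge loop (Python, logic only). The question does not say whose length is checked. This sketch assumes the incoming segment's own length must be below max_len before it is absorbed into the preceding run, and the docstring names the alternative. merge_short_segments and max_len are hypothetical names. The length is rounded before comparing, so that a 0.10-mile segment (6.10 - 6.00, which is 0.0999... in floating point) is not treated as shorter than 0.10.

```python
import math

# Assumed column split for the simplified sample.
TRAITS = ["t1", "t2", "t3", "tn"]
COUNTS = ["ct1", "ct2", "ct3"]


def merge_short_segments(rows, max_len=0.10):
    """Merge adjacent same-trait segments, but only absorb a segment into the
    preceding run when its own length (end_mlpst - beg_mlpst) is below max_len.

    ASSUMPTION: the threshold applies to the incoming segment's own length.
    To cap the combined record instead, test
    row["end_mlpst"] - last["beg_mlpst"] < max_len.
    """
    ordered = sorted(rows, key=lambda r: (r["road_id"], r["beg_mlpst"]))
    merged = []
    for row in ordered:
        last = merged[-1] if merged else None
        # Round to avoid floating-point noise such as 6.10 - 6.00 = 0.0999...
        short_enough = round(row["end_mlpst"] - row["beg_mlpst"], 6) < max_len
        if (last is not None
                and short_enough
                and last["road_id"] == row["road_id"]
                and math.isclose(last["end_mlpst"], row["beg_mlpst"], abs_tol=1e-9)
                and all(last[t] == row[t] for t in TRAITS)):
            last["end_mlpst"] = row["end_mlpst"]
            for c in COUNTS:
                last[c] += row[c]
        else:
            merged.append(dict(row))
    return merged


# The simplified sample from the question (road 145).
cols = ["seg_id", "beg_mlpst", "end_mlpst", *TRAITS, *COUNTS]
sample = [dict(zip(cols, vals), road_id=145) for vals in [
    (1, 1.45, 2.43, 8, "Y", 2, 1.22, 3, 1, 3),
    (2, 2.43, 2.47, 8, "Y", 2, 1.22, 2, 0, 1),
    (3, 5.33, 6.00, 8, "Y", 1, 3, 1, 1, 1),
    (4, 6.00, 6.10, 7, "Y", 1, 3, 0, 1, 1),
    (9, 6.10, 6.73, 7, "Y", 1, 3, 0, 0, 7),
]]

for r in merge_short_segments(sample, max_len=0.10):
    print(r["seg_id"], r["beg_mlpst"], r["end_mlpst"], r["ct1"])
```

With max_len=0.10, segment 2 (0.04 miles) is absorbed into segment 1. Segment 9 (0.63 miles) stays separate from segment 4, so four records remain. A very large max_len falls back to the unconstrained merge.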
Also, the seg_ids 4 and 9 above are not a typo: segment ids skip various
values along the way (because some segments are outside the population
of interest), though mileposts are guaranteed to increase with seg_id
for a given road_id. The jump in mileposts (2.47 to 5.33) is also not a
typo, since not all road segments are captured in the set.
I imagine this will have to iterate a few times, since a combined
segment may well combine with the segment after it (e.g., segment 425 is
end-to-end with, and similar to, 426, 427, and 428).
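Chains like 425 through 428 may not need repeated passes. A common pattern is to compare each segment with the one just before it, start a new "run" whenever the segment does not continue the previous one, and number the runs with a running sum. Each run then collapses to one record. The Python sketch below (logic only) shows that pattern on hypothetical mileposts and traits for segments 425 to 430. In Stata, the same idea would likely map onto a `by` group with comparisons against `[_n-1]`, a running `sum()` in `generate` to number the runs, and `collapse (first) ... (last) end_mlpst (sum) ...` by run. That mapping is untested here.

```python
import math
from itertools import groupby

TRAITS = ["t1", "t2"]   # hypothetical trait columns for this illustration
COUNTS = ["ct1"]        # hypothetical count column


def label_runs(rows):
    """Return (run_id, row) pairs. run_id increases by 1 whenever a segment
    does NOT continue the previous one (different road, milepost gap, or a
    differing trait), so a whole chain shares one run_id in a single pass."""
    ordered = sorted(rows, key=lambda r: (r["road_id"], r["beg_mlpst"]))
    run_id = 0
    prev = None
    labelled = []
    for row in ordered:
        new_run = (prev is None
                   or row["road_id"] != prev["road_id"]
                   or not math.isclose(row["beg_mlpst"], prev["end_mlpst"], abs_tol=1e-9)
                   or any(row[t] != prev[t] for t in TRAITS))
        if new_run:
            run_id += 1
        labelled.append((run_id, row))
        prev = row
    return labelled


def collapse_runs(labelled):
    """One record per run: first seg_id/beg_mlpst, last end_mlpst, summed counts."""
    out = []
    for _, group in groupby(labelled, key=lambda pair: pair[0]):
        rows = [row for _, row in group]
        rec = dict(rows[0])
        rec["end_mlpst"] = rows[-1]["end_mlpst"]
        for c in COUNTS:
            rec[c] = sum(r[c] for r in rows)
        out.append(rec)
    return out


# Hypothetical chain: 425-428 touch end to end and share traits; 430 follows a gap.
fields = ["seg_id", "beg_mlpst", "end_mlpst", "t1", "t2", "ct1"]
chain = [dict(zip(fields, v), road_id=7) for v in [
    (425, 10.0, 10.2, 3, "N", 1),
    (426, 10.2, 10.5, 3, "N", 2),
    (427, 10.5, 10.9, 3, "N", 0),
    (428, 10.9, 11.0, 3, "N", 4),
    (430, 11.4, 11.8, 3, "N", 5),
]]

for rec in collapse_runs(label_runs(chain)):
    print(rec["seg_id"], rec["beg_mlpst"], rec["end_mlpst"], rec["ct1"])
```

Here segments 425 to 428 come out as a single record running from 10.0 to 11.0 with ct1 summed to 7, and segment 430 stays separate because of the milepost gap.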
That said, most of these details are probably unnecessary for pointing
me in the right direction, but now no one can say I've left out
important information.
Thank you for your help and have a good weekend.
Lee Biernbaum
Economic and Industry Analysis Division
US DOT, Volpe Center
(617) 494-2834
lee.biernbaum@volpe.dot.gov