Mileposts! Last I heard, Burma (Myanmar) and the United
States were the last holdouts on going metric (although
admittedly the UK is lukewarm on the point).
You can define groups of segments that are identical
on a bunch of variables by

egen group = group(<varlist>)

Then you can identify extremes by

egen BEG_MLPST = min(beg_mlpst), by(group)
egen END_MLPST = max(end_mlpst), by(group)

The range is thus

gen MLPST = END_MLPST - BEG_MLPST

and you can use -duplicates- to drop duplicates,
conditionally on values of MLPST.
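
As a minimal sketch of how these pieces might fit together on the
sample data quoted below: the steps as written group on traits only,
so segments that share traits but are not adjacent would land in the
same group. One possible variation is to sort by road and milepost
and start a new run whenever the road changes, the mileposts do not
join up, or any trait differs. The names newrun and run are made up
for illustration, -collapse- is swapped in for -egen- plus
-duplicates- so that the counts can be summed in the same step, and
the exact equality test on mileposts assumes that both variables are
stored in the same type. The /// continuations mean this is meant to
be run from a do-file.

```stata
* Toy data typed in from the sample in the question
clear
input seg_id road_id beg_mlpst end_mlpst t1 str1 t2 t3 tn ct1 ct2 ct3
1 145 1.45 2.43 8 Y 2 1.22 3 1 3
2 145 2.43 2.47 8 Y 2 1.22 2 0 1
3 145 5.33 6.00 8 Y 1 3    1 1 1
4 145 6.00 6.10 7 Y 1 3    0 1 1
9 145 6.10 6.73 7 Y 1 3    0 0 7
end

* Put each road's segments in milepost order
sort road_id beg_mlpst

* Flag the first segment of each run: a different road, a gap in
* mileposts, or a change in any trait (newrun is a hypothetical name)
gen byte newrun = _n == 1                      ///
    | road_id   != road_id[_n-1]               ///
    | beg_mlpst != end_mlpst[_n-1]             ///
    | t1 != t1[_n-1] | t2 != t2[_n-1]          ///
    | t3 != t3[_n-1] | tn != tn[_n-1]

* Running sum of the flags gives one identifier per run
gen long run = sum(newrun)

* One record per run: lowest id, extreme mileposts, summed counts.
* Traits are constant within a run, so they can sit in by().
collapse (min) seg_id beg_mlpst (max) end_mlpst  ///
    (sum) ct1 ct2 ct3, by(run road_id t1 t2 t3 tn)

list, noobs
```

On the five sample rows this leaves three records (segments 1, 3
and 4) with mileposts 1.45 to 2.47, 5.33 to 6.00 and 6.00 to 6.73,
which matches the desired result in the question. Because run is a
running sum, a chain of several adjacent segments collapses in one
pass, with no need to iterate.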
P.S. You can always check for yourself what reached
the list by noting that
1. People on the list receive their own postings.
2. Posts are archived, so you can look at the archives.
Nick
n.j.cox@durham.ac.uk

Biernbaum, Lee wrote:
> I submitted this question on Friday but haven't seen it come through, so
> I think it was eaten by the plain-text filter, and I'm trying again. My
> apologies if I'm inadvertently deluging you with email...
>
>
> Hello listers,
>
> I've done a fair share of analysis using Stata but not much data
> management in the past (been using SAS), but I've run up against a
> situation where I have a hunch that Stata will solve this problem much
> more easily.
>
> I'm not asking for exact code or anything, just an idea of where I
> should be looking (what commands or functions are most promising). I've
> done the usual bunch of searches (Statalist archives, Google, -search-,
> etc.) but I have a feeling I just don't have the right name for what I'm
> trying to do to be able to find the right resources.
>
> So here goes...
>
> I have a dataset of road segments across a state. In cases where two
> segments are adjacent (end_mlpst of one = beg_mlpst of the next when
> road_id is equal) and share characteristics, I would like to combine
> them into one record. For example (note, this is not the real data, just
> a simplified sample, but it does contain both numbers and strings):
> t1, t2, t3... = traits; ct1, ct2, ct3... = counts.
>
> Seg_id  Road_id  beg_mlpst  end_mlpst  t1  t2  t3  ...  tn    ct1  ct2  ct3  ...
> 1       145      1.45       2.43       8   Y   2        1.22  3    1    3
> 2       145      2.43       2.47       8   Y   2        1.22  2    0    1
> 3       145      5.33       6.00       8   Y   1        3     1    1    1
> 4       145      6.00       6.10       7   Y   1        3     0    1    1
> 9       145      6.10       6.73       7   Y   1        3     0    0    7
>
> The end result would be something like:
> Seg_id  Road_id  beg_mlpst  end_mlpst  t1  t2  t3  ...  tn    ct1  ct2  ct3  ...
> 1       145      1.45       2.47       8   Y   2        1.22  5    1    4
> 3       145      5.33       6.00       8   Y   1        3     1    1    1
> 4       145      6.00       6.73       7   Y   1        3     0    1    8
>
> Bonus question:
> Preferably, I'd like to add another condition such that the segments
> will only be combined if segment_length (i.e. end_milepost -
> beg_milepost) is less than some value (say 0.10 miles), though I assume
> once I have everything else, adding this extra condition is (fairly)
> trivial.
>
> Moreover, the ids of 4 and 9 above are not a typo, as the segment_ids
> are missing various values along the way (due to segments being outside
> the population of interest), though it can be assured that mileposts
> increase w/ segment_id for a given road_id. The jump in mileposts is
> also not a typo, as not all road segments are captured in the set.
>
> I imagine this will likely have to iterate a few times through (as
> combined segments may well combine with the segment after it, e.g.
> segment 425 is end-to-end with and similar to 426, 427, and 428).
>
> That said, most of those details are likely unnecessary for pointing me
> in the right direction, but now it cannot be said I've left out
> important information.