Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: using encode to order string distances |
Date | Tue, 25 Oct 2011 10:05:37 +0100 |
If it's brevity one is after, the brute force solution

label def mylab 1 "0-9" 2 "10-19" 3 "20-29" 4 "30-39" 5 "40-49" 6 "50-59" 7 "60-69" 8 "70-79" 9 "80-89" 10 "90-99" 11 "100+"
encode nearestcentre, gen(nearestcentre2) label(mylab)

beats any of these solutions.

Nick

On Tue, Oct 25, 2011 at 9:53 AM, Tim Evans <Tim.Evans@wmciu.nhs.uk> wrote:
> Thanks Nick,
>
> I'll give that a try.
>
> As I only had "100+" to deal with I used the following:
>
> encode nearestcentre if nearestcentre!="100+", generate( nearestcentre2)
> replace nearestcentre2=12 if nearestcentre2==.
> label define nearestcentre2 12 "100+", add
>
> I know it's not particularly pretty, but it did work when I needed it, but will take a look at what you suggest.

From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox

> An alternative is to start by using regex (regular expression) functions.
>
> -moss- (SSC) is a convenience tool wrapped around Stata's regex
> technology. Its use here is perhaps overkill:
>
> . l
>
>      +-------+
>      |  var1 |
>      |-------|
>   1. |   0-9 |
>   2. | 10-19 |
>   3. | 20-29 |
>   4. |   30+ |
>      +-------+
>
> . moss var1, match("([0-9]+)") regex
>
> . l
>
>      +----------------------------------------------------+
>      |  var1   _count   _match1   _pos1   _match2   _pos2 |
>      |----------------------------------------------------|
>   1. |   0-9        2         0       1         9       3 |
>   2. | 10-19        2        10       1        19       4 |
>   3. | 20-29        2        20       1        29       4 |
>   4. |   30+        1        30       1                 . |
>      +----------------------------------------------------+
>
> Nick
>
> On Mon, Oct 24, 2011 at 6:28 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>> You need to parse your strings and order on the numeric equivalent of the lower value. That will be sufficient because the upper values increase too, except that the open upper limit is implicit.
>>
>> Here is one way to do it.
>>
>> Suppose -sdist- is string variable with distances.
>>
>> gen pos = max(strpos(sdist, "-"), strpos(sdist, "+"))
>>
>> We look for "-" or "+".
>>
>> gen n1 = real(substr(sdist, 1, pos-1))
>> egen group = group(n1)
>> labmask group, values(sdist)
>>
>> where -search labmask- will point to download locations.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Tim Evans
>>
>> I have distances stored in strings like:
>>
>> 0-9
>> 10-19
>> 20-29
>> 30-39
>>
>>
>> up to
>> 90-99
>> 100+
>>
>> I want to encode these to keep the current values as labels and replace the numbers behind them as 1-12.
>>
>> When I encode and generate a new variable I end up with 100+ as the 3rd value rather than the 12th value.
>>
>> How can I force encode to make sure 100+ is the 12th and not the 3rd value?
>>
>> I'm using
>>
>> encode nearestcentre, generate(NRCENT)

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
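
For reference, the parse-and-rank approach from the first reply in this thread can be put together as one self-contained sketch. This is an illustration under stated assumptions, not code posted by anyone in the thread: the toy data, the variable names -lower- and -nearestcentre2-, and the label name -bands- are made up here, and a -levelsof- loop stands in for -labmask- so that only official commands are needed.

```stata
* Sketch only: toy data standing in for the poster's nearestcentre variable
clear
input str5 nearestcentre
"100+"
"0-9"
"20-29"
"10-19"
end

* Position of the delimiter: "-" in a closed band, "+" in the open top band
gen pos = max(strpos(nearestcentre, "-"), strpos(nearestcentre, "+"))

* Numeric lower bound of each band (hypothetical variable name)
gen lower = real(substr(nearestcentre, 1, pos - 1))

* Integer codes 1, 2, 3, ... in increasing order of the lower bound
egen nearestcentre2 = group(lower)

* Attach each original string as the label of its code
* (this loop is an assumed stand-in for -labmask-)
levelsof nearestcentre2, local(codes)
foreach c of local codes {
    levelsof nearestcentre if nearestcentre2 == `c', local(txt) clean
    label define bands `c' "`txt'", modify
}
label values nearestcentre2 bands

* Inspect the codes underneath the labels
list nearestcentre nearestcentre2, nolabel
```

Because the codes come from the numeric lower bound rather than from alphabetical order, "100+" receives the highest code however many bands are present. With a short, fixed set of bands, the brute force -label define- plus -encode, label()- at the top of the thread remains the briefer route.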