Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Fwd: Fastest way to identify values that start and end with a 9?
Date: Thu, 3 Oct 2013 10:40:14 +0100

You seem to have multiple identities. You signed this "Paul", but your identifier is Evan DeFilippis. The Statalist FAQ, which you were asked to read before posting, explains that we request the use of full real names.

This question also seems to overlap with a question posted by "Parseltongue" at

http://stackoverflow.com/questions/19092766/stata-regex-search-and-replace-on-integer-variables

That being so, our policy on cross-posting applies, also explained in the FAQ you were asked to read before posting:

http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting

Here is the relevant part:

"People posting on Statalist may also think about posting the same question on other listservers or in web forums. There is absolutely no rule against doing that; it is not our business to constrain what you do elsewhere. But if you do post elsewhere, we ask that you provide cross-references in URL form to searchable archives. That way, people interested in your question can quickly check what has been said elsewhere and avoid posting similar comments. Being open about cross-posting saves everyone time."

This question arises at least in part because you didn't explain what you are doing well enough on Stack Overflow for anyone to provide a complete answer. Anyone inclined to try to answer this would do well to look at the SO thread cited above.

All that said, I have some comments on your code.

    quietly tostring _all, replace
    ds, has(type string)

If you convert _all_ variables to string, then there is precisely no need to fire up -ds- to find out _which_ variables are string. As said, they all are.
So, from that point of view, your code could be shortened to

    quietly tostring _all, replace
    quietly foreach j of var * {
        replace `j'= regexr(`j', "^[9]*[9]$","DK")
        replace `j' = regexr(`j', "^[9]*[8]$", "REF")
    }

But your main question is whether you need to convert all your variables to string, and the answer is, at most, only those variables that might contain these patterns. Also, as already indicated on SO, if such variables are numeric, you don't _need_ to convert them to string at all. It might be sufficient to check the first digit and the last digit.

Otherwise I don't think you've explained your data fully enough to allow a detailed answer. I remain fuzzy on whether these 9...8 or 9...9 patterns are within numeric variables or string variables holding numeric characters.

Nick
njcoxstata@gmail.com

On 3 October 2013 10:08, Evan DeFilippis <defilippis@gmail.com> wrote:
> Values in my data set contain different numerical representations for
> "Don't Know" and "Refusal"
>
> A "Don't Know" will always start and end with a '9', but there can be
> as many '9's in between as possible, up to the maximum length of a
> string (244).
>
> A "Refusal" will always start with a '9' and end with an '8', and
> there can be as many '9's' in between as possible, up to the maximum
> length of a string (244).
>
> The data set contains strings, integers, bytes, etc..
>
> I want to be able to convert the numerical representations of 'Don't
> Know' and 'Refusal's' into DK and REF, respectively.
>
> My current strategy for doing this looks like so:
>
> quietly tostring _all, replace
> ds, has(type string)
> di "`r(varlist)'"
> unab string_vars : `r(varlist)'
> foreach j in `string_vars' {
> quietly replace `j'= regexr(`j', "^[9]*[9]$","DK")
> quietly replace `j' = regexr(`j', "^[9]*[8]$", "REF")
> }
>
> However, this is slow because it converts the entire data set into
> strings, which takes about 5 minutes, and then it has to do has(type
> string) in order to get r(varlist) to iterate over all those strings
> which takes about 4 minutes.
>
> Is there a faster way to do this that perhaps does not involve
> converting everything to strings?
>
> Paul

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
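The suggestion in the reply above, that numeric variables need not be converted to string at all, could look something like the following. This is only a sketch under stated assumptions, not the method of either poster: it assumes the numeric codes are non-negative integers held exactly in their storage type, and it assumes that the extended missing values .d and .r are acceptable stand-ins for "DK" and "REF" in numeric variables (a numeric variable cannot hold the text itself). The patterns "^9+$" and "^9+8$" are also a deliberate variation on the originals: "^[9]*[8]$" would match a lone 8, which does not start with a 9. Whether a lone 9 should count as "Don't Know" depends on the data.

```stata
* Sketch only. Assumptions: codes are non-negative integers stored exactly,
* and .d / .r are acceptable numeric stand-ins for "DK" / "REF".

* String variables: replace matching values directly; no -tostring- needed.
quietly ds, has(type string)
local svars `r(varlist)'
foreach v of local svars {
    quietly replace `v' = "DK"  if regexm(`v', "^9+$")
    quietly replace `v' = "REF" if regexm(`v', "^9+8$")
}

* Numeric variables: test a string image of each value on the fly,
* leaving the variable's storage type unchanged.
quietly ds, has(type numeric)
local nvars `r(varlist)'
foreach v of local nvars {
    * 9...98 first, then 9...9; non-integer values are skipped
    quietly replace `v' = .r if `v' == floor(`v') & ///
        regexm(trim(string(`v', "%18.0f")), "^9+8$")
    quietly replace `v' = .d if `v' == floor(`v') & ///
        regexm(trim(string(`v', "%18.0f")), "^9+$")
}
```

Each variable is visited once and none is converted wholesale, which is where the time went in the original approach. Very long runs of 9s cannot be held exactly in any numeric type, so in practice such values would have to live in string variables, which the first loop covers.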

**References**: **st: Fwd: Fastest way to identify values that start and end with a 9?** *From:* Evan DeFilippis <defilippis@gmail.com>
