Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Impavido, Gregorio" <GImpavido@imf.org>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: destring ignores more than what specified in ignore()
Date: Mon, 21 Nov 2011 10:27:39 -0500

Thank you Nick.

It indeed wasn't clear to me that destring works with characters and not substrings (I should have looked at the ado file first...). It is now clear that destring creates local macros for each individual character specified in ignore() (lines 51-59 of destring.ado) and replaces them with "" in lines 229-230 before applying real().

This means (if I understand correctly) that your last suggestion:

    destring <varlist>, replace ignore("nas")

does not work: starting with "n.a." or "n.s.", I am still left with ".." after the substitution. However, after adding

    | `temp'==".."

in line 238 of destring.ado, your suggestion works like a charm. This is (I believe) equivalent to using the force option, as you also suggest.

All your other suggestions work perfectly. So thank you again.

Gregorio

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Monday, November 21, 2011 5:36 AM
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: st: destring ignores more than what specified in ignore()

On the information here

    destring <varlist>, replace ignore("nas")

or

    destring <varlist>, replace force

should work. Note that you don't need to set up your own loop or a prior filter of numeric variables; -destring- will do both for you.

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 21 November 2011 08:22
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: destring ignores more than what specified in ignore()

-destring- ignores characters, not substrings. The problem is at most that this is not clear to you when you read the help. -destring- did what you told it to do, which was, among other things, to remove ".".

You need to fix your "n.a." and "n.s." first, e.g. within a loop

    replace `var' = subinstr(`var', "n.a.", ".", .)
    replace `var' = subinstr(`var', "n.s.", ".", .)

or as you did it.

-destring- is just a wrapper for -real()-, so -real()- is not really an alternative except in so far as -destring- is not understood. Your code is shorter and more efficient than -destring- as it can be tailored to your problem. In fact your last code segment can be shortened, as -real("n.a.")- for example results in numeric missing.

Nick

On Mon, Nov 21, 2011 at 1:51 AM, Impavido, Gregorio <GImpavido@imf.org> wrote:
> I looked at the many FAQs on destring but could not find an answer for my problem. Hence the post; hopefully it is not a duplicate.
>
> I have a dataset with an unknown (ex ante) number of string variables containing entries of the following three types: (i) "###.###"; (ii) "n.a."; and (iii) "n.s.".
>
> These variables should be numeric and I would like to destring them by coding:
>
>     foreach var of varlist * {
>         capture confirm numeric variable `var'
>         if _rc {
>             destring `var', replace ignore("n.a." "n.s.")
>         }
>     }
>
> This does not work as destring, for some inexplicable (to me) reason, treats "." as a separate non-numeric character from "n.a." or "n.s.".
>
> Therefore, it drops the "." in entries like "###.###", turning them into double numeric ######. The same happens if the option is specified as ignore("n.a" "n.s") (i.e., without the final ".").
>
> First question (of two): Why is destring ignoring more things than what is specified in the option ignore()?
>
> I found two ways around this odd behaviour of destring.
>
> The first option uses an extra line of code and it is:
>
>     foreach var of varlist * {
>         capture confirm numeric variable `var'
>         if _rc {
>             replace `var' = "na" if inlist(`var', "n.a.", "n.s.") // this gets rid of the "."
>             destring `var', replace ignore("na") // no "." here!!!
>         }
>     }
>
> This preserves both the order and the variable labels of my original string variables (which I need in subsequent code) but it uses again the dreaded destring command (after seeing how it treats "n.a.", I don't "trust" it anymore).
>
> The second option uses generate with the real() function but also more lines of code as real() does not work with replace.
>
>     foreach var of varlist * {
>         capture confirm numeric variable `var'
>         if _rc {
>             replace `var' = "." if inlist(`var', "n.a.", "n.s.")
>             local lbl : variable label `var'
>             gen `var'r = real(`var')
>             label var `var'r `"`lbl'"'
>             order `var'r, after(`var')
>             drop `var'
>         }
>     }
>
> Both loops seem to end up with numeric-only variables in the same order and with the same variable labels as the original dataset. My second question is: should we use real() instead of destring when possible, as it is more "fool proof" (my third loop is much faster than the other two)?
>
> Finally, is there a more efficient way to get where I want without writing all this code (especially the last loop)?
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
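
For readers who want to reproduce the point of this thread, here is a minimal sketch (not code from either poster). The toy data and the variable names x, y and y_bad are made up for illustration. It assumes, as stated in the thread, that ignore() removes individual characters rather than substrings, and that -destring- converts empty strings to numeric missing.

```stata
clear
input str8 x
"123.456"
"n.a."
"n.s."
"7.5"
end

* Keep an untouched copy to show the pitfall.
generate str8 y = x

* Pitfall: every character listed in ignore() is removed wherever it
* occurs, so the decimal point in "123.456" is stripped as well.
destring y, generate(y_bad) ignore("nas.")

* Safer route: blank out the placeholder strings first, then convert
* without ignore(); the empty strings become numeric missing.
replace x = "" if inlist(x, "n.a.", "n.s.")
destring x, replace

list x y y_bad
```

With -replace-, x keeps its position and its variable label, which was one of the requirements in the original question.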

**Follow-Ups**:
- **Re: st: destring ignores more than what specified in ignore()** *From:* Nick Cox <njcoxstata@gmail.com>

**References**:
- **st: destring ignores more than what specified in ignore()** *From:* "Impavido, Gregorio" <GImpavido@imf.org>
- **Re: st: destring ignores more than what specified in ignore()** *From:* Nick Cox <njcoxstata@gmail.com>
- **RE: st: destring ignores more than what specified in ignore()** *From:* Nick Cox <n.j.cox@durham.ac.uk>
