

From: "Impavido, Gregorio" <GImpavido@imf.org>
To: "'STATALIST (statalist@hsphsun2.harvard.edu)'" <statalist@hsphsun2.harvard.edu>
Subject: st: destring ignores more than what specified in ignore()
Date: Sun, 20 Nov 2011 20:51:55 -0500

I looked through the many FAQs on destring but could not find an answer to my problem, hence this post; I hope it is not a duplicate.

I have a dataset with a number (unknown ex ante) of string variables containing entries of the following three types: (i) "###.###"; (ii) "n.a."; and (iii) "n.s.". These variables should be numeric, and I would like to destring them by coding:

    foreach var of varlist * {
        capture confirm numeric variable `var'
        if _rc {
            destring `var', replace ignore("n.a." "n.s.")
        }
    }

This does not work because destring, for some reason inexplicable to me, treats "." as a separate nonnumeric character rather than as part of "n.a." or "n.s.". It therefore drops the "." in entries like "###.###", turning them into the double ######. The same happens if the option is specified as ignore("n.a" "n.s") (i.e., without the final ".").

First question (of two): why does destring ignore more than what is specified in the option ignore()?

I found two ways around this odd behaviour of destring. The first uses one extra line of code:

    foreach var of varlist * {
        capture confirm numeric variable `var'
        if _rc {
            replace `var' = "na" if inlist(`var', "n.a.", "n.s.") // this gets rid of the "."
            destring `var', replace ignore("na") // no "." here!!!
        }
    }

This preserves both the order and the variable labels of my original string variables (which I need in subsequent code), but it still relies on the dreaded destring command (after seeing how it treats "n.a.", I don't "trust" it anymore).

The second uses generate with the real() function, at the cost of more lines of code, as real() does not work with replace:

    foreach var of varlist * {
        capture confirm numeric variable `var'
        if _rc {
            replace `var' = "." if inlist(`var', "n.a.", "n.s.")
            local lbl : variable label `var'
            gen `var'r = real(`var')
            label var `var'r `"`lbl'"'
            order `var'r, after(`var')
            drop `var'
        }
    }

Both workarounds seem to end up with numeric-only variables in the same order and with the same variable labels as in the original dataset.
My second question is: should we use real(), which seems more "foolproof", instead of destring whenever possible (my third loop is also much faster than the other two)?

Finally, is there a more efficient way to get where I want without writing all this code (especially the last loop)?

Thanks in advance for suggestions,

Gregorio

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
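The behaviour described in the post can be reproduced on a three-row toy dataset. The following is only a minimal sketch, not part of the original message: the variable names x, x_stripped, x_real and x_forced are hypothetical, and it assumes (as the help for destring describes) that ignore() takes a set of individual characters to strip rather than whole strings to match, which would explain why the "." inside "123.456" is removed as well.

```stata
* Toy data: one numeric-looking entry and the two placeholder strings
clear
input str8 x
"123.456"
"n.a."
"n.s."
end

* ignore() is read as a set of characters, so every "." is stripped,
* including the decimal point in "123.456" (the problem reported above)
destring x, generate(x_stripped) ignore("n.a." "n.s.")

* real() returns missing for any string that is not a valid number,
* so no prior replace is needed for "n.a." or "n.s."
generate double x_real = real(x)

* force turns every entry containing nonnumeric characters into missing;
* with replace instead of generate(), order and variable labels are kept
destring x, generate(x_forced) force

list
```

Note that force is blunt: it would also silently turn genuine data-entry errors into missing values, so it may be worth tabulating the nonnumeric entries before relying on it.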

**Follow-Ups**:

- **Re: st: destring ignores more than what specified in ignore()**, *From:* Nick Cox <njcoxstata@gmail.com>
