st: destring ignores more than what specified in ignore()


From   "Impavido, Gregorio" <[email protected]>
To   "'STATALIST ([email protected])'" <[email protected]>
Subject   st: destring ignores more than what specified in ignore()
Date   Sun, 20 Nov 2011 20:51:55 -0500

I looked through the many FAQs on destring but could not find an answer to my problem, hence this post; I hope it is not a duplicate.

I have a dataset with a number of string variables (not known in advance) whose entries are of three types: (i) "###.###"; (ii) "n.a."; and (iii) "n.s.".

These variables should be numeric and I would like to destring them by coding:

foreach var of varlist * {
   capture confirm numeric variable `var'
   if _rc {
      destring `var', replace ignore("n.a." "n.s.")
   }
}
 
This does not work because destring, for some reason I cannot explain, treats "." as a non-numeric character to ignore in its own right, separately from "n.a." or "n.s.".

Therefore, it also drops the "." in entries like "###.###", turning them into the numeric value ######. The same happens if the option is specified as ignore("n.a" "n.s") (i.e., without the final ".").
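
A minimal sketch that should reproduce the problem, assuming a toy variable x (hypothetical, not from my dataset) holding one entry of each type. It uses generate() rather than replace so the original strings stay visible next to the converted values; x_num is likewise an illustrative name.

```stata
* Toy data: one entry of each of the three types described above
clear
input str8 x
"123.456"
"n.a."
"n.s."
end

* Same ignore() specification as in the loop above
destring x, generate(x_num) ignore("n.a." "n.s.")

* Compare the original strings with what destring produced
list x x_num
```

Listing x and x_num side by side shows directly whether the decimal point in the first entry survived the conversion.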


First question (of two): Why does destring ignore more than what is specified in the ignore() option?

I found two ways around this odd behaviour of destring.

The first workaround uses one extra line of code:

foreach var of varlist * {
   capture confirm numeric variable `var'
   if _rc {
      replace `var' = "na" if inlist(`var', "n.a.", "n.s.")  // this gets rid of the "."
      destring `var', replace ignore("na")  // no "." here!!!
   }
}

This preserves both the order and the variable labels of my original string variables (which I need in subsequent code), but it again uses the dreaded destring command (after seeing how it treats "n.a.", I no longer "trust" it).

The second workaround uses generate with the real() function but needs more lines of code, as replace cannot turn a string variable into a numeric one.

foreach var of varlist * {
   capture confirm numeric variable `var'
   if _rc {
      replace `var' = "." if inlist(`var', "n.a.", "n.s.")
      local lbl : variable label `var'
      gen `var'r = real(`var')
      label var `var'r `"`lbl'"'
      order `var'r, after(`var')
      drop `var'
   }
}
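
A quick check, on literal values rather than my data, of how real() treats the three kinds of entries. This is only a sketch: if real() returns missing for text it cannot read as a number, the first replace in the loop above may not be strictly necessary.

```stata
* real() converts a string to a number when it can,
* and returns missing (.) when the string is not numeric
display real("123.456")
display real("n.a.")
display real("n.s.")
```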

Both workarounds seem to end up with numeric-only variables in the same order and with the same variable labels as the original dataset. My second question is: should we use real() instead of destring when possible, as the more "foolproof" choice? (My third loop is also much faster than the other two.)

Finally, is there a more efficient way to get where I want without writing all this code (especially the last loop)?

Thanks in advance for suggestions

Gregorio