st: RE: preserving leading zeros in destring

Wed, 27 Aug 2008 23:23:27 +0100

This question exemplifies a common but understandable confusion between storage and display, another confusion about what -destring- is designed to do, and another confusion about why you might want to use -todate-.

Put on one side for the moment the detail of dates, which as usual complicate things. Suppose I have a string variable with values like "012345". This example is to Stata the character "0" followed by the character "1" and so forth. Stata can be thought of as storing _and_ displaying it as such (the details of electronics aside). What it means to any human, while it remains a string, is immaterial to Stata and purely a matter for human interpretation. That doesn't stop you manipulating it in various ways within Stata, but Stata's way of thinking about it is literal (literally).

Now focus on the numeric interpretation. You should want to use -destring- if and only if your string variable somehow contains a purely numeric set of values and became a string variable by some kind of accident. Many of those accidents involve spreadsheets in one way or another. That is, suppose your string variable contains values like "012345" or "78901" which have numeric interpretations, and no other kinds of values. Given that, you may want to unleash -destring- on values like "012345". (If the numbers are integer identifiers, they are nevertheless often better off as strings. U.S. Social Security numbers are a standard example.)

Now a distinction must be made. -destring- has one mission in life, to boldly go into the data universe and seek out numbers and let their numberliness flourish. In this example, it will see an integer 12345 and will store it as such, or strictly its binary equivalent. That doesn't stop you separately applying a numeric -format- to the new numeric variable and insisting that it should be displayed with a leading zero. But, let me insist, that is a different matter.

Apologies if that seems really elementary, but the distinction between storage and display is often muddled.
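To make the storage/display distinction concrete, here is a minimal sketch. The variable names id_str and id_num are made up for illustration, and the toy data just reuse the two example values above.

```stata
* Sketch only: id_str and id_num are hypothetical names
clear
input str6 id_str
"012345"
"78901"
end

* -destring- stores the numeric value; the leading zero is not part of it
destring id_str, generate(id_num)
assert id_num == 12345 in 1

* -format- changes only how the value is displayed, not what is stored
format id_num %06.0f
list id_str id_num

* the stored value is unchanged by the display format
assert id_num == 12345 in 1
```

Note that the display format pads every value to six digits, so 78901 is also shown with a leading zero, although its string original never had one. The format knows nothing about what the strings looked like.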
Some Stata users appear to think that the format of a number affects how it is stored, whereas format applies to display only. Although even programmers can usually forget about it, this is one area where you have to keep remembering that computers work with binary. You may ask for a display format with 3 decimal places, but that doesn't mean that the number is rounded to 3 decimal places and stored as such.

Now to the details of Michael's question. First off, you _can_ apply -destring- to a string variable containing / / separators, but that is a bad idea. The / / are an important part of the information in the string, so should not be thrown away, even if you intend to put them back in some sense immediately thereafter. The best idea is to use -date()- to convert such a string variable to a numeric date. Phil Schumm has just explained that in a reply to Michael's next question. Michael asked the same question on 23 August, and Salah Mahmud gave the same reply. (Michael's question has probably been bouncing around in cyberspace for a few days.)

But there is confusion in the presumption that -destring- can somehow preserve any leading zeros. -destring- is not about changing display formats. If you had a date like "01/02/03" and you insisted on -destring-ing it and removing the slashes, then -destring- would map that to 10203. You could then insist on a format with leading zero by using -format-, but that's separate. Even if a leading zero numeric format were applied, it would not affect any subsequent calculations with that variable, as it has, as said, no effect on what is stored.

Finally, Michael wants to push the resulting numeric variable back through -todate-. -todate- is a user-written program on SSC. It had one purpose only, to deal with run-together dates like 10203, meaning 1/02/03. For users of Stata 10, -todate- is now at last obsolete, as StataCorp have caught up with run-together dates. (-todate- still has some potential use with Stata 8 or Stata 9.)
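A minimal sketch of the contrast, assuming Stata 10 or later (where the -date()- mask is written "MDY" and a third argument gives the latest year a two-digit year may be mapped to). The variable names date_str, runtogether and newdate are made up, as are the toy data; 2050 is just one plausible choice of top year.

```stata
* Sketch only: hypothetical variable names and data
clear
input str8 date_str
"01/02/03"
"12/31/07"
end

* Stripping the separators yields a plain integer; the leading zero is gone
destring date_str, generate(runtogether) ignore("/")
assert runtogether == 10203 in 1

* -date()- reads the string with its separators directly:
* "MDY" means month, day, year; two-digit years are mapped to
* the latest century that does not exceed 2050
generate newdate = date(date_str, "MDY", 2050)
format newdate %td
list

* the result is a numeric daily date
assert newdate == mdy(1, 2, 2003) in 1
```

In earlier versions of Stata the mask argument was written differently, so check the help for -date()- in your version.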
But why would Michael have run-together dates? Only because he just removed the separators with -destring-. But as already said, he shouldn't want to do that, because -date()- works perfectly well with dates with separators (and always did from its introduction into Stata). Even if Michael does not yet have Stata 10, -todate- has no use for him unless his dates start out as run-together.

In short, Michael should ignore -destring- and -todate-, and just use -date()-, as others have also recommended.

Nick
n.j.cox@durham.ac.uk

Michael McCulloch

I want to destring a variable, which contains dates in the format "MM/DD/YY", using:

destring date, replace ignore("/") force

How can I set this to preserve the leading zeroes, so that I can follow with:

todate date, gen(newdate) pattern(mmddyy) f(%d)

Any suggestions would be much appreciated.

