Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: preserving leading zeros in destring


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: preserving leading zeros in destring
Date   Wed, 27 Aug 2008 23:23:27 +0100

This question exemplifies a common but understandable confusion between
storage and display, another confusion on what -destring- is designed to
do, and another confusion about why you might want to use -todate-. 

Put on one side for the moment the detail of dates, which as usual
complicate things. 

Suppose I have a string variable with values like "012345". This example
is to Stata the character "0" followed by the character "1" and so
forth. Stata can be thought of as storing _and_ displaying it as such
(the details of electronics aside). 
What it means to any human while a string is immaterial and purely a
matter for human interpretation. That doesn't stop you manipulating it
in various ways within Stata, but Stata's way of thinking about it is
literal (literally). 

Now focus on the numeric interpretation. You should want to use
-destring- if and only if your string variable somehow contains a purely
numeric set of values. It became a string variable by some kind of
accident. Many of those accidents involve spreadsheets in one way or
another. 

That is, suppose your string variable contains values like "012345" or
"78901" which have numeric interpretations, and no other kinds of
values. 

Given that, you may want to unleash -destring- on values like "012345".
(If the numbers are integer identifiers, they are nevertheless often
better off as strings. U.S. Social Security numbers are a standard
example.) 

Now a distinction must be made. 

-destring- has one mission in life, to boldly go into the data universe
and seek out numbers and let their numberliness flourish. In this
example, it will see an integer 12345 and will store it as such, or
strictly its binary equivalent. 

That doesn't stop you separately applying a numeric -format- to the new
numeric variable and insisting that it should be displayed with a
leading zero. But, let me insist, that is a different matter. 

Apologies if that seems really elementary, but the distinction between
storage and display is often muddled. Some Stata users appear to think
that the format of a number affects how it is stored, whereas format
applies to display only. Although even programmers can usually forget
about it, this is one area where you have to keep remembering that
computers work with binary. You may ask for a display format with 3
decimal places, but that doesn't mean that the number is rounded to 3
decimal places and stored as such. 

Now to the details of Michael's question. 

First off, you _can_ apply -destring- to a string variable containing /
/ separators, but that is a bad idea. The / / are an important part of
the information in the string, so should not be thrown away, even if you
intend to put them back in some sense immediately thereafter. 

The best idea is to use -date()- to convert such a string variable to a
numeric date.  Phil Schumm has just explained that in a reply to
Michael's next question. Michael asked the same question on 23 August,
and Salah Mahmud gave the same reply. (Michael's question has probably
been bouncing around in cyberspace for a few days.) 

But there is confusion in the presumption that -destring- can preserve
somehow any leading zeros. -destring- is not about changing display
formats. If you had a date like "01/02/03" and you insisted on
-destring-ing it and removing the slashes, then -destring would map that
to 10203. You could then insist on a format with leading zero by using
-format-, but that's separate. Even if a leading zero numeric format
were applied, it would not affect any subsequent calculations with that
variable, as it has, as said, no effect on what is stored. 

Finally, Michael wants to push the resulting numeric variable back
through -todate-. -todate- is a user-written program on SSC. It had one
purpose only, to deal with run-together dates like 10203, meaning
1/02/03. For users of Stata 10, -todate- is now at last obsolete, as
StataCorp have caught up with run-together dates. (-todate- still has
some potential use with Stata 8 or Stata 9.) 

But why would Michael have run-together dates? Only because he just
removed the separators with -destring-. But as already said, he
shouldn't want to do that, because -date()- works perfectly well with
dates with separators (and always did from its introduction into Stata).
Even if Michael does not yet have Stata 10, -todate- has no use for him
unless his dates start out as run-together. 

In short, Michael should ignore -destring- and -todate-, and just use
-date()-, as others have also recommended. 

Nick 
n.j.cox@durham.ac.uk 

Michael McCulloch

I want to destring a variable, which contains dates in the format 
"MM/DD/YY", using:
	destring date, replace ignore("/") force

How can I set this to preserve the leading zeroes, so that I can follow
with:
	todate date, gen(newdate) pattern(mmddyy) f(%d)

Any suggestions would be much appreciated.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index