Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: preserving leading zeros in destring


From   Michael McCulloch <[email protected]>
To   Statalist <[email protected]>
Subject   st: RE: preserving leading zeros in destring
Date   Thu, 28 Aug 2008 08:00:02 -0700

Thanks Nick for your explanation (copied below). My apologies, I hadn't been receiving Statalist and today searched the archives. I haven't yet solved my problem, and illustrate below how I've attempted to implement the suggestions so helpfully made for me:

My date field begins as a string in form of "12/12/08" and "4/4/08", created by:
clear
input str12 date
"12/12/08"
"12/02/08"
"4/4/08"
"4/14/08"
end
list

My problem is:
1. If I do first destring, -date- can't cope with the missing leading zeroes, as in:
* method 1: remove "/", then use -date-
gen date1=date
destring date1, replace ignore("/") force
tostring date1, replace
foreach D of varlist date1 {
generate `D'1 = date(`D', "mmddyy")
format `D'1 %d
drop `D'
rename `D'1 `D'
}
list

2. If I don't destring, -date- can't cope with the "/", as in:
* method 2: use -date- directly
gen date2=date(date, "MMDDYY", 2010)
format date2 %d
list

In each of these three examples, error msg is: (4 missing values generated).
And, in the fourth example, error msg is: (date: length does not match pattern)

3. If I do first destring, -todate- can't cope with the missing leading zeroes, as in:
* method 3: remove "/", then use -todate-
gen date3=date
destring date3, replace ignore("/") force
todate date, gen(date3a) pattern(mmddyy) f(%d) cend(2000)
list

4. If I don't first destring, -todate- can't cope with the "/", as in:
* method 4: use -todate- directly
gen date4=date
todate date, gen(date4a) pattern(mm/dd/yy) f(%d) cend(2000)
list


I apologize for my frustration; I seem to be learning more what *not* to do, rather than what *to* do.
Michael



****
Nick's full answer:

From "Nick Cox" <[email protected]>
To <[email protected]>
Subject st: RE: preserving leading zeros in destring
Date Wed, 27 Aug 2008 23:23:27 +0100

This question exemplifies a common but understandable confusion between
storage and display, another confusion on what -destring- is designed to
do, and another confusion about why you might want to use -todate-.

Put on one side for the moment the detail of dates, which as usual
complicate things.

Suppose I have a string variable with values like "012345". This example
is to Stata the character "0" followed by the character "1" and so
forth. Stata can be thought of as storing _and_ displaying it as such
(the details of electronics aside).
What it means to any human while a string is immaterial and purely a
matter for human interpretation. That doesn't stop you manipulating it
in various ways within Stata, but Stata's way of thinking about it is
literal (literally).

Now focus on the numeric interpretation. You should want to use
-destring- if and only if your string variable somehow contains a purely
numeric set of values. It became a string variable by some kind of
accident. Many of those accidents involve spreadsheets in one way or
another.

That is, suppose your string variable contains values like "012345" or
"78901" which have numeric interpretations, and no other kinds of
values.

Given that, you may want to unleash -destring- on values like "012345".
(If the numbers are integer identifiers, they are nevertheless often
better off as strings. U.S. Social Security numbers are a standard
example.)

Now a distinction must be made.

-destring- has one mission in life, to boldly go into the data universe
and seek out numbers and let their numberliness flourish. In this
example, it will see an integer 12345 and will store it as such, or
strictly its binary equivalent.

That doesn't stop you separately applying a numeric -format- to the new
numeric variable and insisting that it should be displayed with a
leading zero. But, let me insist, that is a different matter.

Apologies if that seems really elementary, but the distinction between
storage and display is often muddled. Some Stata users appear to think
that the format of a number affects how it is stored, whereas format
applies to display only. Although even programmers can usually forget
about it, this is one area where you have to keep remembering that
computers work with binary. You may ask for a display format with 3
decimal places, but that doesn't mean that the number is rounded to 3
decimal places and stored as such.

Now to the details of Michael's question.

First off, you _can_ apply -destring- to a string variable containing /
/ separators, but that is a bad idea. The / / are an important part of
the information in the string, so should not be thrown away, even if you
intend to put them back in some sense immediately thereafter.

The best idea is to use -date()- to convert such a string variable to a
numeric date. Phil Schumm has just explained that in a reply to
Michael's next question. Michael asked the same question on 23 August,
and Salah Mahmud gave the same reply. (Michael's question has probably
been bouncing around in cyberspace for a few days.)

But there is confusion in the presumption that -destring- can preserve
somehow any leading zeros. -destring- is not about changing display
formats. If you had a date like "01/02/03" and you insisted on
-destring-ing it and removing the slashes, then -destring would map that
to 10203. You could then insist on a format with leading zero by using
-format-, but that's separate. Even if a leading zero numeric format
were applied, it would not affect any subsequent calculations with that
variable, as it has, as said, no effect on what is stored.

Finally, Michael wants to push the resulting numeric variable back
through -todate-. -todate- is a user-written program on SSC. It had one
purpose only, to deal with run-together dates like 10203, meaning
1/02/03. For users of Stata 10, -todate- is now at last obsolete, as
StataCorp have caught up with run-together dates. (-todate- still has
some potential use with Stata 8 or Stata 9.)

But why would Michael have run-together dates? Only because he just
removed the separators with -destring-. But as already said, he
shouldn't want to do that, because -date()- works perfectly well with
dates with separators (and always did from its introduction into Stata).
Even if Michael does not yet have Stata 10, -todate- has no use for him
unless his dates start out as run-together.

In short, Michael should ignore -destring- and -todate-, and just use
-date()-, as others have also recommended.

Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index