Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: RE: preserving leading zeros in destring


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: preserving leading zeros in destring
Date   Thu, 28 Aug 2008 16:44:47 +0100

I've just seen your other post, in which you explain that you are using
Stata 9.2. 

To repeat longstanding advice, posters are asked to specify --
implicitly always -- whenever they are using an out-of-date version of
Stata.  

The syntax of -date()- changed between 9 and 10. You need to feed, to
-generate- or -replace-, 

date(date, "mdy", 2010)

Otherwise my analysis is the same. 

Nick 
n.j.cox@durham.ac.uk 

-----Original Message-----
From: Nick Cox 
Sent: 28 August 2008 16:36
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: RE: preserving leading zeros in destring

Thanks for the further details. With your example data I comment further
as follows. 

The good news is in 2. You just didn't find the correct syntax for
-date()-. 
Your problem is perfectly soluble so long as you give the correct
syntax. 

1. "If I do first destring, -date- can't cope with the missing leading 
zeroes" 

As earlier explained, -destring- is a bad idea for dates. Without
parental bias, I can say confidently that this is not a deficiency of
-destring-. It is not designed for your problem. By throwing away
crucial information you make a bigger problem for yourself. 

Incidentally, as a matter of Stata technique, just removing / s from a
string variable is better done by 

gen newstrvar = subinstr(strvar, "/", "", .) 

2. "If I don't destring, -date- can't cope with the "/" " 

Try 

date(date, "MDY", 2010)

which works fine. 

3. "If I do first destring, -todate- can't cope with the missing 
leading zeroes" 

Correct, but as the pattern you fed to -todate- doesn't match your data,
that is not an indictment of -todate-. See also 1. 

4. "If I don't first destring, -todate- can't cope with the "/"" 

Correct, but as -todate- is only for run-together dates, i.e. without
separators, and you specified an illegal -pattern()- that is an abuse or
misunderstanding of -todate-. 

Nick 
n.j.cox@durham.ac.uk 

Michael McCulloch

Thanks Nick for your explanation (copied below). My apologies, I 
hadn't been receiving Statalist and today searched the archives. I 
haven't yet solved my problem, and illustrate below how I've 
attempted to implement the suggestions so helpfully made for me:

My date field begins as a string in form of  "12/12/08" and "4/4/08", 
created by:
	clear
	input str12 date
	"12/12/08"
	"12/02/08"
	"4/4/08"
	"4/14/08"
	end
	list

My problem is:
1. If I do first destring, -date- can't cope with the missing leading 
zeroes, as in:
* method 1: remove "/", then use -date-
gen date1=date
destring date1, replace ignore("/") force
tostring date1, replace
    foreach D of varlist date1 {
      generate `D'1 = date(`D', "mmddyy")
      format `D'1 %d
      drop `D'
      rename `D'1 `D'
    }
list

2. If I don't destring, -date- can't cope with the "/", as in:
* method 2: use -date- directly
gen date2=date(date, "MMDDYY", 2010)
format date2 %d
list

In each of these three examples, error msg is: (4 missing values
generated).
And, in the fourth example, error msg is: (date: length does not match
pattern)

3. If I do first destring, -todate- can't cope with the missing 
leading zeroes, as in:
* method 3: remove "/", then use -todate-
gen date3=date
destring date3, replace ignore("/") force
todate date, gen(date3a) pattern(mmddyy) f(%d) cend(2000)
list

4. If I don't first destring, -todate- can't cope with the "/", as in:
* method 4: use -todate- directly
gen date4=date
todate date, gen(date4a) pattern(mm/dd/yy) f(%d) cend(2000)
list


I apologize for my frustration; I seem to be learning more what *not* 
to do, rather than what *to* do.
Michael



****
Nick's full answer:

 From 	  "Nick Cox" <n.j.cox@durham.ac.uk>
To 	  <statalist@hsphsun2.harvard.edu>
Subject 	  st: RE: preserving leading zeros in destring
Date 	  Wed, 27 Aug 2008 23:23:27 +0100

This question exemplifies a common but understandable confusion between
storage and display, another confusion on what -destring- is designed to
do, and another confusion about why you might want to use -todate-.

Put on one side for the moment the detail of dates, which as usual
complicate things.

Suppose I have a string variable with values like "012345". This example
is to Stata the character "0" followed by the character "1" and so
forth. Stata can be thought of as storing _and_ displaying it as such
(the details of electronics aside).
What it means to any human while a string is immaterial and purely a
matter for human interpretation. That doesn't stop you manipulating it
in various ways within Stata, but Stata's way of thinking about it is
literal (literally).

Now focus on the numeric interpretation. You should want to use
-destring- if and only if your string variable somehow contains a purely
numeric set of values. It became a string variable by some kind of
accident. Many of those accidents involve spreadsheets in one way or
another.

That is, suppose your string variable contains values like "012345" or
"78901" which have numeric interpretations, and no other kinds of
values.

Given that, you may want to unleash -destring- on values like "012345".
(If the numbers are integer identifiers, they are nevertheless often
better off as strings. U.S. Social Security numbers are a standard
example.)

Now a distinction must be made.

-destring- has one mission in life, to boldly go into the data universe
and seek out numbers and let their numberliness flourish. In this
example, it will see an integer 12345 and will store it as such, or
strictly its binary equivalent.

That doesn't stop you separately applying a numeric -format- to the new
numeric variable and insisting that it should be displayed with a
leading zero. But, let me insist, that is a different matter.

Apologies if that seems really elementary, but the distinction between
storage and display is often muddled. Some Stata users appear to think
that the format of a number affects how it is stored, whereas format
applies to display only. Although even programmers can usually forget
about it, this is one area where you have to keep remembering that
computers work with binary. You may ask for a display format with 3
decimal places, but that doesn't mean that the number is rounded to 3
decimal places and stored as such.

Now to the details of Michael's question.

First off, you _can_ apply -destring- to a string variable containing /
/ separators, but that is a bad idea. The / / are an important part of
the information in the string, so should not be thrown away, even if you
intend to put them back in some sense immediately thereafter.

The best idea is to use -date()- to convert such a string variable to a
numeric date.  Phil Schumm has just explained that in a reply to
Michael's next question. Michael asked the same question on 23 August,
and Salah Mahmud gave the same reply. (Michael's question has probably
been bouncing around in cyberspace for a few days.)

But there is confusion in the presumption that -destring- can preserve
somehow any leading zeros. -destring- is not about changing display
formats. If you had a date like "01/02/03" and you insisted on
-destring-ing it and removing the slashes, then -destring would map that
to 10203. You could then insist on a format with leading zero by using
-format-, but that's separate. Even if a leading zero numeric format
were applied, it would not affect any subsequent calculations with that
variable, as it has, as said, no effect on what is stored.

Finally, Michael wants to push the resulting numeric variable back
through -todate-. -todate- is a user-written program on SSC. It had one
purpose only, to deal with run-together dates like 10203, meaning
1/02/03. For users of Stata 10, -todate- is now at last obsolete, as
StataCorp have caught up with run-together dates. (-todate- still has
some potential use with Stata 8 or Stata 9.)

But why would Michael have run-together dates? Only because he just
removed the separators with -destring-. But as already said, he
shouldn't want to do that, because -date()- works perfectly well with
dates with separators (and always did from its introduction into Stata).
Even if Michael does not yet have Stata 10, -todate- has no use for him
unless his dates start out as run-together.

In short, Michael should ignore -destring- and -todate-, and just use
-date()-, as others have also recommended.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index