[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
Michael McCulloch <mm@pinest.org> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: RE: RE: preserving leading zeros in destring |

Date |
Thu, 28 Aug 2008 09:48:12 -0700 |

To summarize, Nick Cox suggested below a solution to my problem, which was that in Stata 9.2, after importing dates from Excel.csv in the format "MM/DD/YY", I couldn't get -date- to work.

Here's the solution, succinctly, in which an Excel spreadsheet saved in *.csv format contains a variable called date, formatted with Excel date format MM/DD/YY:

insheet using "my_sheet.csv", clear

gen newdate= date(date, "mdy", 2010)

format newdate %d

list

My apologies for not having specified I'm using Stata 9.2

As a matter of Statalist procedure, in a case like this where the answer isn't reflected in the title of my original post, is it Kosher to re-submit the problem & solution in a newly titled post, so others may find it more easily when searching the Statalist archives?

Many thanks.

Michael

Thanks for the further details. With your example data I comment further as follows. The good news is in 2. You just didn't find the correct syntax for -date()-. Your problem is perfectly soluble so long as you give the correct syntax. 1. "If I do first destring, -date- can't cope with the missing leading zeroes" As earlier explained, -destring- is a bad idea for dates. Without parental bias, I can say confidently that this is not a deficiency of -destring-. It is not designed for your problem. By throwing away crucial information you make a bigger problem for yourself. Incidentally, as a matter of Stata technique, just removing / s from a string variable is better done by gen newstrvar = subinstr(strvar, "/", "", .) 2. "If I don't destring, -date- can't cope with the "/" " Try date(date, "MDY", 2010) which works fine. 3. "If I do first destring, -todate- can't cope with the missing leading zeroes" Correct, but as the pattern you fed to -todate- doesn't match your data, that is not an indictment of -todate-. See also 1. 4. "If I don't first destring, -todate- can't cope with the "/"" Correct, but as -todate- is only for run-together dates, i.e. without separators, and you specified an illegal -pattern()- that is an abuse or misunderstanding of -todate-. Nick n.j.cox@durham.ac.uk Michael McCulloch Thanks Nick for your explanation (copied below). My apologies, I hadn't been receiving Statalist and today searched the archives. I haven't yet solved my problem, and illustrate below how I've attempted to implement the suggestions so helpfully made for me: My date field begins as a string in form of "12/12/08" and "4/4/08", created by: clear input str12 date "12/12/08" "12/02/08" "4/4/08" "4/14/08" end list My problem is: 1. If I do first destring, -date- can't cope with the missing leading zeroes, as in: * method 1: remove "/", then use -date- gen date1=date destring date1, replace ignore("/") force tostring date1, replace foreach D of varlist date1 { generate `D'1 = date(`D', "mmddyy") format `D'1 %d drop `D' rename `D'1 `D' } list 2. If I don't destring, -date- can't cope with the "/", as in: * method 2: use -date- directly gen date2=date(date, "MMDDYY", 2010) format date2 %d list In each of these three examples, error msg is: (4 missing values generated). And, in the fourth example, error msg is: (date: length does not match pattern) 3. If I do first destring, -todate- can't cope with the missing leading zeroes, as in: * method 3: remove "/", then use -todate- gen date3=date destring date3, replace ignore("/") force todate date, gen(date3a) pattern(mmddyy) f(%d) cend(2000) list 4. If I don't first destring, -todate- can't cope with the "/", as in: * method 4: use -todate- directly gen date4=date todate date, gen(date4a) pattern(mm/dd/yy) f(%d) cend(2000) list I apologize for my frustration; I seem to be learning more what *not* to do, rather than what *to* do. Michael **** Nick's full answer: From "Nick Cox" <n.j.cox@durham.ac.uk> To <statalist@hsphsun2.harvard.edu> Subject st: RE: preserving leading zeros in destring Date Wed, 27 Aug 2008 23:23:27 +0100 This question exemplifies a common but understandable confusion between storage and display, another confusion on what -destring- is designed to do, and another confusion about why you might want to use -todate-. Put on one side for the moment the detail of dates, which as usual complicate things. Suppose I have a string variable with values like "012345". This example is to Stata the character "0" followed by the character "1" and so forth. Stata can be thought of as storing _and_ displaying it as such (the details of electronics aside). What it means to any human while a string is immaterial and purely a matter for human interpretation. That doesn't stop you manipulating it in various ways within Stata, but Stata's way of thinking about it is literal (literally). Now focus on the numeric interpretation. You should want to use -destring- if and only if your string variable somehow contains a purely numeric set of values. It became a string variable by some kind of accident. Many of those accidents involve spreadsheets in one way or another. That is, suppose your string variable contains values like "012345" or "78901" which have numeric interpretations, and no other kinds of values. Given that, you may want to unleash -destring- on values like "012345". (If the numbers are integer identifiers, they are nevertheless often better off as strings. U.S. Social Security numbers are a standard example.) Now a distinction must be made. -destring- has one mission in life, to boldly go into the data universe and seek out numbers and let their numberliness flourish. In this example, it will see an integer 12345 and will store it as such, or strictly its binary equivalent. That doesn't stop you separately applying a numeric -format- to the new numeric variable and insisting that it should be displayed with a leading zero. But, let me insist, that is a different matter. Apologies if that seems really elementary, but the distinction between storage and display is often muddled. Some Stata users appear to think that the format of a number affects how it is stored, whereas format applies to display only. Although even programmers can usually forget about it, this is one area where you have to keep remembering that computers work with binary. You may ask for a display format with 3 decimal places, but that doesn't mean that the number is rounded to 3 decimal places and stored as such. Now to the details of Michael's question. First off, you _can_ apply -destring- to a string variable containing / / separators, but that is a bad idea. The / / are an important part of the information in the string, so should not be thrown away, even if you intend to put them back in some sense immediately thereafter. The best idea is to use -date()- to convert such a string variable to a numeric date. Phil Schumm has just explained that in a reply to Michael's next question. Michael asked the same question on 23 August, and Salah Mahmud gave the same reply. (Michael's question has probably been bouncing around in cyberspace for a few days.) But there is confusion in the presumption that -destring- can preserve somehow any leading zeros. -destring- is not about changing display formats. If you had a date like "01/02/03" and you insisted on -destring-ing it and removing the slashes, then -destring would map that to 10203. You could then insist on a format with leading zero by using -format-, but that's separate. Even if a leading zero numeric format were applied, it would not affect any subsequent calculations with that variable, as it has, as said, no effect on what is stored. Finally, Michael wants to push the resulting numeric variable back through -todate-. -todate- is a user-written program on SSC. It had one purpose only, to deal with run-together dates like 10203, meaning 1/02/03. For users of Stata 10, -todate- is now at last obsolete, as StataCorp have caught up with run-together dates. (-todate- still has some potential use with Stata 8 or Stata 9.) But why would Michael have run-together dates? Only because he just removed the separators with -destring-. But as already said, he shouldn't want to do that, because -date()- works perfectly well with dates with separators (and always did from its introduction into Stata). Even if Michael does not yet have Stata 10, -todate- has no use for him unless his dates start out as run-together. In short, Michael should ignore -destring- and -todate-, and just use -date()-, as others have also recommended. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

* * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**RE: st: RE: RE: preserving leading zeros in destring***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

**References**:**st: RE: preserving leading zeros in destring***From:*Michael McCulloch <mm@pinest.org>

**st: RE: RE: preserving leading zeros in destring***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

- Prev by Date:
**st: RE: How do I create a new dataset with just one variable but 500,000 obs?** - Next by Date:
**Re: st: RE: Egen command** - Previous by thread:
**st: RE: RE: preserving leading zeros in destring** - Next by thread:
**RE: st: RE: RE: preserving leading zeros in destring** - Index(es):

© Copyright 1996–2016 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |