[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: RE: turning numbers into dates |

Date |
Mon, 4 May 2009 18:32:16 +0100 |

Some room still remains for comment. Ekaterina started the thread with the implication that her date variable was numeric, and finished with an example in which it is string. Let -date- be what you start with. I'll look at both possibilities. Further solutions ================= -date- is string: gen sdate = date if length(date) == 6 replace sdate = "20" + date if missing(sdate) & date < "1000" replace sdate = "19" + date if missing(sdate) gen ndate = date(sdate, "YM") -date- is numeric: gen ndate = date if date >= 10000 replace ndate = 200000 + date if missing(ndate) & date < 1000 replace ndate = 190000 + date if missing(ndate) replace ndate = date(string(ndate), "YM") This solution is (deliberately) pedestrian, meaning it takes short simple steps. Various remarks =============== 1. I like -tostring- as much as most of its users but it is not needed here. Using -tostring- to create a variable you only want briefly is unnecessary when -string()- will do the same thing on the fly. 2. Eric used expressions like varold - (int(varold/10000)*10000) Note that this is just -mod(varold, 10000)-. 3. Martin cited a piece in the Stata Journal on -cond()-. Note that the first author is David Kantor. 4. Tirthankar downloaded -catenate- from SSC (which he didn't use) and -todate- ditto (which he did). Note that -catenate- is made obsolete by -egen, concat()- (unless you have Stata 5 or Stata 6) while -todate- is made obsolete by Stata 10's -date()-. Nick n.j.cox@durham.ac.uk Ekaterina Hertog (2) ==================== thank you so very much for all the brilliant advice! I have tried different options suggested by Eric, Martin and Tirthankar and finally opted for a slightly modified version of Martin's suggestion as it seemed it would be most flexible, even allowing em to take into account the fact that some of my dates come from 20th and some for the 21st century (I could do it with the topyear option) Just in case you might be interested this is what I did: clear input str10 stredu1st 200109 197104 197504 196504 196904 8804 8404 0202 6304 8304 end compress gen mydate= cond(length(stredu1st)>4, stredu1st, substr(stredu1st,1,2)+" "+substr(stredu1st,3,4)) gen edu1st = cond(length(stredu1st)>4, date(mydate, "YM"), date(mydate, "YM" , 2007)) format edu1st %td list stredu1st mydate edu1st, noobs Eric de Souza (2) ================= Wouldn't the following be much simpler for her? replacing varnew by whatever her date variable is. input varold 199804 199702 8705 7502 end gen varnew = varold - (int(varold/10000)*10000) gen year = 1900 + int(varnew/100) gen month = varnew - (int(varnew/100)*100) gen date = ym(year, month) format %tm date Or is there a problem with the above? Martin Weiss (2) =============== So it seems Ekaterina can employ several methods, based on the -cond- function. She can read up on best practice with regard to this function in Nick`s column http://www.stata-journal.com/sjpdf.html?articlenum=pr0016 If she is willing to make sure that all her two digit years hail from the 20th century, she could ************* clear* input edu_start_date_1 // :mylabel , auto // str10 double byte 197104 197504 196504 196904 8804 8404 6304 8304 end compress tostring edu_start_date_1, gen(stredu1st) gen mydate=cond(length(stredu1st)>4, /// stredu1st, /// substr(stredu1st,1,2)+" "+substr(stredu1st,3,4)) gen edu1st = cond(length(stredu1st)>4, /// date(mydate, "YM"), /// date(mydate, "19YM")) format edu1st %td list edu_start_date_1 mydate edu1st, noobs ************* force a hole into the string after the year digits, and the date function would understand her. Alternatively, she could make the same assumption and add the "19" to all four digit strings: ****************** clear* input edu_start_date_1 // :mylabel , auto // str10 double byte 197104 197504 196504 196904 8804 8404 6304 8304 end compress tostring edu_start_date_1, gen(stredu1st) gen mydate=cond(length(stredu1st)>4, /// stredu1st, /// "19"+stredu1st) gen edu1st = date(mydate, "YM") format edu1st %td list edu_start_date_1 mydate edu1st, noobs ****************** She should check very carefully whether the results match her expectations :-) Eric de Souza (1) ================= And -di %td date("0675", "M19Y")- works Martin Weiss (1) ================ The -cond- function would let you condition on the length of the string so you could tailor the command: -gen mydate = cond(length(stredu1st)>4, ... )- I would love to fill in the remainder of this command but I just cannot figure out at the moment why -di %td date("197506", "YM")- works and -di %td date("7506", "19YM")- does not... Tirthankar Chakravarty ====================== A fairly crude hack follows. I do this two ways to check that there is no inconsistency. Then there is a slightly simpler solution below: /* Begin */ clear input str10 date 197104 197504 196504 196904 8804 8404 6304 8304 end g date1=trim(date) g newvar=substr(date1, 3,.) if length(date1)>4 replace newvar=date1 if length(date1)<5 // or: g date2 = "19" ssc install catenate, replace g str10 newvar2 = date2+date1 if length(date1)<5 replace newvar2=date1 if length(date1)>4 drop date1 date2 ssc install todate, replace todate newvar2, gen(edu1st_1) pattern(yyyymm) todate newvar, gen(edu1st_2) pattern(yymm) cend(2000) assert edu1st_1==edu1st_2 drop newvar* list edu1st* /* End */ A simpler option could be to run NJ Cox's -todate- twice and then combine the two output variables - some manipulation using -length()- will be required in this case as well. /* Simple option */ clear input str10 date 197104 197504 196504 196904 8804 8404 6304 8304 end todate date if length(date)>4, gen(edu1st_1) pattern(yyyymm) todate date if length(date)<5, gen(edu1st_2) pattern(yymm) cend(2000) replace edu1st_1=edu1st_2 if(edu1st_1==.) drop edu1st_2 rename edu1st_1 edu1st list edu1st /* End */ Ekaterina Hertog (1) ==================== I have got a variable containing the month and year an individual started his or her education. Only Stata thinks the values in this variable are numbers and I want to turn them into dates. If all the numbers followed the same pattern that will not be a problem. for example I could do it like this: tostring edu_start_date_1, gen(stredu1st) gen edu1st = date(stredu1st, "YM") My problem is that while most dates in my dataset come in the yyyymm pattern: e.g. +----------+ | stredu~t | |----------| 1. | . | 2. | 197104 | 3. | 197504 | 4. | 196504 | 5. | 196904 | |----------| several contain only yymm e.g. +-----------+ | edu_st~1 | |-----------| 12338. | 8804 | 13265. | 8404 | 13666. | 6304 | 13831. | 8304 | +-----------+ So when I run gen edu1st = date(stredu1st, "YM") all the yymm values in stredu1st are turned into missing values in edu1st. I could of course edit the values containing only yymm into yyyymm pattern manually, but this feels imprecise and prone to error and I would like to automate the process if at all possible. Is there a way to make the date command recognise alternative patterns? * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: turning numbers into dates***From:*Ekaterina Hertog <ekaterina.hertog@sociology.ox.ac.uk>

- Prev by Date:
**st: RE: an ROC equivalent for continuous variables?** - Next by Date:
**st: Uniform Confidence Band** - Previous by thread:
**Re: st: AW: turning numbers into dates** - Next by thread:
**Re: st: graph hbox Y, by(something noiyaxes) over(grp2) over(grp1)** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |