
Re: st: How to turn my date variable into a variable Stata.10 can recognise?

From   Michael Hanson
Subject   Re: st: How to turn my date variable into a variable Stata.10 can recognise?
Date   Thu, 19 Mar 2009 17:47:47 -0400

On Mar 19, 2009, at 4:13 PM, Ekaterina Hertog wrote:

I have a dataset which contains dates of birth for individuals. These dates look like this: 19560413. I am trying to turn them into date variables Stata can recognise.

To explore this issue, let's first create a simple toy dataset:

// Begin part 1 of example
// long, not the default float: a float cannot store 8 digits exactly
input long date_of_birth
19560413
end
// End part 1 of example

It is a numeric variable and I have turned it into string.

OK, but we can roll that step into those listed below, rather than create an extra variable that you likely won't need later anyway.

The problem is that the following approach:

gen birth_date = date(strbirth_date, "DMY")
format birth_date %td

does not work: I just get missing values. Presumably that is because my date variable is not in the order day - month - year, but rather year - month - day.

So then do not tell Stata to use the wrong order!  Consider:

// Begin part 2 of example
gen birth_date = date(string(date_of_birth,"%8.0f"),"YMD")
format birth_date %td
// End part 2 of example

Notice the use of "YMD" -- the order in which the date elements appear -- rather than "DMY". This is alluded to in -help dates_and_times- when the "mask" of the -date()- function is mentioned; since only one example ("MDY") is given for -date()-, one might be forgiven for thinking that other masks are not possible. Yet your attempted mask doesn't match the example in the help file... nor is it appropriate for your data.
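
A quick way to see the effect of the mask is to call -date()- on a literal string, without touching the dataset. This is only a sketch, assuming Stata 10 or later; the literal is the example value from the question.

// Begin sketch: effect of the mask
display %td date("19560413", "YMD")   // correct order: 13apr1956
display date("19560413", "DMY")       // wrong order: missing (.)
// End sketch

The second line should reproduce the missing value reported in the question, since "DMY" asks Stata to read the leading 19 as the day and then look for a month.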

I then thought I would redo the variable into a correct order and first tried to create 3 separate string variables out of each date: one for year, one for month and one for day.

I tried to do it as follows:
generate strbirth_date= string(date_of_birth, "%08.0f")
gen yob = substr(strbirth_date,1,4)
gen mob = substr(strbirth_date,5,6)
gen dob = substr(strbirth_date,7,8)

As a result 19560413 turned into: yob=1956, mob=0413, dob=13

I do not understand why the month of birth (mob) did not transform correctly, or what I can do next.

Perhaps you thought Stata was Excel, or some other program(ming language) in which you specify the starting and ending characters for your substring extraction? But in -help string_functions-, it is clearly explained that the first number (n1) in -substr(s, n1, n2)- is the position from the start of the string, but the second number (n2) is the *length* of the substring. Hence, the correct way to extract the date elements you want is:

// Begin part 3 of example
gen yob = substr(string(date_of_birth,"%8.0f"),1,4)
gen mob = substr(string(date_of_birth,"%8.0f"),5,2)
gen dob = substr(string(date_of_birth,"%8.0f"),7,2)
// End part 3 of example
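
To see the difference between an end position and a length, it may help to compare the two calls on a literal string. Note also that yob, mob and dob as created in part 3 are string variables; if the goal is still a Stata date, one option is to convert them with -real()- and combine them with -mdy()-. The sketch below assumes part 3 has been run; the name birth_date2 is hypothetical, chosen only to avoid clashing with birth_date from part 2.

// Begin sketch: substr() lengths, and rebuilding the date from its parts
display substr("19560413", 5, 6)   // up to 6 characters from position 5: 0413
display substr("19560413", 5, 2)   // 2 characters from position 5: 04
gen birth_date2 = mdy(real(mob), real(dob), real(yob))
format birth_date2 %td
// End sketch

This should give the same values as the one-line -date()- approach in part 2, which remains the simpler route.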

I would be very grateful for any advice as to how I can turn my date variable into a variable Stata 10 can recognise,

The date (and string) functions in Stata are powerful, so they are worth learning. However, to use them correctly, there really is no substitute for reading the help files (or printed manuals) carefully.

Hope this helps,

P.S. The specification of the mask for the -date()- function has changed from lower case in Stata 9 and earlier to upper case in Stata 10 (and, I suspect, later). This can cause older programs that use the -date()- function, originally written for earlier versions of Stata, to misbehave or outright fail when run with Stata 10. A -version 9- command at the start of the program should remedy that situation, although I haven't checked.
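
One way to check the -version 9- conjecture is to run the same conversion with both mask styles and compare the results. This is an untested sketch, assuming Stata 10 or later; the literal "13apr1956" is simply the example birth date written in day-month-year order.

// Begin sketch: checking the mask under version control
display date("13apr1956", "DMY")              // upper-case mask, Stata 10 syntax
version 9: display date("13apr1956", "dmy")   // lower-case mask, pre-10 syntax
// End sketch

If both lines display the same number, the lower-case mask is being honoured under version control; if the second displays a missing value, the remedy does not work as hoped.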
