How do I convert date variables into Stata elapsed dates when the numbers
run together, like 4151999?
|
Title
|
|
Handling date information when numbers are in a continuous sequence
|
|
Author
|
David Reichel, StataCorp
|
|
Date
|
December 1999; updated August 2007
|
The date() function
can convert virtually any date format into elapsed dates, which is
the format Stata uses to manipulate date information. Elapsed dates
are calculated as the number of days from January 1, 1960. This format is
useful for adding or subtracting dates and changing the format of date
variables.
Stata 10
In Stata 10, converting dates is easy to do. The improved date() function can
handle a date that is a continuous number. For example, suppose you
had the numeric variable datevar:
. list
+----------+
| datevar |
|----------|
1. | 12031999 |
2. | 2081998 |
3. | 4071997 |
+----------+
To convert the above date variable to an elapse date format, you must first
convert the numeric variable to a string and then use the date()
function. For example,
. tostring datevar, replace format(%20.0f)
datevar was float now str8
. replace datevar = "0" + datevar if length(datevar) == 7
(2 real changes made)
. list
+----------+
| datevar |
|----------|
1. | 12031999 |
2. | 02081998 |
3. | 04071997 |
+----------+
. gen edatevar = date(datevar,"MDY")
. format edatevar %td
. list
+----------------------+
| datevar edatevar |
|----------------------|
1. | 12031999 03dec1999 |
2. | 02081998 08feb1998 |
3. | 04071997 07apr1997 |
+----------------------+
Stata 9
When month, day, and year are in the form of a continuous string
of numbers, such as 11021999, with no spaces, hyphens, or slash marks
to separate the numbers, Stata 9 is not able to read the date information
directly.
If you have a numeric date variable in this format, the
int() function can be
used to separate the date information into three separate variables that can
then be used in the
mdy() function to
read the data. The following example demonstrates how you may perform such
a conversion when you have a four-digit year.
Suppose you have a dataset named datecal that has a variable
named datevar:
. list
+----------+
| datevar |
|----------|
1. | 12031999 |
2. | 2081998 |
3. | 4071997 |
+----------+
Your program could be
use datecal, clear
generate month = int(datevar/1000000)
generate day = int((datevar - month*1000000)/10000)
generate year = datevar - month*1000000 - day*10000
generate elapdate = mdy(month, day, year)
list
The output from this program is
+------------------------------------------+
| datevar month day year elapdate |
|------------------------------------------|
| 12031999 12 3 1999 14581 |
| 2081998 2 8 1998 13918 |
| 4071997 4 7 1997 13611 |
+------------------------------------------+
To format elapdate, there are several options, but one possibility is
. format elapdate %d
. list elapdate
+-----------+
| elapdate |
|-----------|
1. | 03dec1999 |
2. | 08feb1998 |
3. | 07apr1997 |
+-----------+
If you have the same date variable in string (as opposed to numeric) format,
you can perform the same separation into three variables of month, day, and
year using a combination of string manipulation functions. For example,
let us assume you have the same three dates as in datevar
except they are in a string.
. list
+----------+
| str_date |
|----------|
1. | 12031999 |
2. | 2081998 |
3. | 4071997 |
+----------+
One solution would be to extract the month, day, and year
information from the string positions using the
substr() function. You
can also convert the string into a numeric format at the same time using the
real() function.
Because you have dates with 7 characters and 8 characters, you will need to
use separate commands to obtain the results as follows:
. gen month = real(substr(str_date,1,2)) if strlen(str_date)==8
(2 missing values generated)
. replace month = real(substr(str_date,1,1)) if strlen(str_date)==7
(2 real changes made)
. gen date = real(substr(str_date,3,2)) if strlen(str_date)==8
(2 missing values generated)
. replace date = real(substr(str_date,2,2)) if strlen(str_date)==7
(2 real changes made)
. gen year = real(substr(str_date,5,4)) if strlen(str_date)==8
(2 missing values generated)
. replace year = real(substr(str_date,4,4)) if strlen(str_date)==7
(2 real changes made)
. list
+--------------------------------+
| str_date month date year |
|--------------------------------|
1. | 12031999 12 3 1999 |
2. | 2081998 2 8 1998 |
3. | 4071997 4 7 1997 |
+--------------------------------+
. describe
Contains data from datecalstr.dta
obs: 3
vars: 4 14 Apr 2005 12:09
size: 72 (99.9% of memory free)
------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------
str_date str8 %9s
month float %9.0g
date float %9.0g
year float %9.0g
------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
Another solution is to use the todate command
written by Nicholas J. Cox. To obtain todate, type
. ssc install todate
|