Title | Handling date information in Stata 9 or earlier when numbers are in a continuous sequence | |
Author | David Reichel, StataCorp |
The date() function can convert virtually any date format into elapsed dates, which is the format Stata uses to manipulate date information. Elapsed dates are calculated as the number of days from January 1, 1960. This format is useful for adding or subtracting dates and changing the format of date variables.
When month, day, and year are in the form of a continuous string of numbers, such as 11021999, with no spaces, hyphens, or slash marks to separate the numbers, Stata 9 or earlier is not able to read the date information directly.
If you have a numeric date variable in this format, the int() function can be used to separate the date information into three separate variables that can then be used in the mdy() function to read the data. The following example demonstrates how you may perform such a conversion when you have a four-digit year.
Suppose you have a dataset named datecal that has a variable named datevar:
. list +----------+ | datevar | |----------| 1. | 12031999 | 2. | 2081998 | 3. | 4071997 | +----------+
Your program could be
use datecal, clear generate month = int(datevar/1000000) generate day = int((datevar - month*1000000)/10000) generate year = datevar - month*1000000 - day*10000 generate elapdate = mdy(month, day, year) list
The output from this program is
+------------------------------------------+ | datevar month day year elapdate | |------------------------------------------| | 12031999 12 3 1999 14581 | | 2081998 2 8 1998 13918 | | 4071997 4 7 1997 13611 | +------------------------------------------+
To format elapdate, there are several options, but one possibility is
. format elapdate %d . list elapdate +-----------+ | elapdate | |-----------| 1. | 03dec1999 | 2. | 08feb1998 | 3. | 07apr1997 | +-----------+
If you have the same date variable in string (as opposed to numeric) format, you can perform the same separation into three variables of month, day, and year using a combination of string manipulation functions. For example, let us assume you have the same three dates as in datevar except they are in a string.
. list +----------+ | str_date | |----------| 1. | 12031999 | 2. | 2081998 | 3. | 4071997 | +----------+
One solution would be to extract the month, day, and year information from the string positions using the substr() function. You can also convert the string into a numeric format at the same time using the real() function. Because you have dates with 7 characters and 8 characters, you will need to use separate commands to obtain the results as follows:
. gen month = real(substr(str_date,1,2)) if strlen(str_date)==8 (2 missing values generated) . replace month = real(substr(str_date,1,1)) if strlen(str_date)==7 (2 real changes made) . gen date = real(substr(str_date,3,2)) if strlen(str_date)==8 (2 missing values generated) . replace date = real(substr(str_date,2,2)) if strlen(str_date)==7 (2 real changes made) . gen year = real(substr(str_date,5,4)) if strlen(str_date)==8 (2 missing values generated) . replace year = real(substr(str_date,4,4)) if strlen(str_date)==7 (2 real changes made) . list +--------------------------------+ | str_date month date year | |--------------------------------| 1. | 12031999 12 3 1999 | 2. | 2081998 2 8 1998 | 3. | 4071997 4 7 1997 | +--------------------------------+ . describe Contains data from datecalstr.dta obs: 3 vars: 4 14 Apr 2005 12:09 size: 72 (99.9% of memory free) ------------------------------------------------------------------------ storage display value variable name type format label variable label ------------------------------------------------------------------------ str_date str8 %9s month float %9.0g date float %9.0g year float %9.0g ------------------------------------------------------------------------ Sorted by: Note: dataset has changed since last saved
Another solution is to use the todate command written by Nicholas J. Cox. To obtain todate, type
. ssc install todate