Search
   >> Home >> Resources & support >> FAQs >> Handling date information when numbers are in a continuous sequence

How do I convert date variables into Stata elapsed dates when the numbers run together, like 4151999?

Title   Handling date information when numbers are in a continuous sequence
Author David Reichel, StataCorp
Date December 1999; updated August 2007

The date() function can convert virtually any date format into elapsed dates, which is the format Stata uses to manipulate date information. Elapsed dates are calculated as the number of days from January 1, 1960. This format is useful for adding or subtracting dates and changing the format of date variables.

Stata 10

In Stata 10, converting dates is easy to do. The improved date() function can handle a date that is a continuous number. For example, suppose you had the numeric variable datevar:

 . list

      +----------+
      |  datevar |
      |----------|
   1. | 12031999 |
   2. |  2081998 |
   3. |  4071997 |
      +----------+

To convert the above date variable to an elapse date format, you must first convert the numeric variable to a string and then use the date() function. For example,

 . tostring datevar, replace format(%20.0f)
 datevar was float now str8

 . replace datevar = "0" + datevar if length(datevar) == 7
 (2 real changes made)

 . list

      +----------+
      |  datevar |
      |----------|
   1. | 12031999 |
   2. | 02081998 |
   3. | 04071997 |
      +----------+
 
 . gen edatevar = date(datevar,"MDY")

 . format edatevar %td

 . list

      +----------------------+
      |  datevar    edatevar |
      |----------------------|
   1. | 12031999   03dec1999 |
   2. | 02081998   08feb1998 |
   3. | 04071997   07apr1997 |
      +----------------------+

Stata 9

When month, day, and year are in the form of a continuous string of numbers, such as 11021999, with no spaces, hyphens, or slash marks to separate the numbers, Stata 9 is not able to read the date information directly.

If you have a numeric date variable in this format, the int() function can be used to separate the date information into three separate variables that can then be used in the mdy() function to read the data. The following example demonstrates how you may perform such a conversion when you have a four-digit year.

Suppose you have a dataset named datecal that has a variable named datevar:

 . list

      +----------+
      |  datevar |
      |----------|
   1. | 12031999 |
   2. |  2081998 |
   3. |  4071997 |
      +----------+

Your program could be

 use datecal, clear
 generate month = int(datevar/1000000)
 generate day = int((datevar - month*1000000)/10000)
 generate year = datevar - month*1000000 - day*10000
 generate elapdate = mdy(month, day, year)
 list  

The output from this program is

 +------------------------------------------+
 |  datevar   month   day   year   elapdate |
 |------------------------------------------|
 | 12031999      12     3   1999      14581 |
 |  2081998       2     8   1998      13918 |
 |  4071997       4     7   1997      13611 |
 +------------------------------------------+

To format elapdate, there are several options, but one possibility is

 . format elapdate %d

 . list elapdate

      +-----------+
      |  elapdate |
      |-----------|
   1. | 03dec1999 |
   2. | 08feb1998 |
   3. | 07apr1997 |
      +-----------+

If you have the same date variable in string (as opposed to numeric) format, you can perform the same separation into three variables of month, day, and year using a combination of string manipulation functions. For example, let us assume you have the same three dates as in datevar except they are in a string.

 . list
 
      +----------+
      | str_date |
      |----------|
   1. | 12031999 |
   2. |  2081998 |
   3. |  4071997 |
      +----------+

One solution would be to extract the month, day, and year information from the string positions using the substr() function. You can also convert the string into a numeric format at the same time using the real() function. Because you have dates with 7 characters and 8 characters, you will need to use separate commands to obtain the results as follows:

 .  gen month = real(substr(str_date,1,2)) if strlen(str_date)==8
 (2 missing values generated)
 
 . replace month = real(substr(str_date,1,1)) if strlen(str_date)==7
 (2 real changes made)
 
 . gen date = real(substr(str_date,3,2)) if strlen(str_date)==8
 (2 missing values generated)
 
 . replace date = real(substr(str_date,2,2)) if strlen(str_date)==7
 (2 real changes made)
 
 . gen year = real(substr(str_date,5,4)) if strlen(str_date)==8
 (2 missing values generated)
 
 . replace year = real(substr(str_date,4,4)) if strlen(str_date)==7
 (2 real changes made)
 
 . list
 
      +--------------------------------+
      | str_date   month   date   year |
      |--------------------------------|
   1. | 12031999      12      3   1999 |
   2. |  2081998       2      8   1998 |
   3. |  4071997       4      7   1997 |
      +--------------------------------+
 
 . describe
 
 Contains data from datecalstr.dta
   obs:             3                          
  vars:             4                          14 Apr 2005 12:09
  size:            72 (99.9% of memory free)
 ------------------------------------------------------------------------
               storage  display     value
 variable name   type   format      label      variable label
 ------------------------------------------------------------------------
 str_date        str8   %9s                    
 month           float  %9.0g                  
 date            float  %9.0g                  
 year            float  %9.0g                  
 ------------------------------------------------------------------------
 Sorted by:  
      Note:  dataset has changed since last saved

Another solution is to use the todate command written by Nicholas J. Cox. To obtain todate, type

 . ssc install todate
The Stata Blog: Not Elsewhere Classified Find us on Facebook Follow us on Twitter LinkedIn Google+ Watch us on YouTube