Home  /  Resources & support  /  FAQs  /  Handling date information when numbers are in a continuous sequence (Stata 9)

How do I convert date variables into Stata elapsed dates in Stata 9 or earlier when the numbers run together, like 4151999?

Title   Handling date information in Stata 9 or earlier when numbers are in a continuous sequence
Author David Reichel, StataCorp

The date() function can convert virtually any date format into elapsed dates, which is the format Stata uses to manipulate date information. Elapsed dates are calculated as the number of days from January 1, 1960. This format is useful for adding or subtracting dates and changing the format of date variables.

When month, day, and year are in the form of a continuous string of numbers, such as 11021999, with no spaces, hyphens, or slash marks to separate the numbers, Stata 9 or earlier is not able to read the date information directly.

If you have a numeric date variable in this format, the int() function can be used to separate the date information into three separate variables that can then be used in the mdy() function to read the data. The following example demonstrates how you may perform such a conversion when you have a four-digit year.

Suppose you have a dataset named datecal that has a variable named datevar:

 . list

      +----------+
      |  datevar |
      |----------|
   1. | 12031999 |
   2. |  2081998 |
   3. |  4071997 |
      +----------+

Your program could be

 use datecal, clear
 generate month = int(datevar/1000000)
 generate day = int((datevar - month*1000000)/10000)
 generate year = datevar - month*1000000 - day*10000
 generate elapdate = mdy(month, day, year)
 list

The output from this program is

 +------------------------------------------+
 |  datevar   month   day   year   elapdate |
 |------------------------------------------|
 | 12031999      12     3   1999      14581 |
 |  2081998       2     8   1998      13918 |
 |  4071997       4     7   1997      13611 |
 +------------------------------------------+

To format elapdate, there are several options, but one possibility is

 . format elapdate %d

 . list elapdate

      +-----------+
      |  elapdate |
      |-----------|
   1. | 03dec1999 |
   2. | 08feb1998 |
   3. | 07apr1997 |
      +-----------+

If you have the same date variable in string (as opposed to numeric) format, you can perform the same separation into three variables of month, day, and year using a combination of string manipulation functions. For example, let us assume you have the same three dates as in datevar except they are in a string.

 . list

      +----------+
      | str_date |
      |----------|
   1. | 12031999 |
   2. |  2081998 |
   3. |  4071997 |
      +----------+

One solution would be to extract the month, day, and year information from the string positions using the substr() function. You can also convert the string into a numeric format at the same time using the real() function. Because you have dates with 7 characters and 8 characters, you will need to use separate commands to obtain the results as follows:

 .  gen month = real(substr(str_date,1,2)) if strlen(str_date)==8
 (2 missing values generated)

 . replace month = real(substr(str_date,1,1)) if strlen(str_date)==7
 (2 real changes made)

 . gen date = real(substr(str_date,3,2)) if strlen(str_date)==8
 (2 missing values generated)

 . replace date = real(substr(str_date,2,2)) if strlen(str_date)==7
 (2 real changes made)

 . gen year = real(substr(str_date,5,4)) if strlen(str_date)==8
 (2 missing values generated)

 . replace year = real(substr(str_date,4,4)) if strlen(str_date)==7
 (2 real changes made)

 . list

      +--------------------------------+
      | str_date   month   date   year |
      |--------------------------------|
   1. | 12031999      12      3   1999 |
   2. |  2081998       2      8   1998 |
   3. |  4071997       4      7   1997 |
      +--------------------------------+

 . describe

 Contains data from datecalstr.dta
   obs:             3
  vars:             4                          14 Apr 2005 12:09
  size:            72 (99.9% of memory free)
 ------------------------------------------------------------------------
               storage  display     value
 variable name   type   format      label      variable label
 ------------------------------------------------------------------------
 str_date        str8   %9s
 month           float  %9.0g
 date            float  %9.0g
 year            float  %9.0g
 ------------------------------------------------------------------------
 Sorted by:
      Note:  dataset has changed since last saved

Another solution is to use the todate command written by Nicholas J. Cox. To obtain todate, type

 . ssc install todate