Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: How to aggregate daily data in Stata

From   Nick Cox <>
Subject   Re: st: How to aggregate daily data in Stata
Date   Mon, 17 Dec 2012 08:03:56 +0000

Your problems are in understanding how Stata treats dates. It is all
explained at -help dates and times-.

14000 is a number. If you tell Stata it is a daily date, it believes you.

. di %td 14000

If you tell Stata it is a monthly date, it believes you.

. di %tm 14000

The difference between your two statements is _not_ that you converted
your dates to monthly dates; it is that you told Stata two different
things, and Stata believed you in each case. This mistake is so common
that I have a Tip in Stata Journal 12(4) underlining it.

To convert daily dates to monthly dates, you need a conversion, not a
change of format.

Also -date()- produces _daily_ dates, and never any other kind of
dates. (I've suggested the name -daily()- to StataCorp  to make this
more obvious, although the documentation doesn't imply otherwise.)

If I understand you correctly, you need (e.g.)

 gen dm = mofd(date("1" + month, "DMY"))


On Mon, Dec 17, 2012 at 7:28 AM, Christopher A. Hartwell
<> wrote:
> Thanks Nick, that looks like it will be useful - it helped on my constituent
> data sheets in collapsing the data, so many thanks for your assistance.
> This unfortunately leads to a second question regarding Stata's use of dates
> and how it imports dates from Excel - I'm encountering a problem getting the
> data in correctly into Stata.
> As before, I have a large unbalanced panel dataset of monthly data, with not
> every year/month represented (i.e. Belarus starts at May 1998, Czech
> Republic starts in May 1994, etc.). Using Stata 11IC and stat-transferring
> from Excel, I get the problem of the monthly date coming in as a string
> variable or as a "long" d/m/y variable. The problems, in their order:
> 1.      When originally imported via stat-transfer, the date comes in
> appearing normal (it shows in data editor as a d/m/y variable, the first of
> each month, although, not the just month-year that I want). However, when I
> tsset it as a monthly variable, it appears to have Islamic timing. For
> example, Belarus from May 1998 to January 1999 translates as:
> Country Month
> Belarus 3126m9
> Belarus 3129m4
> Belarus 3131m10
> Belarus 3134m5
> Belarus 3136m12
> Belarus 3139m6
> Belarus 3142m1
> Belarus 3144m7
> Belarus 3147m2
> If I keep it as "daily" data, I can't use any operators such as lags or
> differences, because stata thinks the previous period is missing.
> 2.      Alternately, if I attempt to just paste the data into the data
> editor, the date shows as a string variable. I try . generate
> Date=date(month,"MY") on the month variable, and it generates all missing.
> The same thing happens if I
> . generate Date=date(month,"DMY")
> . gen dm = mofd(Date)
> All are blank.
> Finally, if I encode month, gen(Month) off of this, I end up with a series
> that LOOKS correct - May-98, Jun-98, but when I click on the cell in data
> editor, it shows as a number that is non-sequential. That is, May 98 is 214,
> June-98 is 166, Jul-98 is 142... and if I tsset on this variable, it
> a) says it has tsset "Month, 1960m2 to 1983m12, but with gaps"
> b) sorts the months all together and then the years, so it goes Apr-00,
> Apr-01, Apr-02... then Aug-00, Aug-01... for each country
> Can anyone help with this? Is this an excel problem or a stata one? I just
> need to get the monthly/yearly data imported correctly, or else I can't
> really do anything.
> [] On Behalf Of Nick Cox
> Sent: Sunday, December 16, 2012 2:15 AM
> To:
> Subject: Re: st: How to aggregate daily data in Stata
> . help collapse
> Nick
> On 15 Dec 2012, at 20:46, "Christopher A. Hartwell"
> <  > wrote:
>> Have a question that should be easy to figure out for the experts, but
>> I can't wrap my mind around it conceptually.
>> I have daily stock market data for 28 countries over a 20-odd year
>> period.
>> Some countries have 6 years of data, some have 20 years, etc.,
>> depending on when they started their stock markets. I want to
>> calculate a volatility metric by using the sum of squares of daily
>> changes for each month (that is, aggregate the daily squared changes
>> for each month). Given that the panel data has different time lengths
>> and there are different monthly periods (i.e. some months have 30
>> days, some countries have holidays in the month so there's no data),
>> how can I use stata to sum the squares of the daily changes by month?
>> For example, I have
>> Belarus        March-1-05    1.37
>> Belarus        March-2-05    0.69
>> .
>> .
>> .
>> Belarus        March 31-05    17.33
>> And I want to generate one variable per country per month that is the
>> sum of these numbers, so that I have
>> Belarus        Mar-05        37.20
>> Belarus        Apr-05        18.99
>> Etc.
>> Is there an easy "by x:" command that will let me do it? Or some
>> iteration that will allow for this?
>> The problem I see is that I need to sum the data first by day over a
>> specific month, for each month, for each country, on a dataset that is
>> severely unbalanced.
*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index