Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: How to aggregate daily data in Stata
Nick Cox <email@example.com>
Re: st: How to aggregate daily data in Stata
Mon, 17 Dec 2012 08:03:56 +0000
Your problems are in understanding how Stata treats dates. It is all
explained at -help dates and times-.
14000 is a number. If you tell Stata it is a daily date, it believes you.
. di %td 14000
If you tell Stata it is a monthly date, it believes you.
. di %tm 14000
The difference between your two statements is _not_ that you converted
your dates to monthly dates; it is that you told Stata two different
things, and Stata believed you in each case. This mistake is so common
that I have a Tip in Stata Journal 12(4) underlining it.
To convert daily dates to monthly dates, you need a conversion, not a
change of format.
Also -date()- produces _daily_ dates, and never any other kind of
dates. (I've suggested the name -daily()- to StataCorp to make this
more obvious, although the documentation doesn't imply otherwise.)
If I understand you correctly, you need (e.g.)
gen dm = mofd(date("1" + month, "DMY"))
On Mon, Dec 17, 2012 at 7:28 AM, Christopher A. Hartwell
> Thanks Nick, that looks like it will be useful - it helped on my constituent
> data sheets in collapsing the data, so many thanks for your assistance.
> This unfortunately leads to a second question regarding Stata's use of dates
> and how it imports dates from Excel - I'm encountering a problem getting the
> data in correctly into Stata.
> As before, I have a large unbalanced panel dataset of monthly data, with not
> every year/month represented (i.e. Belarus starts at May 1998, Czech
> Republic starts in May 1994, etc.). Using Stata 11IC and stat-transferring
> from Excel, I get the problem of the monthly date coming in as a string
> variable or as a "long" d/m/y variable. The problems, in their order:
> 1. When originally imported via stat-transfer, the date comes in
> appearing normal (it shows in data editor as a d/m/y variable, the first of
> each month, although, not the just month-year that I want). However, when I
> tsset it as a monthly variable, it appears to have Islamic timing. For
> example, Belarus from May 1998 to January 1999 translates as:
> Country Month
> Belarus 3126m9
> Belarus 3129m4
> Belarus 3131m10
> Belarus 3134m5
> Belarus 3136m12
> Belarus 3139m6
> Belarus 3142m1
> Belarus 3144m7
> Belarus 3147m2
> If I keep it as "daily" data, I can't use any operators such as lags or
> differences, because stata thinks the previous period is missing.
> 2. Alternately, if I attempt to just paste the data into the data
> editor, the date shows as a string variable. I try . generate
> Date=date(month,"MY") on the month variable, and it generates all missing.
> The same thing happens if I
> . generate Date=date(month,"DMY")
> . gen dm = mofd(Date)
> All are blank.
> Finally, if I encode month, gen(Month) off of this, I end up with a series
> that LOOKS correct - May-98, Jun-98, but when I click on the cell in data
> editor, it shows as a number that is non-sequential. That is, May 98 is 214,
> June-98 is 166, Jul-98 is 142... and if I tsset on this variable, it
> a) says it has tsset "Month, 1960m2 to 1983m12, but with gaps"
> b) sorts the months all together and then the years, so it goes Apr-00,
> Apr-01, Apr-02... then Aug-00, Aug-01... for each country
> Can anyone help with this? Is this an excel problem or a stata one? I just
> need to get the monthly/yearly data imported correctly, or else I can't
> really do anything.
> [mailto:firstname.lastname@example.org] On Behalf Of Nick Cox
> Sent: Sunday, December 16, 2012 2:15 AM
> To: email@example.com
> Subject: Re: st: How to aggregate daily data in Stata
> . help collapse
> On 15 Dec 2012, at 20:46, "Christopher A. Hartwell"
> <firstname.lastname@example.org > wrote:
>> Have a question that should be easy to figure out for the experts, but
>> I can't wrap my mind around it conceptually.
>> I have daily stock market data for 28 countries over a 20-odd year
>> Some countries have 6 years of data, some have 20 years, etc.,
>> depending on when they started their stock markets. I want to
>> calculate a volatility metric by using the sum of squares of daily
>> changes for each month (that is, aggregate the daily squared changes
>> for each month). Given that the panel data has different time lengths
>> and there are different monthly periods (i.e. some months have 30
>> days, some countries have holidays in the month so there's no data),
>> how can I use stata to sum the squares of the daily changes by month?
>> For example, I have
>> Belarus March-1-05 1.37
>> Belarus March-2-05 0.69
>> Belarus March 31-05 17.33
>> And I want to generate one variable per country per month that is the
>> sum of these numbers, so that I have
>> Belarus Mar-05 37.20
>> Belarus Apr-05 18.99
>> Is there an easy "by x:" command that will let me do it? Or some
>> iteration that will allow for this?
>> The problem I see is that I need to sum the data first by day over a
>> specific month, for each month, for each country, on a dataset that is
>> severely unbalanced.
* For searches and help try: