st: Re: st: Re: st: Re: st: calculating percentage changes in an unbalanced panel data set


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: st: Re: st: Re: st: calculating percentage changes in an unbalanced panel data set
Date   Wed, 6 Mar 2013 11:43:14 +0000

A more general point is embedded here which arises again and again
with date variables.

It is vital to realise that descriptions of the form

my dates are of format dd/mm/yyyy

do not distinguish between string variables with values such as
"25/12/2012" and numeric date variables with a display -format- (in
Stata's sense) such as %tdd/n/Cy.

That is,

1. The word "format" is overloaded.

2. The needed information is about _types_.

The output of -describe- for the variable concerned is informative.
Word descriptions using your own terminology are often ambiguous.
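
As a minimal sketch of the distinction, suppose that -time- turns out to
be a string variable holding values such as "25/12/2012". That is an
assumption, not something the thread has established, and -ndate- is a
hypothetical name for a new variable. Something like the following would
show the storage type and then produce numeric daily and monthly dates:

```stata
* Sketch only: assumes -time- is a string variable in day/month/year order.
* -ndate- is a hypothetical name for the new numeric daily date.

* Storage type str# means string; byte, int, long, float, double mean numeric.
describe time

* Convert the string to a numeric daily date.
generate ndate = date(time, "DMY")

* A display format changes how values are shown, not how they are stored.
format ndate %td

* Any rows listed here are strings that -date()- could not interpret.
list time if missing(ndate) & !missing(time)

* -mofd()- needs a numeric daily date as its argument.
generate month = mofd(ndate)
format month %tm
```

If -describe- reports a numeric storage type with a %td display format
instead, the -date()- step is not needed and -mofd(time)- can be applied
directly.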

On Wed, Mar 6, 2013 at 10:38 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> My guess is that -time- is a string variable.
>
> This contradicts the earlier output from -tsset-, which would not have
> worked if -time- were a string variable.
>
> So, we need another guess. Perhaps you are really using different
> names, translated them for Statalist for some reason, and forgot to
> take some difference into account. Or this is a slightly different
> version of the same data. Either way, there is something about your
> dataset which you are not telling us.
>
> Whatever the answer, -mofd()- should work if and only if the argument
> is a numeric daily date variable. If it's a string variable, you
> should use -date()- to convert it to a numeric daily date variable.
>
> Nick
>
> On Wed, Mar 6, 2013 at 9:08 AM, Tzaloupas Dimitrov
> <tzaloupas1232@yahoo.gr> wrote:
>
>> Thanks for your reply, Rebecca. The dates in my files are in the format dd/mm/yyyy. When I apply the code you provided, specifically
>>
>> gen month=mofd(time)
>>
>> I get the following error:
>>
>> type mismatch
>> r(109);
>>
>>
>> So I still cannot find the answer to my question. Are there any other suggestions?
>
> Rebecca Pope <rebecca.a.pope@gmail.com>
>
>> You have inflation measured on a daily basis? My guess is not. In all
>> likelihood, what you have is monthly data that happens to be coded
>> 01mmmYYYY. Stata, however, does not know this.
>>
>> gen month = mofd(time)  // get date in month format
>> format month %tm
>> tsset id month
>>
>> Now Stata knows you have monthly data, so your panel no longer
>> appears to have many missing observations simply because of false
>> "gaps" arising from how your data are recorded.
>>
>> Once you have -tsset- your data, you can use the lag operator.
>> Otherwise, based on what you are doing, there isn't much point in
>> -tsset-. Using lags to calculate the period-to-period percentage
>> change in inflation would look like this:
>>
>> // L. is Stata's lag operator (see -help tsvarlist- if unfamiliar)
>> gen p2 = (inf/L.inf-1)*100
>>
>> If you want inflation since "baseline" rather than
>> period-to-period inflation:
>>
>> // note that the time variable is in ()
>> bys id (month): gen p2_alt = (inf[_n]/inf[1]-1)*100
>>
>> In your original code, you had "bys country time". The problem with
>> this is that Stata forms groups within country _and_ time, and
>> subscripts such as [1] and [2] count observations within each group.
>> Because you have only one observation at each time period, inf[2]
>> does not exist and you get missing values. Placing time in
>> parentheses tells Stata to sort by that variable but not to group
>> on it.
>>
>> p2 will result in missing values if your panel data are still
>> unbalanced after correcting for monthly observations. p2_alt will give
>> you a value at every point in your series. However, the two provide
>> fundamentally different information. Your example of 2/1 leaves the
>> ultimate question unclear so I've given you code for both.
>
> On Tue, Mar 5, 2013 at 5:27 PM, Tzaloupas Dimitrov
>
>>> I have some time series observations (inflation) for a set of countries
>>>
>>> The panel data set is unbalanced, that is,
>>>
>>> egen id = group(country), label
>>> tsset id time
>>>        panel variable:  id (unbalanced)
>>>        time variable:  time, 01oct2008 to 01nov2011, but with gaps
>>>                delta:  1 day
>>>
>>> within each country I want to find the percentage change of inflation.
>>>
>>> I tried
>>>
>>> bysort country time : gen p2=(inf[2]-inf[1]/inf[1])*100
>>>
>>> but I get this message
>>> (500 missing values generated)
>>>
>>>  Am I doing something wrong?
>>>
>>>
>>>
>>> I use Stata 11
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/