Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


# Re: st: RE: generating annualized standard deviation of returns from monthly data.

- From: Carlos Avellaneda Suárez
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: RE: generating annualized standard deviation of returns from monthly data.
- Date: Thu, 27 Feb 2014 12:06:21 -0500

```
The problem is that your "year" variable is not actually a year
variable but a daily date variable, so each firm-"year" combination
always identifies a single observation in your dataset, and the
standard deviation of a single value cannot be calculated (with the
n - 1 denominator it comes out missing). You have to create a real
year variable in order to obtain what you want.

2014-02-27 11:56 GMT-05:00 Ikechukwu M. <bigdoctor2004@gmail.com>:
> Thank you.
>
> Here is what I get when I run either of the two commands.
>
> I agree that without the year grouping variable there should be one SD
> returned per firm. It is including the year grouping variable that
> messes things up.
>
>
>             year     tic         return   sd_return
>  78.   31jan2000   0183B   -10.71428571           .
>  79.   29feb2000   0183B             48           .
>  80.   31mar2000   0183B   -29.72972973           .
> ------------------------------------------------
>  81.   30apr2000   0183B    7.692307692           .
>  82.   31may2000   0183B   -17.85714286           .
>  83.   30jun2000   0183B    39.13043478           .
>  84.   31jul2000   0183B         -18.75           .
>  85.   31aug2000   0183B    61.53846154           .
> ------------------------------------------------
>  86.   30sep2000   0183B   -33.33333333           .
>  87.   31oct2000   0183B    14.28571429           .
>  88.   30nov2000   0183B         -18.75           .
>  89.   31dec2000   0183B   -7.692307692           .
>  90.   31jan2001   0183B           37.5           .
> ------------------------------------------------
>  91.   28feb2001   0183B   -27.27272727           .
>  92.   31mar2001   0183B             50           .
>  93.   30apr2001   0183B   -18.22222222           .
>  94.   31may2001   0183B             25           .
>  95.   30jun2001   0183B   -6.086956522           .
> ------------------------------------------------
>  96.   31jul2001   0183B   -20.83333333           .
>  97.   31aug2001   0183B    2.339181287           .
>  98.   30sep2001   0183B   -22.85714286           .
>  99.   31oct2001   0183B    39.25925926           .
> 100.   30nov2001   0183B   -20.21276596           .
> ------------------------------------------------
> 101.   31dec2001   0183B   -.6666666667           .
> 102.   31jan2002   0183B    9.395973154           .
> 103.   28feb2002   0183B              0           .
> 104.   31jan2000   0223B              0           .
> 105.   29feb2000   0223B    5.551515152           .
> ------------------------------------------------
> 106.   31mar2000   0223B    1.447178003           .
> 107.   30apr2000   0223B    .4279600571           .
> 108.   31may2000   0223B              0           .
> 109.   31jan2000   0226B              0           .
> 110.   29feb2000   0226B              0           .
> ------------------------------------------------
> 111.   31mar2000   0226B              0           .
> 112.   30apr2000   0226B              0           .
> 113.   31may2000   0226B            800           .
> 114.   30jun2000   0226B   -33.33333333           .
> 115.   31jul2000   0226B              0           .
> ------------------------------------------------
> 116.   31aug2000   0226B              0           .
>
>
> This result is obtained from bysort firm year: egen SD=sd(return)
>
> Thanks again.
>
> IK
>
> On Thu, Feb 27, 2014 at 10:47 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> If you don't specify the year as a grouping variable, then values for
>> different years are lumped together; that is precisely as it should
>> be.
>>
>> Otherwise, I can't make sense of the claim that you get missing for SD
>> with (e.g.) 6 non-missing values. -collapse- produces a missing SD if
>> all values (or all but one) are missing in a group, but not
>> otherwise. (The "all but one" follows from the use of (n - 1) rather
>> than n in the formula for SD, n being sample size as usual.)
>>
>> If you were expecting that missing values would be omitted from the
>> -collapse- results, that expectation was incorrect.
>>
>> To make your perceived problem clear, we need to see data and output,
>> e.g. for examples like the one below.
>>
>> . clear
>>
>> . input firm year return
>>
>>           firm       year     return
>>   1. 1 2000 0.875
>>   2. 1 2000 1.2
>>   3. 1 2000 0.9
>>   4. 1 2000 0.35
>>   5. 1 2000 0.98
>>   6. 1 2000 1.4
>>   7. 1 2000  .
>>   8. 1 2000  .
>>   9. 1 2000  .
>>  10. 1 2000  .
>>  11. 1 2000  .
>>  12. 1 2000  .
>>  13. 1 2001  .
>>  14. 1 2001  .
>>  15. end
>>
>> . collapse (sd) return, by(firm year)
>>
>> . list
>>
>>      +------------------------+
>>      | firm   year     return |
>>      |------------------------|
>>   1. |    1   2000   .3560957 |
>>   2. |    1   2001          . |
>>      +------------------------+
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 27 February 2014 15:28, Ikechukwu M. <bigdoctor2004@gmail.com> wrote:
>>> Thanks. Apologies for the incorrect attribution to Nick Cox. What I
>>> meant to say is that the occurrence of missing values collapses to
>>> missing, even though I expected the missings to be ignored.
>>> Thanks for the input. I have implemented what you both suggested, and
>>> the good news is that both approaches give the same result, so the
>>> code runs, but it does not produce the desired output. I am ending up
>>> with missing values even for firms that have 6 monthly observations
>>> in the year.
>>>
>>> The -collapse- code I used is this:
>>> collapse (sd) sd_return=return, by(firm year)
>>>
>>> and the -egen- alternative is:
>>> bysort firm year: egen SD=sd(return)
>>>
>>> When I omit year, the SD is computed, but over all 10 years of the
>>> data rather than separately for each year.
>>>
>>> When I include year, I end up with lots of missing values.
>>>
>>> Thanks
>>>
>>> On Thu, Feb 27, 2014 at 4:21 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> There are various "Nick"s around here. In my case, I wouldn't offer
>>>> the explanation that the occurrence of missings will imply zero
>>>> standard deviations with -collapse-, because it isn't true. More
>>>> importantly, as you don't give the -collapse- code you used, we are
>>>> reduced to speculation that somehow your -collapse- produced a
>>>> collapse to constants, which have 0 SD.
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 27 February 2014 05:53, Ikechukwu M. <bigdoctor2004@gmail.com> wrote:
>>>>> Thanks, Kieran, for your response. I tried that and it gives me all
>>>>> zeros. I think it has to do with how Stata treats missing values in
>>>>> the -collapse- command. I had seen an earlier post by Nick regarding
>>>>> this.
>>>>>
>>>>> I used bys firm: egen sd=sd(return) and I get values, but they are
>>>>> not partitioned by year: it gives me one SD for all the data points
>>>>> for the firm.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Feb 26, 2014 at 11:23 PM, Kieran McCaul
>>>>> <kieran.mccaul@uwa.edu.au> wrote:
>>>>>> ...
>>>>>>
>>>>>> Like this?
>>>>>>
>>>>>> clear *
>>>>>>
>>>>>> input firm str7 date return
>>>>>> 1  "Jan2000"  0.875
>>>>>> 1  "Feb2000"  1.2
>>>>>> 1  "Mar2000"  0.9
>>>>>> 1  "Jan2001"  0.35
>>>>>> 1  "Feb2001"  0.98
>>>>>> 2  "Jan2000"  1.4
>>>>>> 2  "Feb2000"   .76
>>>>>> 2  "Mar2000"  1.34
>>>>>> end
>>>>>>
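>>>>>> * year = characters 4 onward of e.g. "Jan2000" (a string, "2000")
>>>>>> gen year = substr(date, 4,.)
>>>>>>
>>>>>> preserve
>>>>>>
>>>>>>    * one row per firm-year holding the SD of that year's returns
>>>>>>    collapse (sd) sd_return=return, by(firm year)
>>>>>>    tempfile ttt
>>>>>>    save `ttt', replace
>>>>>>
>>>>>> restore
>>>>>>
>>>>>> * attach each firm-year SD back to the monthly observations
>>>>>> merge m:1 firm year using `ttt'
>>>>>> list
>>>>>> * cross-check: -summarize- reports the same SD per firm-year
>>>>>> bysort firm year: summ return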
>>
>>>>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ikechukwu M.
>>>>>> Sent: Thursday, 27 February 2014 9:33 AM
>>>>>> To: statalist@hsphsun2.harvard.edu
>>>>>> Subject: st: generating annualized standard deviation of returns from monthly data.
>>>>>>
>>>>>> I am trying to compute the standard deviation of returns for a panel dataset and am having some difficulty.
>>>>>>
>>>>>> My data look like this:
>>>>>>
>>>>>> Firm    date       return
>>>>>> 1       Jan2000    0.875
>>>>>> 1       Feb2000    1.2
>>>>>> 1       Mar2000    0.9
>>>>>> 1       Jan2001    0.35
>>>>>> 1       Feb2001    0.98
>>>>>> 2       Jan2000    1.4
>>>>>> 2       Feb2000    .76
>>>>>> 2       Mar2000    1.34
>>>>>>
>>>>>>
>>>>>> I would like to compute the annualized standard deviation of returns for each firm and return one number for each firm in each year.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
```
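
Carlos's diagnosis and fix can be sketched as a small, self-contained Stata example. It assumes, as Ikechukwu's listing suggests, that the poster's `year` variable is really a Stata daily date displayed with `%td`; the rows are copied from that listing, while the variable names `sdate`, `ddate`, `sd_bad`, `cal_year`, `sd_monthly` and `sd_annual` are hypothetical. The last step multiplies the monthly SD by sqrt(12), the common convention for annualizing a monthly standard deviation; that convention assumes monthly returns are roughly uncorrelated and is not something the thread itself prescribes.

```stata
clear
* A few rows copied from the listing in the thread; dates entered as strings
input str5 firm str9 sdate return
"0183B" "31jan2000" -10.71428571
"0183B" "29feb2000" 48
"0183B" "31mar2000" -29.72972973
"0183B" "31jan2001" 37.5
"0183B" "28feb2001" -27.27272727
"0223B" "31jan2000" 0
"0223B" "29feb2000" 5.551515152
end

* Convert to a Stata daily date, like the poster's so-called year variable
generate ddate = date(sdate, "DMY")
format ddate %td

* The daily date is unique within firm (-isid- errors out otherwise), so each
* firm-date group holds one value and its SD (n - 1 = 0) is missing
isid firm ddate
bysort firm ddate: egen sd_bad = sd(return)

* The fix: extract a real calendar year and group on that instead
generate cal_year = year(ddate)
bysort firm cal_year: egen sd_monthly = sd(return)

* Annualize, assuming roughly uncorrelated monthly returns (a convention)
generate sd_annual = sd_monthly * sqrt(12)

list firm ddate cal_year return sd_monthly sd_annual, sepby(firm cal_year)
```

Grouping by `cal_year` gives every month of the same year one shared SD, which is what the `bysort firm year: egen` attempt in the thread was aiming for.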

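The original request, one number per firm per year, maps naturally onto -collapse-. The sketch below reuses Kieran's string-date setup (dates such as "Jan2000", with the year taken from character 4 onward) and adds a count of non-missing months per firm-year, which makes it easy to spot groups where, as Nick explains, fewer than two non-missing values leave the SD missing. The variable names `cal_year`, `sd_monthly`, `n_months` and `sd_annual` are hypothetical, and the sqrt(12) scaling is the usual annualization convention (assuming roughly uncorrelated monthly returns) rather than anything stated in the thread.

```stata
clear
input firm str7 date return
1 "Jan2000" 0.875
1 "Feb2000" 1.2
1 "Mar2000" 0.9
1 "Jan2001" 0.35
1 "Feb2001" 0.98
2 "Jan2000" 1.4
2 "Feb2000" .76
2 "Mar2000" 1.34
end

* Year is everything from the 4th character on, e.g. "Jan2000" -> 2000
generate cal_year = real(substr(date, 4, .))

* One row per firm-year: sample SD (n - 1 denominator) plus the number of
* non-missing monthly returns; groups with n_months < 2 get a missing SD
collapse (sd) sd_monthly=return (count) n_months=return, by(firm cal_year)

* Annualize, assuming roughly uncorrelated monthly returns (a convention)
generate sd_annual = sd_monthly * sqrt(12)

list, noobs
```

Keeping `n_months` next to the SD is a cheap way to tell a genuinely missing SD (too few months) from a coding problem such as grouping on the wrong variable.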
© Copyright 1996–2018 StataCorp LLC