
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: Calculate variances of subsamples

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	RE: st: RE: Calculate variances of subsamples
Date	Sat, 5 Jun 2010 22:50:57 +0200
<>


So, Lars, this is getting quite challenging, not least because we do not
have your data. The code you posted assumes the presence of your data
(what is "pricemsci", for instance?). The -reshape- we did a couple of
hours ago increasingly looks like a red herring, and it certainly does
not bring you closer to a solution for your problem.

Here is code that builds on our state of affairs before we -reshape-d. Note
that I am creating a fake return in there, so the variances calculated
earlier have no connection to it. Anything else would require substantial
restructuring of the solution. The code generates three portfolios that
contain "low", "middle" and "high" variance stocks:



***********
//create resultsfile
cap erase myfile.dta
di in red _rc

clear*
gen start=.
gen end=.
gen _stat_1=.
gen stock=.
gen str15 kindofreturn=""

save myfile, replace

//get "3105.dta"
clear*
//8 stocks
set obs 8
gen byte stock=_n

//5 time periods
expand 5
bys stock: gen byte time=_n

gen double exret=rnormal()
gen double msciret=rnormal()
gen double msftret=rnormal()
gen double appret=rnormal()
gen double geret=rnormal()
gen double pgret=rnormal()
gen double jnjret=rnormal()
gen double bpret=rnormal()

save 3105, replace

//-use- "3105"
u 3105, clear

//Return calculation within each stock
//(the gr* variables are not used further down)
foreach ret in exret msciret msftret appret geret pgret jnjret bpret{
        bys stock (time): gen double gr`ret'=`ret'[_n]/`ret'[_n-1]-1 if _n>1
}

//loop to get -rolling- results for each stock
//and each return

foreach ret in exret msciret msftret{

//start inner loop
su stock, mean  
qui forv i=1/`r(max)'{
        preserve
        keep if stock==`i'
        tsset time
        rolling r(Var), window(2) clear: su `ret'
        gen stock=`i'
        gen kindofreturn="`ret'"
        append using myfile
        save myfile, replace
        restore
}

//end inner loop

}

u myfile, clear
ren _stat_1 Variance
sort stock kindofreturn start

la def sto 1 "Firm 1" 2 "Firm 2" 3 "Firm 3" /// 
4 "Firm 4" 5 "Firm 5" 6 "Firm 6" 7 "Firm 7"  /// 
8 "Firm 8" 
la val stock sto

//get "low"/"middle"/"high" volatility portfolios
bys start kindofreturn (Variance): gen byte lowvar=_n<=3
bys start kindofreturn (Variance): gen byte middlevar=inlist(_n,4,5)
bys start kindofreturn (Variance): gen byte highvar=_n>5
l, h(30) noo sepby(start kindofreturn lowvar middlevar highvar)

//generate very fake return
gen myreturn=rnormal(.1,.05)

bys start kindofreturn (lowvar):  /// 
egen averagereturnlow=mean(myreturn) if lowvar
bys start kindofreturn (middlevar):  /// 
egen averagereturnmiddle=mean(myreturn) if middlevar
bys start kindofreturn (highvar): /// 
 egen averagereturnhigh=mean(myreturn) if highvar
sort start kindofreturn lowvar middlevar highvar
l start Variance myreturn stock lowvar averagereturn*, /// 
 noo sepby(start kindofreturn low middle high)
***********
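
For the ten portfolios Lars asks about, the three-way split above could be
replaced by deciles. The following is only a sketch on made-up data: the
panel size, the seed, and the names "portfolio", "tmp" and "portret" are
assumptions, and -xtile- is looped over the windows because it cannot be
combined with -by-. The t-test at the end treats the per-window portfolio
returns as two independent samples, which may or may not be appropriate
for real return data.

***********
clear*
set seed 12345

//hypothetical panel: 100 stocks, 20 rolling windows
set obs 100
gen int stock=_n
expand 20
bys stock: gen int start=_n

//made-up variance and return
gen double Variance=runiform()
gen double myreturn=rnormal(.1,.05)

//decile of Variance within each window: 1=lowest, 10=highest
gen byte portfolio=.
qui levelsof start, local(periods)
qui foreach t of local periods{
        xtile tmp=Variance if start==`t', nq(10)
        replace portfolio=tmp if start==`t'
        drop tmp
}

//equally weighted return of each portfolio in each window
collapse (mean) portret=myreturn, by(start portfolio)

//lowest- versus highest-variance portfolio
ttest portret if inlist(portfolio,1,10), by(portfolio)
***********

With real data, "myreturn" would have to come from merging the actual
returns back onto the variance file by stock and window.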


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lars Knuth
Sent: Saturday, 5 June 2010 21:16
To: [email protected]
Subject: Re: st: RE: Calculate variances of subsamples

Dear Statalisters,

I want to share the results; maybe someone in the future will have the
same problem to solve. The input came almost completely from Martin.
The program takes price data for a large number of stocks, calculates
returns, and then calculates variances for every stock individually,
many times over, using a rolling window. The variances are in the right
order to be compared in the cross-section.

This is also the beginning of my next and last problem:
I have all the variances. At every point in time, I now need to compare
the variances of the different stocks (Exxon, Microsoft etc.), rank
them with the lowest variance first, and then build 10 portfolios
(there will be more than 1000 stocks), where the first portfolio
consists of the stocks with the 10% lowest variances, the second
includes the next ten percent, etc.

For those portfolios, the return has to be calculated (so the datasets
have to be merged again). Then it can be tested (t-test) whether the
low-variance portfolio has a statistically significantly lower
return than the high-variance portfolio.

As I said, if someone (Martin?) has an idea for that, I would be more
than thankful, since I could then finish my Stata work for the
moment.

Thanks in advance!

clear*
set more off

*create resultsfile
cap erase resultsfile.dta
di in red _rc

clear*
gen start=.
gen end=.
gen _stat_1=.
gen str15 kindofreturn=""

save resultsfile, replace

use "\3105.dta", clear
gen int time=_n

*Return calculation
foreach price in pricemsci priceex pricemsft priceapp pricege pricepg ///
pricejnj pricebp {
gen double `price'ret=`price'[_n]/`price'[_n-1]-1 if _n>1
}
renpfix price

*loop to get -rolling- results for each stock
foreach ret in exret msciret msftret appret geret pgret jnjret bpret{
       preserve
       tsset time
       rolling r(Var), window(60) clear: su `ret'
       gen kindofreturn="`ret'"
       append using resultsfile
       save resultsfile, replace
       restore
}

u resultsfile, clear
ren _stat_1 Variance
sort kindofreturn start
l, sepby(kindofreturn) noo

bys kindofreturn: gen int time=_n
reshape wide Variance, i(time) j(kindofreturn) string
renpfix Variance
list, noo
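
The lag operator "L." that Martin suggested earlier in the thread could
replace the explicit [_n-1] subscripts in the return loop above. A minimal
sketch, assuming made-up prices for two of the series (the price values
are invented; only the variable names follow the code above):

***********
clear*
//made-up prices for 6 periods
set obs 6
gen int time=_n
gen double priceex=100+_n
gen double pricemsci=50+2*_n

//L. requires -tsset- data
tsset time

foreach price in priceex pricemsci{
        gen double `price'ret=`price'/L.`price'-1
}
renpfix price
l, noo
***********

Unlike [_n-1], "L." respects gaps in the time variable: it returns missing
instead of reaching back to an observation from the wrong period.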





2010/6/5 Martin Weiss <[email protected]>:
>
> <>
>
> ***********
> clear*
>
> input Var str16 stock
> 0.00234    exxon
> .05654    exxon
> 0.13444    exxon
> 0.99388    microsoft
> .4342     microsoft
> 0.42445    microsoft
> 0.42444    intel
> 0.32443       intel
> 0.23434     intel
> end
>
> bys stock: gen int time=_n
> reshape wide Var, i(time) j(stock) string
> renpfix Var
> list, noo
> ***********
>
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Lars Knuth
> Sent: Saturday, 5 June 2010 17:49
> To: [email protected]
> Subject: Re: st: RE: Calculate variances of subsamples
>
> Oh, my explanation was probably confusing. It was just for
> illustration. I have two columns, one with the numbers, the other
> with the strings. What I need in a new file is just the
> numbers.
>
> 0.00234(exxon) means that the 0.00234 belongs to
> exxon, etc. It would of course be nice if the new variable holding the
> exxon numbers were named exxon, the one holding the numbers for
> microsoft were named microsoft, etc. However, the important part
> is just the numbers.
>
> 2010/6/5 Martin Weiss <[email protected]>:
>>
>> <>
>>
>> Your problem may turn out to be easily solved with a -reshape-. It is not a
>> good idea, though, to have "0.00234(exxon)" in a cell of your data, as this
>> would have to be stored as a string, precluding any further processing of
>> the number. Did you write it as an illustration, or do you really want the
>> cell to contain the string?
>>
>>
>> HTH
>> Martin
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Lars Knuth
>> Sent: Saturday, 5 June 2010 17:28
>> To: [email protected]
>> Subject: Re: st: RE: Calculate variances of subsamples
>>
>> Ok, great, it took some time, but I finally understood Martin's
>> code... this is a great way of learning more about Stata.
>> My next problem is that I have a variable with the variances for 536
>> -rolling- steps for each of the stocks.
>> It looks like this:
>> Variance         stock name
>> 0.00234          exxon
>> ...........          exxon
>> 0.13444          exxon
>> 0.99388          microsoft
>> ...........           microsoft
>> 0.42445          microsoft
>> 0.42444          intel
>> ........              intel
>> 0.23434           intel
>>
>> What I would like to have is the following:
>>
>> 0.00234(exxon)          0.99388(microsoft)     0.42444(intel)
>> ...........             ............           ............
>> 0.13444                 0.42445                0.23434(intel)
>>
>> I could do gen varexxon=Variance if stockname=="exxon"
>> and the same for all the other stocks. But even then I get variables
>> with a lot of missings, and I cannot place the variances horizontally
>> next to each other.
>>
>> But they are from the same time (because of -rolling-) and I need them
>> to be horizontally ordered without the missings.
>>
>> I hope my problem is clear. I guess what I am missing is just a small
>> command.
>> Thank you in advance for any hint!
>>
>> 2010/6/2 Martin Weiss <[email protected]>:
>>>
>>> <>
>>>
>>> You could of course issue the -rolling- call with -clear- present, -save-
>>> the result to a new file and reload your "3105.dta" to start anew for the
>>> next stock. The datasets thus -saved- could be -append-ed to form one big
>>> dataset afterwards. -postfile- is also an option, as always.
>>>
>>> BTW, you may be better off with the lag operator "L." for your return
>>> calculations.
>>>
>>>
>>> HTH
>>> Martin
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Lars Knuth
>>> Sent: Wednesday, 2 June 2010 20:22
>>> To: statalist
>>> Subject: st: Calculate variances of subsamples
>>>
>>> Dear listers,
>>>
>>> I have to say thanks to Martin, the recommendation of -rolling- was
>>> great. Unfortunately, I now have a few problems with the
>>> implementation.
>>> 1. -rolling- works with the "clear" option, but not without it
>>> ("rolling r(Var), window(60) clear: summarize exret" works).
>>> 2. In the end I need to calculate and store the variances for more than
>>> 1000 stock price returns, so can I somehow keep all the
>>> data and then perform -rolling- in a loop?
>>> 3. Is there also a way to perform the return calculation in a
>>> loop?
>>>
>>> I am attaching parts of the code I have so far. Any ideas would be of
>>> great help to me.
>>> Thanks in advance!
>>>
>>> clear*
>>> use "C:\...\3105.dta", clear
>>>
>>> gen int time=_n
>>> * Return calculation
>>> gen double exret=ex[_n]/ex[_n-1]-1 if _n>1
>>> gen double msciret=msci[_n]/msci[_n-1]-1 if _n>1
>>> gen double msftret=msft[_n]/msft[_n-1]-1 if _n>1
>>> gen double appret=app[_n]/app[_n-1]-1 if _n>1
>>> gen double geret=ge[_n]/ge[_n-1]-1 if _n>1
>>> gen double pgret=pg[_n]/pg[_n-1]-1 if _n>1
>>> gen double jnjret=jnj[_n]/jnj[_n-1]-1 if _n>1
>>> gen double bpret=bp[_n]/bp[_n-1]-1 if _n>1
>>>
>>> tsset time
>>>
>>> * Rolling
>>> rolling r(Var), window(60): summarize exret
>>> rolling r(Var), window(60): summarize msciret
>>> rolling r(Var), window(60): summarize msftret
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/