Thanks a lot for all the suggestions and remarks.
A short note on the context of the analysis with regard to the methods (A) and
the type of data (B), prompted by Nick's comment that "some nonparametric
regression might do more justice to the data":
(A) Description of methods:
The idea is to carry out the analysis step by step with different methods,
to check whether the results are robust across them:
(1) The decile sorts are used for a first assessment of the data, and I
think they should also be pretty robust against outliers and extreme values.
(2) Afterwards the data is analysed with the "cross-sectional regressions"
method introduced by Fama and MacBeth (1973, "Risk, Return, and
Equilibrium: Empirical Tests", Journal of Political Economy 81). For a
discussion of the method see also Statalist at
http://www.stata.com/statalist/archive/2004-10/, or
http://www.antonisureda.com/blog/blog.html for ado files implementing the
method.
(3) Finally, I plan to carry out a panel regression.
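To make step (2) concrete, here is a minimal sketch of the two-pass idea on simulated data. The variable names (firm, yrm, x, ret) and the true slope of 0.5 are assumptions for illustration only, not the actual data, and the sketch uses plain -statsby- and -ttest- rather than the ado files mentioned in (2).

```stata
* Sketch of step (2) on simulated data; all variable names are made up.
clear
set seed 12345
set obs 3600                        // 300 firms x 12 months
gen firm = ceil(_n / 12)
bysort firm: gen yrm = _n           // stand-in for the YearMonth variable
gen x   = rnormal()                 // hypothetical explanatory variable
gen ret = 0.5 * x + rnormal()       // hypothetical return, true slope 0.5

* First pass: one cross-sectional regression per month
statsby _b, by(yrm) clear: regress ret x

* Second pass: time-series mean of the monthly slopes and its t-test
ttest _b_x == 0
```

After -statsby- the data in memory hold one observation per month with the estimated coefficients (_b_x and _b_cons), so the second pass is just a one-sample test on the monthly slopes.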
(B) Description of data:
As already mentioned, the sample consists of financial data on 300
companies. The figures are taken from annual reports, and the explanatory
variables are constructed as yearly changes in ratios, e.g. "Equity to Total
Assets", "Overhead to Total Operating Income" and various others. The
dependent variable is "Total Return to Shareholders", calculated from a
return index for each individual stock. As the annual reports for year t are
published on average in February/March of year t+1, the accounting data is
matched with the return data for February, March, April and May of year t+1
(the same accounting data for each of the four months).
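The matching rule can be sketched as follows on a three-row toy dataset. The names firm, year, month and d_eq_ta (a yearly change in "Equity to Total Assets") are hypothetical, and the numbers are invented; only yrm corresponds to a variable that actually exists in the data.

```stata
* Sketch of the matching rule on toy data; names are illustrative only.
clear
input firm year d_eq_ta
1 2004  0.02
1 2005 -0.01
2 2004  0.05
end

* Each annual report of year t is used for four months of year t+1
expand 4
bysort firm year: gen month = _n + 1        // 2, 3, 4, 5 = Feb ... May
gen yrm = ym(year + 1, month)               // Stata monthly date
format yrm %tm

list firm year yrm d_eq_ta, sepby(firm year)
```

The expanded file has one row per firm and month, so it could then be merged with the monthly return data on firm and yrm.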
Any further comments and thoughts on the subject are of course appreciated.
- Tom
-----Original Message-----
From: statalist-owner@hsphsun2.harvard.edu
[mailto:statalist-owner@hsphsun2.harvard.edu] On Behalf Of Jeph Herrin
Sent: Friday, 10 November 2006 02:01
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Decile sorts
Hey, I posted my correction the same minute you did...
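One way to make the loop safe on the second pass is to give the bin variable a temporary name. This is only a sketch on simulated data, and not necessarily the correction as it was posted; c1a and c1b are made-up variables standing in for the c1* variables.

```stata
* Sketch of one possible tweak; c1a and c1b are made-up variables.
clear
set seed 2006
set obs 200
gen c1a = rnormal()
gen c1b = runiform()

foreach X of varlist c1* {
    tempvar deciles                       // fresh temporary name on each pass
    xtile `deciles' = `X', nq(10)
    bysort `deciles': egen R`X' = mean(`X')
}
```

If the bin variable is needed afterwards, naming it after the sorting variable (for example D`X', again a hypothetical name) instead of using -tempvar- avoids the clash in the same way.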
Nick Cox wrote:
> I guess you're onto something good, except
> that second time around the loop -deciles-
> already exists. So this needs a tweak,
> depending on whether -deciles- is dispensable.
>
> Nick
> n.j.cox@durham.ac.uk
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Jeph Herrin
>> Sent: 09 November 2006 23:46
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: RE: Decile sorts
>>
>>
>> Maybe I'm missing something, but why not:
>>
>> foreach X of varlist c1* {
>> xtile deciles=`X', n(10)
>> bys deciles: egen R`X'=mean(`X')
>> }
>>
>> ?
>>
>> hth,
>> Jeph
>>
>>
>> Nick Cox wrote:
>>> Various comments sprinkled here and there. You may have
>>> strong reasons to use these decile bins, but binning
>>> strikes me as, usually, at best a means towards an end
>>> (or perhaps ends towards some means). Some nonparametric
>>> regression might do more justice to the data.
>>>
>>> Also, you are mixing two naming conventions 1...10
>>> and 10...90. Just use one.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Thomas Erdmann
>>>
>>>> I am trying to sort my observations into deciles according to
>>>> one attribute
>>>> and afterwards calculating the average of another attribute
>>>> of those ten groups.
>>>
>>>> Please find the code I came up with below [lines with ... are
>>>> omitted], yrm is the time variable (YearMonth)
>>>>
>>>> (1) As far as I can tell it works out, but a) it's a lot of code and
>>>> b) produces a lot of variables and c) generating the output is
>>>> rather awkward.
>>>>
>>>> Could you give me hints on how to implement a smarter
>>>> solution or if there
>>>> are any errors in the way the calculation is carried out currently?
>>>
>>>> *** Generate Percentiles
>>>> sort yrm
>>>> foreach X of varlist c1* {
>>>> by yrm: egen p10_`X'= pctile(`X'), p(10.0)
>>>> by yrm: egen p20_`X'= pctile(`X'), p(20.0)
>>>> by yrm: egen p30_`X'= pctile(`X'), p(30.0)
>>>> ...
>>>> by yrm: egen p90_`X'= pctile(`X'), p(90.0)
>>>> }
>>> This is two loops rolled out into one.
>>>
>>> sort yrm
>>> foreach X of varlist c1* {
>>> forval i = 10(10)90 {
>>> by yrm : egen p`i'_`X' = pctile(`X'), p(`i')
>>> }
>>> }
>>>
>>>
>>>> *** Sort into Percentile groups
>>>> foreach X of varlist c1* {
>>>> gen G_`X'=1 if `X'<p10_`X' & `X'~=.
>>>> replace G_`X'=2 if `X'>p10_`X' & `X'<p20_`X'
>>>> ...
>>>> replace G_`X'=9 if `X'>p80_`X' & `X'<p90_`X'
>>>> replace G_`X'=10 if `X'>p90_`X' & `X'~=.
>>>> }
>>> Similar story with boundary conditions.
>>>
>>> foreach X of varlist c1* {
>>> gen byte G_`X' = `X' < p10_`X'
>>>
>>> forval i = 2/9 {
>>> local j = 10 * `i'
>>> replace G_`X' = `i' if `X' < p`j'_`X' & G_`X' == 0
>>> }
>>>
>>> replace G_`X' = cond(`X' == ., ., 10) if G_`X' == 0
>>> }
>>>
>>>
>>>> *** Calculate return mean for each group
>>>> sort yrm
>>>> foreach X of varlist G* {
>>>> by yrm: egen R1`X'= mean(c1ds_ri) if `X'==1
>>>> by yrm: egen R2`X'= mean(c1ds_ri) if `X'==2
>>>> ...
>>>> by yrm: egen R9`X'= mean(c1ds_ri) if `X'==9
>>>> by yrm: egen R10`X'= mean(c1ds_ri) if `X'==10
>>>> }
>>> Why do you need all these variables? The results
>>> for the bins are disjoint, so can be put in a single
>>> variable.
>>>
>>> foreach X of varlist G* {
>>> bysort yrm `X' : egen R`X' = mean(c1ds_ri)
>>> }
>>>
>>> Having said that, it can probably be done more
>>> directly with a series of -collapse-s.
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/support/faqs/res/findit.html
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/