A further note on Jeph's suggestion:
It looks very convenient, but I need to adjust it, because I need the mean
not of the same variable but of a different attribute:
foreach X of varlist c1* {
xtile deciles_`X'=`X', n(10)
bysort deciles_`X': egen Rr`X'=mean(c1ds_ri)
}
But a problem still remains:
the deciles are calculated over all observations, whereas what I need is
the mean of each decile group by yrm (my time variable representing
YearMonth) and afterwards the mean of each decile group (1-10) over all
yrm's. I was not able to integrate this into this short solution, as -by- is
not allowed with -xtile-.
-Tom
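
One possible way around the -by- restriction is to loop over the values of
yrm and run -xtile- on one month at a time. The following is only a sketch
and has not been checked on edge cases such as months with very few
observations. It assumes yrm is an integer-valued numeric variable; the
names dec_`X', Rm_`X' and ret are made up for illustration. Note that c1*
also matches c1ds_ri itself, so the varlist may need narrowing.

```stata
* Sketch only: assumes yrm is numeric (integer-valued) and that
* c1ds_ri is the attribute to be averaged.
* dec_*, Rm_* and ret are illustrative names, not from the thread.
levelsof yrm, local(months)

foreach X of varlist c1* {
    gen byte dec_`X' = .
    foreach m of local months {
        * deciles of `X' computed within this yrm only
        tempvar d
        xtile `d' = `X' if yrm == `m', nq(10)
        replace dec_`X' = `d' if yrm == `m'
        drop `d'
    }

    * mean of c1ds_ri per decile group within each yrm
    bysort yrm dec_`X': egen Rm_`X' = mean(c1ds_ri) if !missing(dec_`X')

    * mean of those decile means over all yrm's (each yrm weighted equally)
    preserve
    drop if missing(dec_`X')
    collapse (mean) ret = c1ds_ri, by(yrm dec_`X')
    collapse (mean) ret, by(dec_`X')
    list dec_`X' ret, noobs
    restore
}
```

The two -collapse- steps give every yrm the same weight in the final
average; a single -collapse- by decile alone would instead weight months by
their number of observations.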
Oops, don't forget to drop -deciles-
foreach X of varlist c1* {
xtile deciles=`X', n(10)
bys deciles: egen R`X'=mean(`X')
drop deciles
}
Jeph Herrin wrote:
> Maybe I'm missing something, but why not:
>
> foreach X of varlist c1* {
> xtile deciles=`X', n(10)
> bys deciles: egen R`X'=mean(`X')
> }
>
> ?
>
> hth,
> Jeph
>
>
Nick Cox wrote:
>> Various comments sprinkled here and there. You may have strong reasons
>> to use these decile bins, but binning strikes me as, usually, at best a
>> means towards an end (or perhaps ends towards some means). Some
>> nonparametric regression might do more justice to the data.
>> Also, you are mixing two naming conventions, 1...10 and 10...90. Just
>> use one.
>> Nick n.j.cox@durham.ac.uk
>> Thomas Erdmann
>>
>>> I am trying to sort my observations into deciles according to one
>>> attribute and afterwards calculate the average of another attribute
>>> for each of those ten groups.
>>
>>> Please find the code I came up with below [lines with ... are
>>> omitted], yrm is the time variable (YearMonth)
>>>
>>> (1) As far as I can tell it works out, but a) it's a lot of code,
>>> b) it produces a lot of variables and c) generating the output is
>>> rather awkward.
>>>
>>> Could you give me hints on how to implement a smarter solution or if
>>> there
>>> are any errors in the way the calculation is carried out currently?
>>
>>> *** Generate Percentiles
>>> sort yrm
>>> foreach X of varlist c1* {
>>> by yrm: egen p10_`X'= pctile(`X'), p(10.0)
>>> by yrm: egen p20_`X'= pctile(`X'), p(20.0)
>>> by yrm: egen p30_`X'= pctile(`X'), p(30.0)
>>> ...
>>> by yrm: egen p90_`X'= pctile(`X'), p(90.0)
>>> }
>>
>> This is two loops rolled out into one.
>>
>> sort yrm
>> foreach X of varlist c1* {
>>     forval i = 10(10)90 {
>>         by yrm : egen p`i'_`X' = pctile(`X'), p(`i')
>>     }
>> }
>>
>>> *** Sort into Percentile groups
>>> foreach X of varlist c1* {
>>> gen G_`X'=1 if `X'<p10_`X' & `X'~=.
>>> replace G_`X'=2 if `X'>p10_`X' & `X'<p20_`X'
>>> ...
>>> replace G_`X'=9 if `X'>p80_`X' & `X'<p90_`X'
>>> replace G_`X'=10 if `X'>p90_`X' & `X'~=.
>>> }
>>
>> Similar story with boundary conditions.
>>
>> foreach X of varlist c1* {
>>     gen byte G_`X' = `X' < p10_`X'
>>     forval i = 2/9 {
>>         local j = 10 * `i'
>>         replace G_`X' = `i' if `X' < p`j'_`X' & G_`X' == 0
>>     }
>>     replace G_`X' = cond(`X' == ., ., 10) if G_`X' == 0
>> }
>>
>>
>>> *** Calculate return mean for each group
>>> sort yrm
>>> foreach X of varlist G* {
>>> by yrm: egen R1`X'= mean(c1ds_ri) if `X'==1
>>> by yrm: egen R2`X'= mean(c1ds_ri) if `X'==2
>>> ...
>>> by yrm: egen R9`X'= mean(c1ds_ri) if `X'==9
>>> by yrm: egen R10`X'= mean(c1ds_ri) if `X'==10
>>> }
>>
>> Why do you need all these variables? The results for the bins are
>> disjoint, so they can be put in a single variable.
>>
>> foreach X of varlist G* {
>>     bysort yrm `X' : egen R`X' = mean(c1ds_ri)
>> }
>>
>> Having said that, it can probably be done more directly with a series
>> of -collapse-s.
>>
>
