Statalist The Stata Listserver



st: RE: Decile sorts


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Decile sorts
Date   Thu, 9 Nov 2006 23:18:14 -0000

Various comments sprinkled here and there. You may have
strong reasons to use these decile bins, but binning 
strikes me as, usually, at best a means towards an end 
(or perhaps ends towards some means). Some nonparametric
regression might do more justice to the data. 
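A minimal sketch of that route with -lowess-, on simulated data so that it runs as is. The data, the seed and the variable name c1x (a stand-in for one of the c1* sorting variables) are assumptions for illustration. Only c1ds_ri comes from the original post. Note that rnormal() needs a more recent Stata than was current when this thread was written.

```stata
* Hypothetical simulated data: 12 months (yrm) of 100 observations each.
* c1x is an assumed stand-in for one of the c1* sorting variables.
clear
set seed 2006
set obs 1200
gen yrm = ceil(_n / 100)
gen c1x = rnormal()
gen c1ds_ri = 0.5 * c1x + rnormal()

* Smooth the return directly on the sorting variable, with no bins.
* generate() stores the smoothed values; nograph suppresses the plot.
lowess c1ds_ri c1x, nograph generate(ri_smooth)
```

On real data, drop the simulation lines and use the actual sorting variable. Without -nograph-, -lowess- also plots the smooth over the scatter.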

Also, you are mixing two naming conventions, 1...10 
and 10...90. Just use one. 

Nick 
n.j.cox@durham.ac.uk 

Thomas Erdmann
 
> I am trying to sort my observations into deciles according to
> one attribute and then calculate the average of another
> attribute for each of those ten groups.
 
> Please find the code I came up with below [lines with ... are
> omitted]; yrm is the time variable (YearMonth).
> 
> (1) As far as I can tell it works, but a) it's a lot of code,
> b) it produces a lot of variables, and c) generating the output
> is rather awkward.
> 
> Could you give me hints on how to implement a smarter solution,
> or tell me whether there are any errors in the way the
> calculation is currently carried out?
 
> *** Generate Percentiles
> sort yrm 	
> 	foreach X of varlist c1* {
> 	by yrm: egen p10_`X'= pctile(`X'), p(10.0)
> 	by yrm: egen p20_`X'= pctile(`X'), p(20.0)
> 	by yrm: egen p30_`X'= pctile(`X'), p(30.0)
> 	...
> 	by yrm: egen p90_`X'= pctile(`X'), p(90.0)
> 	}

This is really two nested loops, with the inner one written out 
by hand. Let -forvalues- do that work. 

	sort yrm 
	foreach X of varlist c1* { 
		// percentiles 10, 20, ..., 90 of each variable within each month
		forval i = 10(10)90 { 
			by yrm : egen p`i'_`X' = pctile(`X'), p(`i') 
		}
	} 

 
> *** Sort into Percentile groups
> 	foreach X of varlist c1* {
> 	gen G_`X'=1 if `X'<p10_`X' & `X'~=.
> 	replace G_`X'=2 if `X'>p10_`X' & `X'<p20_`X' 
> 	... 
> 	replace G_`X'=9 if `X'>p80_`X' & `X'<p90_`X' 
> 	replace G_`X'=10 if `X'>p90_`X' & `X'~=.
> 	}

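Similar story here: another loop written out by hand. Watch the 
boundary conditions too. With strict inequalities throughout, an 
observation exactly equal to one of the percentiles lands in no 
group at all. In the version below each bin runs from one 
percentile (inclusive) up to the next (exclusive). 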

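	foreach X of varlist c1* {
		// bin 1: below the 10th percentile; 0 means "not yet assigned"
		gen byte G_`X' = `X' < p10_`X' 

		// bins 2-9: below the next cutoff and not yet assigned
		forval i = 2/9 { 
			local j = 10 * `i' 
			replace G_`X' = `i' if `X' < p`j'_`X' & G_`X' == 0 
		} 

		// bin 10: everything left over, except missing values
		replace G_`X' = cond(missing(`X'), ., 10) if G_`X' == 0 
	}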
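To make the boundary point concrete, here is a small simulated example (all data and names hypothetical) in which c1x takes only the values 1 to 5, 20 times each, so that the 10th percentile equals an observed value. Under the strict inequalities of the original post, the 20 observations equal to the 10th percentile get no bin at all. The scheme above, where each bin includes its lower cutoff, assigns every non-missing observation, and the final -assert- checks that. With only 5 distinct values, at least 5 of the 10 bins must stay empty whatever rule is used.

```stata
* Hypothetical data with heavy ties: c1x takes only the values 1-5,
* 20 observations each, all in a single month.
clear
set obs 100
gen yrm = 1
gen c1x = ceil(_n / 20)

sort yrm
forval i = 10(10)90 {
    by yrm: egen p`i'_c1x = pctile(c1x), p(`i')
}

* Strict inequalities as in the original post (first two bins only)
gen G_strict = 1 if c1x < p10_c1x
replace G_strict = 2 if c1x > p10_c1x & c1x < p20_c1x
* Observations sitting exactly on the 10th percentile get no bin
count if c1x == p10_c1x & missing(G_strict)

* Bins that include their lower cutoff: ties go to the upper bin
gen byte G_c1x = c1x < p10_c1x
forval i = 2/9 {
    local j = 10 * `i'
    replace G_c1x = `i' if c1x < p`j'_c1x & G_c1x == 0
}
replace G_c1x = cond(missing(c1x), ., 10) if G_c1x == 0

* Every non-missing observation now has a bin
assert !missing(G_c1x) if !missing(c1x)
tab G_c1x
```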

 
> *** Calculate return mean for each group
> sort yrm
> 	foreach X of varlist G* {
> 	by yrm: egen R1`X'= mean(c1ds_ri) if `X'==1
> 	by yrm: egen R2`X'= mean(c1ds_ri) if `X'==2
> 	...
> 	by yrm: egen R9`X'= mean(c1ds_ri) if `X'==9
> 	by yrm: egen R10`X'= mean(c1ds_ri) if `X'==10
> 	}

Why do you need all these variables? The results 
for different bins are disjoint, so they can all be 
put in a single variable. 

	foreach X of varlist G* { 
		// one variable holding the mean return of every month-by-bin cell
		bysort yrm `X' : egen R`X' = mean(c1ds_ri) if !missing(`X')
	} 
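On the awkward output: once the means sit in a single variable, -tabdisp- can show them as a month-by-bin table. This is a sketch on simulated data. The bin variable G_c1x is a hypothetical stand-in (assigned round-robin, not real deciles), named so that the loop above would produce RG_c1x from it. -egen, tag()- marks one observation per cell, so each cell of the table holds a single value.

```stata
* Hypothetical data: 12 months (yrm) of 100 observations each, with a
* stand-in bin variable G_c1x taking the values 1-10 in every month.
clear
set seed 2006
set obs 1200
gen yrm = ceil(_n / 100)
gen c1ds_ri = rnormal()
gen G_c1x = 1 + mod(_n, 10)

* One variable holding the mean return of each month-by-bin cell
bysort yrm G_c1x: egen RG_c1x = mean(c1ds_ri)

* Tag one observation per cell and show the means as a month x bin table
egen tag = tag(yrm G_c1x)
tabdisp yrm G_c1x if tag, cellvar(RG_c1x) format(%9.3f)
```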

Having said that, it could probably be done more 
directly with a series of -collapse-s. 
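The -collapse- route is not spelled out here. One plausible reading is sketched below, again on simulated data with a hypothetical bin variable G_c1x. The first -collapse- gives one mean return per month and bin, and the second averages those monthly means across months.

```stata
* Hypothetical data: 12 months, 100 observations each, stand-in bins 1-10
clear
set seed 2006
set obs 1200
gen yrm = ceil(_n / 100)
gen c1ds_ri = rnormal()
gen G_c1x = 1 + mod(_n, 10)

* Keep only binned observations
drop if missing(G_c1x)

* First -collapse-: one mean return per month and bin
collapse (mean) R = c1ds_ri, by(yrm G_c1x)

* Second -collapse-: average the monthly bin means over all months
collapse (mean) R, by(G_c1x)
list, noobs clean
```

-collapse- replaces the data in memory. On real data, bracket it with -preserve- and -restore- (or -save- the result), and repeat it for each G_* variable.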



