
st: RE: RE: pctile and xtile question again

From	"Rajesh Tharyan" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: pctile and xtile question again
Date	Thu, 17 Jan 2008 23:42:28 -0000

Thanks very much for the suggestion, Nick. It is very elegant and
straightforward. I will remember to explain where user-written commands
come from in future.

As for why I am doing this: in finance this sort of analysis is quite
common. One common application is to assess the performance of a
company's shares by comparing it with the performance of a portfolio of
shares of companies that fall in the same size quantile as that company
during that period (assuming size is the main factor determining
performance). If you believed two factors were important, you could
create quantiles based on two variables. Of course, after three
variables things become quite complicated and one runs out of companies.
So you are clearly right: there are other (better) ways of doing this.
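The size-matched benchmark described above can be sketched in official Stata roughly as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the data are simulated, the variable names (period, size, ret, size_q, bench, abn_ret) are hypothetical, and equal-weighted quintile portfolios are chosen only for the example.

```stata
* Hypothetical simulated panel: 6 periods x 100 firms (not real data)
clear
set seed 2008
set obs 600
gen int period = ceil(_n / 100)
gen double size = exp(rnormal())
gen double ret = rnormal(0, 0.1)

* size quintiles formed separately within each period, with official -xtile-
gen byte size_q = .
forvalues p = 1/6 {
    xtile tmp = size if period == `p', nq(5)
    replace size_q = tmp if period == `p'
    drop tmp
}

* benchmark = equal-weighted mean return of the firm's size portfolio
bysort period size_q: egen double bench = mean(ret)

* performance relative to the size-matched portfolio
gen double abn_ret = ret - bench
```

By construction the relative returns sum to zero within each period-by-quintile portfolio, which is a quick sanity check on the grouping.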

Thank you very much indeed
Rajesh


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: 17 January 2008 17:45
To: [email protected]
Subject: st: RE: pctile and xtile question again

I have comments on two levels. 

First, on how to do this. As always, it is easiest for list members to
see code in terms of datasets everyone can use. 

Your first bit seems rather indirect. I would use -centile- instead.
Individual percentiles are left behind in memory as r-class results by
-centile-. Thus you need not put them into a variable and then take them
out again, or create variables you need for only one purpose.

. sysuse auto 
. centile weight, centile(70) 
. gen byte weight_group = weight > r(c_1) if weight < . 
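Because r() results survive only until the next r-class command runs, a centile that is needed later is safer copied into a local macro straight away. A minimal sketch (the local name w70 is illustrative; -summarize- stands in for any intervening r-class command):

```stata
sysuse auto, clear
centile weight, centile(70)
return list

* r() is overwritten by the next r-class command (here -summarize-),
* so copy the centile into a local macro before running anything else
local w70 = r(c_1)
summarize mpg
gen byte weight_group = weight > `w70' if weight < .
tabulate weight_group
```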

Then you can proceed directly to something like 

. egen mpg_group = xtile(mpg), by(weight_group) nq(3) 
. egen both_group = group(mpg_group weight_group), label

Remember the request to explain where non-official commands you use come
from. Thus -egen, xtile()- is a user-written function (by Ulrich Kohler)
in the -egenmore- package on SSC. 
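If -egenmore- is not available, the same kind of within-group tertiles can be sketched with the official -xtile- by looping over the groups. This is an alternative, not Kohler's function, and its handling of ties may differ slightly, so a cross-tabulation against the -egen- result is worth a look before relying on it.

```stata
sysuse auto, clear
centile weight, centile(70)
gen byte weight_group = weight > r(c_1) if weight < .

* tertiles of mpg within each weight group, using only official -xtile-
gen byte mpg_group = .
levelsof weight_group, local(groups)
foreach g of local groups {
    xtile tmp = mpg if weight_group == `g', nq(3)
    replace mpg_group = tmp if weight_group == `g'
    drop tmp
}

* combined groups, labelled with the values of both variables
egen both_group = group(mpg_group weight_group), label
```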

Extending this to two percentiles: 

. centile weight, centile(30 70) 
. gen byte weight_group = cond(weight < r(c_1), 1, cond(weight < r(c_2), 2, 3)) if weight < .

and you can proceed as before

. egen mpg_group = xtile(mpg), by(weight_group) nq(3) 
. egen both_group = group(mpg_group weight_group), label

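Nested cond() calls become awkward beyond two cut-offs. One way to generalize to any list of centiles, sketched here with official commands only (the local names pcts, k and c1, c2, ... are illustrative), is to start everyone in the top group and move observations down past each stored cut-off. For centiles 30 and 70 it reproduces the nested cond() above.

```stata
sysuse auto, clear
local pcts 30 70
local k : word count `pcts'
centile weight, centile(`pcts')

* copy the cut-offs into locals c1, c2, ... before anything overwrites r()
forvalues j = 1/`k' {
    local c`j' = r(c_`j')
}

* start in the top group, then move down past each cut-off in turn;
* missing weight never satisfies weight < c, so it stays missing
gen byte weight_group = `k' + 1 if weight < .
forvalues j = `k'(-1)1 {
    replace weight_group = `j' if weight < `c`j''
}
```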
Note that in the auto dataset there are in fact no missing values for
-weight-, but excluding them explicitly is the right thing in most
problems, and at worst does nothing. In fact, with two variables, a
double restriction

... if weight < . & mpg < . 

is usually going to be the right thing, and at worst it does nothing and
will not bite. 
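The double restriction can also be kept in an indicator variable so that every later command uses the same sample. A minimal sketch: the two blanked-out values are a hypothetical device to make the restriction bite in the auto data, and touse is just an illustrative name. For numeric variables, !missing(weight, mpg) is equivalent to weight < . & mpg < .

```stata
sysuse auto, clear
* hypothetical: blank out two values so that the restriction matters
replace mpg = . in 1
replace weight = . in 2

* 1 if both weight and mpg are present, 0 otherwise
gen byte touse = !missing(weight, mpg)
centile weight if touse, centile(70)
gen byte weight_group = weight > r(c_1) if touse
```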

Second, on why you are doing this. It may be impertinent, but I am
curious. Under what circumstances must you do precisely this?
Categorisation by quantiles throws away data. Seemingly arbitrary
quantiles or numbers of quantiles do that capriciously. When is this the
right thing to do in any data analysis? 

Nick
[email protected] 


Rajesh Tharyan
==============

I have two variables x and y, which I have to put into 6 groups. 

I am using the code below (Code I) to first cut the x variable into 2
groups based on its 70th percentile value. Then, for each group of x, I
cut the y variable into 3 equal groups, and finally I put the two
together to form the final six groups.

What I would like to do is cut the y variable for each group of x at
its 30th and 70th percentile values. The code below (Code II) is my
present solution, and it seems very complicated. Any suggestions are
very much appreciated. Is it possible to cut at specified percentiles?



Code I
*************start********************
* this bit cuts the x variable into two groups based on the 70th
* percentile value

pctile xu=x, nq(10) genp(xx)
replace xu=. if xx~=70

sort xu
* Is this step necessary? I get slightly different numbers when I sort
* and when I do not: for one group, for example, I get 481 with sorting
* and 477 without.

xtile xc = x, cutpoints(xu)
drop xx xu

* this bit cuts the y variable into three groups for each group of x

egen yc=xtile(y), by(xc) nq(3)

* forming the final 6 groups

gen gp=10*xc+yc

****************end*******************
 

Code II
************start*********

pctile xu=x, nq(10) genp(xx)
replace xu=. if xx~=70
sort xu 
xtile xc = x, cutpoints(xu)
drop xx xu

pctile xmmu=y if xc==1, nq(10) genp(yy)
replace xmmu=. if yy~=30 & yy~=70
pctile xmmu1=y if xc==2, nq(10) genp(yy1)
replace xmmu1=. if yy1~=30 & yy1~=70

xtile yc=y if xc==1, cutpoints(xmmu)
xtile yc1=y if xc==2, cutpoints(xmmu1)
replace yc=yc1 if yc==. & xc==2
drop xmmu xmmu1 yc1 yy yy1
gen gp=10*xc+yc 

***********end************
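A more compact route to what Code II does is to ask -_pctile- for exactly the percentiles wanted and cut with cond(). This is a sketch under assumptions: weight and mpg from the auto data stand in for x and y, and -_pctile- is used because, to my knowledge, it shares the percentile definition of -pctile-. The <= comparisons are meant to mirror -xtile-, which (as I understand it) puts values equal to a cut-off into the lower category.

```stata
sysuse auto, clear
* stand-ins for the x and y in the question (an assumption for illustration)
gen x = weight
gen y = mpg

* two groups of x, split at its 70th percentile
_pctile x, percentiles(70)
gen byte xc = 1 + (x > r(r1)) if x < .

* three groups of y within each x group, split at its 30th and 70th percentiles
gen byte yc = .
forvalues g = 1/2 {
    _pctile y if xc == `g', percentiles(30 70)
    replace yc = cond(y <= r(r1), 1, cond(y <= r(r2), 2, 3)) if xc == `g' & y < .
}

* final six groups: 11, 12, 13, 21, 22, 23
gen byte gp = 10*xc + yc
```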


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


Follow-Ups:
- st: RE: RE: RE: pctile and xtile question again
  - From: "Nick Cox" <[email protected]>

References:
- st: RE: pctile and xtile question again
  - From: "Nick Cox" <[email protected]>
