
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Subject: st: RE: RE: RE: pctile and xtile question again
Date: Fri, 18 Jan 2008 19:21:04 -0000

I am happy if my code was helpful or instructive, but I don't see much connection between your problem as stated and the code.

First, let's be clear on terminology. A quantile is a particular value x on a variable X which has an associated probability pr(X <= x). That is somewhere within any number of categories based on the ordered data. Thus on some variable we might find that Foobar Corp is at the 70% point, meaning 70% of values are at or below Foobar's and 30% are greater. But that is within any number of categories, e.g. those based on 68%-72% or 66%-74% or 64%-76%, to mention only some centred on 70%.

This arbitrariness is what worries me most, before the problem is made bivariate or multivariate by combining categories for different variables.

It would seem to me more direct to model Foobar together with other firms and assess Foobar in terms of its residual given a model for all those firms. Exercising economic judgement on what firms are (qualitatively) comparable then would seem essential as well as desirable. (I did study economics fairly intensively in my youth.)

Nick
[email protected]

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Rajesh Tharyan
Sent: 17 January 2008 23:42
To: [email protected]
Subject: st: RE: RE: pctile and xtile question again

Thanks very much for the suggestion, Nick. That is very elegant and straightforward. I will remember to explain user-written commands in the future.

As for why I am doing it: in the finance area this sort of analysis is quite common. One common application is to assess the performance of a company's shares by comparing it with the performance of a portfolio of shares of companies which fall in the same size quantile as that company during that period (assuming size is the main factor determining the performance). If you believed there were two important factors, then you could create quantiles based on two variables.
Of course, after three variables things become quite complicated and one runs out of companies. So you are clearly right, there are other (better) ways of doing this.

Thank you very much indeed
Rajesh

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 17 January 2008 17:45
To: [email protected]
Subject: st: RE: pctile and xtile question again

I have comments on two levels.

First, on how to do this. As always, it is easiest for list members to see code in terms of datasets everyone can use. Your first bit seems rather indirect. I would use -centile- instead. Individual percentiles are left behind in memory as r-class results by -centile-. Thus you need not put them into a variable and then take them out again, or create any variables you only need for one purpose.

. sysuse auto
. centile weight, centile(70)
. gen byte weight_group = weight > r(c_1) if weight < .

Then you can proceed directly to something like

. egen mpg_group = xtile(mpg), by(weight_group) nq(3)
. egen both_group = group(mpg_group weight_group), label

Remember the request to explain where non-official commands you use come from. Thus -egen, xtile()- is a user-written function (by Ulrich Kohler) in the -egenmore- package on SSC.

Extending this to two percentiles:

. centile weight, centile(30 70)
. gen byte weight_group = cond(weight < r(c_1), 1, cond(weight < r(c_2), 2, 3)) if weight < .

and you can proceed as before

. egen mpg_group = xtile(mpg), by(weight_group) nq(3)
. egen both_group = group(mpg_group weight_group), label

Note that in the auto dataset there are not in fact any missing values for -weight-, but excluding them explicitly is going to be the right thing in most problems, and at worst does nothing. In fact, with two variables, a double restriction

... if weight < . & mpg < .

is usually going to be the right thing, and at worst it does nothing and will not bite.

Second, on why you are doing this.
It may be impertinent, but I am curious. Under what circumstances must you do precisely this? Categorisation by quantiles throws away data. Seemingly arbitrary quantiles or numbers of quantiles do that capriciously. When is this the right thing to do in any data analysis?

Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
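
The alternative raised at the top of this thread, modelling all firms together and judging one firm by its residual, might look roughly like the sketch below. It is only an illustration under stated assumptions: the auto dataset stands in for a dataset of firms, -mpg- plays the role of the performance measure, -weight- plays the role of size, and the variable name mpg_resid is made up for this example. A real application would need a model chosen with some economic judgement.

```stata
* Assumption: auto data stand in for firms; mpg = "performance", weight = "size"
sysuse auto, clear

* Fit one model to all observations instead of cutting weight into groups
regress mpg weight

* Residual = observed mpg minus the value predicted from weight
* (mpg_resid is a hypothetical name chosen for this sketch)
predict double mpg_resid if e(sample), residuals

* Look at how far a few observations sit above or below the fitted line
list make mpg weight mpg_resid in 1/5
```

A positive residual marks an observation doing better than the model predicts for its weight, and no arbitrary quantile boundaries are needed to say so.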


