Re: st: Keep/Drop Observations for Top/Bottom X%


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Keep/Drop Observations for Top/Bottom X%
Date   Thu, 11 Oct 2012 11:11:34 +0100

I don't fully understand the question. If you want to slice at a
percentile (quantile), you have to calculate a percentile somehow.
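
A minimal sketch of one way to avoid reading the cutoffs off the screen,
assuming the auto data used in the example quoted below: -_pctile- leaves
its results in r(), so they can be copied into local macros and used
directly. The macro names lo and hi and the regressions of price on weight
are illustrative assumptions only.

```stata
* Load the example data shipped with Stata
sysuse auto, clear

* The 10th and 90th percentiles of mpg are left in r(r1) and r(r2)
_pctile mpg, p(10 90)

* Copy them at once, because a later r-class command would overwrite r()
local lo = r(r1)
local hi = r(r2)

* At or above the 90th percentile; inrange() with a missing upper
* limit excludes observations with missing mpg
regress price weight if inrange(mpg, `hi', .)

* At or below the 10th percentile
regress price weight if mpg <= `lo'
```

Because of ties, the selected subsets need not contain exactly 10% of the
observations.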

You could always write a program that does some combination of
calculating the percentile _and_ some analysis via a single call and
if you were doing this many, many times that would be a good idea. I
don't think such a program exists already canned.
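
A program of that kind might look like the sketch below. -marktop- and its
generate() and pct() options are hypothetical names invented for this
illustration, not an existing command. It only flags observations at or
above the cutoff; the analysis itself is then run with an -if- qualifier.

```stata
capture program drop marktop
program define marktop
    version 11
    * Hypothetical syntax: marktop varname [if] [in], generate(newvar) [pct(#)]
    syntax varname(numeric) [if] [in], GENerate(name) [Pct(real 10)]

    * touse is 1 for observations in the sample with nonmissing varname
    marksample touse
    confirm new variable `generate'

    * Cutoff for the top pct% is the (100 - pct)th percentile
    tempname cut
    _pctile `varlist' if `touse', p(`=100 - `pct'')
    scalar `cut' = r(r1)

    * 1 if at or above the cutoff, 0 if below, missing if outside the sample
    generate byte `generate' = inrange(`varlist', `cut', .) if `touse'
end

* Example call on the auto data
sysuse auto, clear
marktop mpg, generate(top10) pct(10)
regress price weight if top10 == 1
```

Testing top10 == 1 rather than just top10 matters: a bare -if top10- would
also be true wherever the indicator is missing.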

Nick

On Thu, Oct 11, 2012 at 10:59 AM, Lisa Wang <lhwang0925@gmail.com> wrote:
> Thank you everyone for your suggestions. But I was wondering whether
> there is a way to code it so that I don't need to run -summarize- or
> -_pctile- and then read the output to determine the cutoff values
> for the top/bottom X% before I do the next step?
>
> Thank you again,
> Lisa
>
>
> On Thu, Oct 11, 2012 at 8:54 PM, Justina Fischer <JAVFischer@gmx.de> wrote:
>> Hi Nick,
>>
>> in principle you might be right.
>>
>> However, for practical reasons it is sometimes advisable in subset analysis simply to load the full data and drop part of it, rather than carrying an 'if' restriction through every regression.
>>
>> HTH
>>
>> Justina
>>
>>
>> -------- Original Message --------
>>> Date: Thu, 11 Oct 2012 10:46:02 +0100
>>> From: Nick Cox <njcoxstata@gmail.com>
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: Keep/Drop Observations for Top/Bottom X%
>>
>>> You need not -keep- or -drop- to do this; in fact -keep- or -drop-
>>> here is usually a bad idea.
>>>
>>> (Furthermore, regressions of this kind are often more problematic than
>>> they seem, but I'll let others expand on that if they wish.)
>>>
>>> For full flexibility here, skip -summarize- and go straight to -_pctile-.
>>>
>>> For example,
>>>
>>> . sysuse auto
>>> (1978 Automobile Data)
>>>
>>> . _pctile mpg, p(10 90)
>>>
>>> . ret li
>>>
>>> scalars:
>>>                  r(r1) =  14
>>>                  r(r2) =  29
>>>
>>> So you can follow up with
>>>
>>> ... if mpg >= 29
>>>
>>> Warnings:
>>>
>>> 1. Watch out for ties.
>>>
>>> 2. Watch out for missing values at the top end.
>>>
>>> ... if mpg >= 29
>>>
>>> would include missings on -mpg- (if there were any).  -if inrange(mpg,
>>> 29, .)- excludes the missings.
>>>
>>> Nick
>>>
>>> On Thu, Oct 11, 2012 at 10:34 AM, Lisa Wang <lhwang0925@gmail.com> wrote:
>>>
>>> > I am unsure as to how I would go about keeping or dropping the
>>> > top/bottom X% of observations of a variable. I would like to do this
>>> > for further analysis on a subset of my data. For instance, I want to
>>> > do some further regressions for the top 10% of my observations based
>>> > on 'distance from home' and not the whole data set.
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/