
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Keep/Drop Observations for Top/Bottom X%


From   Lisa Wang <[email protected]>
To   [email protected]
Subject   Re: st: Keep/Drop Observations for Top/Bottom X%
Date   Thu, 11 Oct 2012 20:59:11 +1100

Thank you, everyone, for your suggestions. But I was wondering whether
there is a way to code this so that I don't need to run -summarize- or
-_pctile- and then read the output to find the cutoff values for the
top/bottom X% before doing the next step.

Thank you again,
Lisa
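
One possible way to avoid reading the cutoffs off the screen is to pick
them up from the saved results that -_pctile- leaves behind (the r(r1)
and r(r2) shown in the quoted example below). The following is only a
sketch on the auto data: the variable names bottom10 and top10, the
local macro names, and the regression itself are illustrative
assumptions, not part of the original thread.

```stata
* Sketch on the auto data; names bottom10/top10 and the model are hypothetical.
sysuse auto, clear

* -_pctile- saves the requested percentiles in r(r1), r(r2), ...
_pctile mpg, p(10 90)

* Copy the saved results into local macros right away, so that a later
* r-class command cannot overwrite them.
local lo = r(r1)
local hi = r(r2)

* Indicators for the two tails. Missing values count as larger than any
* number, so inrange() is used at the top end to keep them out.
generate byte bottom10 = mpg <= `lo'
generate byte top10    = inrange(mpg, `hi', .)

* Restrict the analysis with -if- rather than dropping observations.
regress price weight if top10
```

Because of ties at the cutoff, the flagged groups may contain somewhat
more than X% of the observations; -count if top10- shows how many were
selected.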


On Thu, Oct 11, 2012 at 8:54 PM, Justina Fischer <[email protected]> wrote:
> Hi Nick,
>
> in principle you might be right.
>
> However, for practical reasons it is sometimes advisable in subset analysis to simply load the full data and drop part of it, rather than carrying an 'if' restriction through every regression.
>
> HTH
>
> Justina
>
>
> -------- Original Message --------
>> Date: Thu, 11 Oct 2012 10:46:02 +0100
>> From: Nick Cox <[email protected]>
>> To: [email protected]
>> Subject: Re: st: Keep/Drop Observations for Top/Bottom X%
>
>> You need not -keep- or -drop- to do this; in fact -keep- or -drop-
>> here is usually a bad idea.
>>
>> (Furthermore, regressions of this kind are often more problematic than
>> they seem, but I'll let others expand on that if they wish.)
>>
>> For full flexibility here, skip -summarize- and go straight to -_pctile-.
>>
>> For example,
>>
>> . sysuse auto
>> (1978 Automobile Data)
>>
>> . _pctile mpg, p(10 90)
>>
>> . ret li
>>
>> scalars:
>>                  r(r1) =  14
>>                  r(r2) =  29
>>
>> So you can follow up with
>>
>> ... if mpg >= 29
>>
>> Warnings:
>>
>> 1. Watch out for ties.
>>
>> 2. Watch out for missing values at the top end.
>>
>> ... if mpg >= 29
>>
>> would include missings on -mpg- (if there were any).  -if inrange(mpg,
>> 29, .)- excludes the missings.
>>
>> Nick
>>
>> On Thu, Oct 11, 2012 at 10:34 AM, Lisa Wang <[email protected]> wrote:
>>
>> > I am unsure as to how I would go about keeping or dropping the
>> > top/bottom X% of observations of a variable. I would like to do this
>> > for further analysis on a subset of my data. For instance, I want to
>> > do some further regressions for the top 10% of my observations based
>> > on 'distance from home' and not the whole data set.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

