Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Keep/Drop Observations for Top/Bottom X%

From	"Justina Fischer" <[email protected]>
To	[email protected]
Subject	Re: st: Keep/Drop Observations for Top/Bottom X%
Date	Thu, 11 Oct 2012 11:54:58 +0200

Hi Nick,

In principle you might be right.

However, for practical reasons it is sometimes advisable, when analysing a subset, simply to load the full data and drop the unwanted part rather than adding an -if- restriction to every regression.
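A minimal sketch contrasting the two approaches discussed in this thread. It assumes the -auto- dataset shipped with Stata stands in for the real data, and the regression of price on weight is purely illustrative. Approach 1 follows Nick's -_pctile- plus -if- route. Approach 2 restricts the data once with -preserve-/-keep- and brings the full dataset back with -restore-.

```stata
* Illustrative stand-in data shipped with Stata
sysuse auto, clear

* 90th percentile of mpg; with a single p() value _pctile stores it in r(r1)
_pctile mpg, p(90)
* Copy it to a local at once: later r-class commands overwrite r()
local p90 = r(r1)

* Ties at the cutoff can push this count above 10% of observations
count if inrange(mpg, `p90', .)

* Approach 1: keep all observations, restrict each command with -if-
* inrange(mpg, `p90', .) excludes missing values of mpg
regress price weight if inrange(mpg, `p90', .)

* Approach 2: restrict once, run several analyses, then restore
preserve
keep if inrange(mpg, `p90', .)
regress price weight
summarize price weight
restore
```

Approach 2 avoids repeating the condition in every command, while -restore- makes sure the dropped observations are not lost for the rest of the session.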

HTH

Justina


-------- Original Message --------
> Date: Thu, 11 Oct 2012 10:46:02 +0100
> From: Nick Cox <[email protected]>
> To: [email protected]
> Subject: Re: st: Keep/Drop Observations for Top/Bottom X%

> You need not -keep- or -drop- to do this; in fact -keep- or -drop-
> here is usually a bad idea.
> 
> (Furthermore, regressions of this kind are often more problematic than
> they seem, but I'll let others expand on that if they wish.)
> 
> For full flexibility here, skip -summarize- and go straight to -_pctile-.
> 
> For example,
> 
> . sysuse auto
> (1978 Automobile Data)
> 
> . _pctile mpg, p(10 90)
> 
> . ret li
> 
> scalars:
>                  r(r1) =  14
>                  r(r2) =  29
> 
> So you can follow up with
> 
> ... if mpg >= 29
> 
> Warnings:
> 
> 1. Watch out for ties.
> 
> 2. Watch out for missing values at the top end.
> 
> ... if mpg >= 29
> 
> would include missings on -mpg- (if there were any).  -if inrange(mpg, 29, .)-
> excludes the missings.
> 
> Nick
> 
> On Thu, Oct 11, 2012 at 10:34 AM, Lisa Wang <[email protected]> wrote:
> 
> > I am unsure as to how I would go about keeping or dropping the
> > top/bottom X% of observations of a variable. I would like to do this
> > for further analysis on a subset of my data. For instance, I want to
> > do some further regressions for the top 10% of my observations based
> > on 'distance from home' and not the whole data set.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: Keep/Drop Observations for Top/Bottom X%
  - From: Maarten Buis <[email protected]>
- Re: st: Keep/Drop Observations for Top/Bottom X%
  - From: Nick Cox <[email protected]>
- Re: st: Keep/Drop Observations for Top/Bottom X%
  - From: Lisa Wang <[email protected]>

References:
- st: Keep/Drop Observations for Top/Bottom X%
  - From: Lisa Wang <[email protected]>
- Re: st: Keep/Drop Observations for Top/Bottom X%
  - From: Nick Cox <[email protected]>
