Re: st: Keep/Drop Observations for Top/Bottom X%

From   Nick Cox
Subject   Re: st: Keep/Drop Observations for Top/Bottom X%
Date   Thu, 11 Oct 2012 10:46:02 +0100

You need not -keep- or -drop- to do this; in fact -keep- or -drop-
here is usually a bad idea.

(Furthermore, regressions of this kind are often more problematic than
they seem, but I'll let others expand on that if they wish.)

For full flexibility here, skip -summarize- and go straight to -_pctile-.

For example,

. sysuse auto
(1978 Automobile Data)

. _pctile mpg, p(10 90)

. ret li

                 r(r1) =  14
                 r(r2) =  29

So you can follow up with

... if mpg >= 29
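
A minimal sketch of such a follow-up, using the saved results rather
than typing the cutoff by hand. The scalar names -p10- and -p90- and the
regression of -price- on -weight- are assumptions for illustration only;
they are not part of the original question.

```stata
sysuse auto, clear
_pctile mpg, p(10 90)

* copy the saved results at once: later r-class commands overwrite r()
scalar p10 = r(r1)
scalar p90 = r(r2)

* illustrative regression restricted to the top group;
* nothing is dropped, so the full data set stays available
* (mpg has no missing values in the auto data)
regress price weight if mpg >= scalar(p90)
```

Because the restriction is made with -if-, the same command can be rerun
for the bottom group with -if mpg <= scalar(p10)-.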


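1. Watch out for ties. If several observations share the cutoff value,
-if mpg >= 29- can select more than 10% of the observations and
-if mpg > 29- fewer.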

2. Watch out for missing values at the top end.

... if mpg >= 29

would include missings on -mpg- (if there were any), because Stata
treats numeric missing values as larger than any nonmissing number.
-if inrange(mpg, 29, .)- excludes the missings.
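
Both cautions can be checked directly. The sketch below, again on the
auto data, counts how many observations each condition selects and then
builds an indicator variable. The names -cut- and -top10- are
assumptions for illustration.

```stata
sysuse auto, clear
_pctile mpg, p(90)
scalar cut = r(r1)

* ties: compare the numbers selected by >= and by > with 10% of
* the nonmissing observations
count if inrange(mpg, scalar(cut), .)
count if mpg > scalar(cut) & !missing(mpg)
count if !missing(mpg)

* indicator: 1 for the top group, 0 otherwise, missing if mpg is missing
generate byte top10 = inrange(mpg, scalar(cut), .) if !missing(mpg)
tabulate top10, missing
```

An indicator like this leaves the data intact, so later commands can use
-if top10 == 1-. Note that a bare -if top10- would also select any
observations where -top10- is missing, since missing counts as nonzero.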


On Thu, Oct 11, 2012 at 10:34 AM, Lisa Wang wrote:

> I am unsure as to how I would go about keeping or dropping the
> top/bottom X% of observations of a variable. I would like to do this
> for further analysis on a subset of my data. For instance, I want to
> do some further regressions for the top 10% of my observations based
> on 'distance from home' and not the whole data set.