Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Omit missing observations from sum, det?

From: Nick Cox | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: Omit missing observations from sum, det? | Date: Mon, 9 Jul 2012 16:40:25 +0100

```
r(p99) can't be said to define a quartile.

That aside, Stata's fault here is that it is doing precisely what you asked.

Missing values (not observations; an observation in Stata is the
entire case, record, or row of your data) count as greater than any
non-missing value and so satisfy your last inequality,
initial_length > r(p99). This is very well documented, e.g.

FAQ: Logical expressions and missing values (W. Gould, 2/03)
     Why is x > 1000 true when x contains missing value?
     http://www.stata.com/support/faqs/data/values.html

So either you need to add an extra condition to exclude missings:

.... & initial_length < .

or (easier) just use -xtile-, which automatically ignores missings.

. sysuse auto, clear
(1978 Automobile Data)

. xtile mpg_q = mpg, n(4)

. tab mpg_q

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         27       36.49       36.49
          2 |         11       14.86       51.35
          3 |         22       29.73       81.08
          4 |         14       18.92      100.00
------------+-----------------------------------
      Total |         74      100.00

. replace mpg = . in 1/5
(5 real changes made, 5 to missing)

. xtile mpg_q2 = mpg, n(4)

. tab mpg_q2

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       36.23       36.23
          2 |         10       14.49       50.72
          3 |         20       28.99       79.71
          4 |         14       20.29      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab mpg_q2, missing

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       33.78       33.78
          2 |         10       13.51       47.30
          3 |         20       27.03       74.32
          4 |         14       18.92       93.24
          . |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00

On Mon, Jul 9, 2012 at 4:27 PM, Benedikt Achatz
<benedikt.achatz.sta@gmail.com> wrote:
> I am trying to separate my data into quartiles, doing it with this code:
>
> sum initial_length, det
> gen initial_length_q=1 if initial_length <=r(p25)
> replace initial_length_q=2 if initial_length >r(p25) & initial_length <= r(p50)
> replace initial_length_q=3 if initial_length >r(p50) & initial_length <= r(p75)
> replace initial_length_q=4 if initial_length >r(p75) & initial_length <= r(p99)
> replace initial_length_q=5 if initial_length >r(p99)
>
> The problem that reveals itself to me is that if there are missing
> observations, those get put in the 99% quartile. Is there any specific
> reason behind it, and does anyone know how I could work around that?
```
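A minimal sketch of the first fix (the extra `< .` condition) applied to hand-built quartile groups. It assumes the auto data's `mpg`, with five values set to missing as in the log above, as a stand-in for `initial_length`; the variable `mpg_q` and the locals `p25`, `p50` and `p75` are illustrative names only. Following the point that `r(p99)` does not define a quartile, it builds four groups with no `r(p99)` cut.

```stata
* Sketch: quartile groups built by hand, with missing values excluded
* explicitly. mpg from the auto data stands in for initial_length.
sysuse auto, clear
replace mpg = . in 1/5            // make 5 values missing, as in the log

* Copy the cutpoints into locals so that they survive any later
* r-class command that would overwrite r()
quietly summarize mpg, detail
local p25 = r(p25)
local p50 = r(p50)
local p75 = r(p75)

* Missing (.) counts as larger than any number. The lower groups all have
* a finite upper bound, which missing fails; only the top group needs the
* extra "& mpg < ." condition to keep missings out.
generate mpg_q = 1 if mpg <= `p25'
replace  mpg_q = 2 if mpg >  `p25' & mpg <= `p50'
replace  mpg_q = 3 if mpg >  `p50' & mpg <= `p75'
replace  mpg_q = 4 if mpg >  `p75' & mpg < .

* Missing mpg should now leave mpg_q missing as well
tabulate mpg_q, missing
```

Dropping `& mpg < .` from the last `replace` would put the five missing values into group 4, which is the behavior described in the question. Writing `& !missing(mpg)` instead is equivalent; both forms also exclude the extended missing values `.a` to `.z`, which sort above `.`.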