



Re: st: Omit missing observations from sum, det?


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Omit missing observations from sum, det?
Date   Mon, 9 Jul 2012 16:40:25 +0100

r(p99) can't be said to define a quartile.

That aside, Stata's fault here is that it is doing precisely what you asked.

Missing values (not observations; an observation in Stata is the
entire case, record, or row of your data) count as greater than any
non-missing value and so satisfy your inequality. This is very well
documented; see, e.g.,

FAQ: Logical expressions and missing values (W. Gould, 2/03)
     Why is x > 1000 true when x contains missing value?
     http://www.stata.com/support/faqs/data/values.html
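A minimal check of this rule, meant to be run as a do-file (the `//` comments are do-file syntax). It borrows the auto data used further down; which five values are set to missing is arbitrary.

```stata
* Missing values compare as greater than any number
display . > 1000      // 1 (true)
display .a > .        // 1: extended missings .a-.z rank above system missing .
display missing(.a)   // 1

sysuse auto, clear
replace mpg = . in 1/5

* The bare inequality also counts the 5 missing values ...
count if mpg > 30
* ... the extra condition leaves them out
count if mpg > 30 & mpg < .
```

The first count exceeds the second by exactly 5. For a numeric variable, `!missing(mpg)` is an equivalent way to write the guard `mpg < .`.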


So either you need to add an extra condition to exclude missings,

... & initial_length < .

or (easier) just use -xtile-, which automatically ignores missings.

. sysuse auto, clear
(1978 Automobile Data)

. xtile mpg_q = mpg, n(4)

. tab mpg_q

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         27       36.49       36.49
          2 |         11       14.86       51.35
          3 |         22       29.73       81.08
          4 |         14       18.92      100.00
------------+-----------------------------------
      Total |         74      100.00

. replace mpg = . in 1/5
(5 real changes made, 5 to missing)

. xtile mpg_q2 = mpg, n(4)

. tab mpg_q2

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       36.23       36.23
          2 |         10       14.49       50.72
          3 |         20       28.99       79.71
          4 |         14       20.29      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab mpg_q2, missing

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       33.78       33.78
          2 |         10       13.51       47.30
          3 |         20       27.03       74.32
          4 |         14       18.92       93.24
          . |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00
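For comparison, here is a sketch of the first fix applied to the code in the question quoted below. Because `initial_length` is not available here, `mpg` from the auto data stands in for it. -summarize- already ignores missing values when it calculates r(p25), r(p50) and so on, so only the comparisons need attention. In fact only the last line does: every other line has an upper bound (`<= ...`) that a missing value fails. Copying the percentiles to locals is a precaution of this sketch, not something the original code strictly requires.

```stata
sysuse auto, clear
replace mpg = . in 1/5

* summarize uses only non-missing values for the percentiles
quietly summarize mpg, detail
local p25 = r(p25)
local p50 = r(p50)
local p75 = r(p75)
local p99 = r(p99)

* Lines 1-4 exclude missings already, since missing <= # is false
generate mpg_q = 1 if mpg <= `p25'
replace mpg_q = 2 if mpg > `p25' & mpg <= `p50'
replace mpg_q = 3 if mpg > `p50' & mpg <= `p75'
replace mpg_q = 4 if mpg > `p75' & mpg <= `p99'
* Only this line needs the guard; without it, missings land in category 5
replace mpg_q = 5 if mpg > `p99' & mpg < .

tab mpg_q, missing
```

With the guard in place, the 5 missing values appear as `.` in the tabulation rather than in category 5, which is how -xtile- treats them above.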


On Mon, Jul 9, 2012 at 4:27 PM, Benedikt Achatz
<benedikt.achatz.sta@gmail.com> wrote:
> I am trying to separate my data into quartiles, doing it with this code:
>
> sum initial_length, det
> gen initial_length_q=1 if initial_length <=r(p25)
> replace initial_length_q=2 if initial_length >r(p25) & initial_length <= r(p50)
> replace initial_length_q=3 if initial_length >r(p50) & initial_length <= r(p75)
> replace initial_length_q=4 if initial_length >r(p75) & initial_length <= r(p99)
> replace initial_length_q=5 if initial_length >r(p99)
>
> The problem that reveals itself to me is that if there are missing
> observations, those get put in the 99% quartile. Is there any specific
> reason behind it, and does anyone know how I could work around that?

