r(p99) can't be said to define a quartile. That aside, Stata's fault here is that it is doing precisely what you asked. Missing values (not observations; an observation in Stata is the entire case, record, or row of your data) count as greater than any non-missing value, and so they satisfy your last inequality, initial_length > r(p99). This is very well documented, e.g. in this FAQ:

FAQ: Logical expressions and missing values (W. Gould, 2/03)
     Why is x > 1000 true when x contains missing value?
     http://www.stata.com/support/faqs/data/values.html

So either you need to add an extra condition to exclude missings,

     ... & initial_length < .

or (easier) just use -xtile-, which automatically ignores missings.

. sysuse auto, clear
(1978 Automobile Data)

. xtile mpg_q = mpg, n(4)

. tab mpg_q

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         27       36.49       36.49
          2 |         11       14.86       51.35
          3 |         22       29.73       81.08
          4 |         14       18.92      100.00
------------+-----------------------------------
      Total |         74      100.00

. replace mpg = . in 1/5
(5 real changes made, 5 to missing)

. xtile mpg_q2 = mpg, n(4)

. tab mpg_q2

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       36.23       36.23
          2 |         10       14.49       50.72
          3 |         20       28.99       79.71
          4 |         14       20.29      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab mpg_q2, missing

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       33.78       33.78
          2 |         10       13.51       47.30
          3 |         20       27.03       74.32
          4 |         14       18.92       93.24
          . |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00

On Mon, Jul 9, 2012 at 4:27 PM, Benedikt Achatz
<benedikt.achatz.sta@gmail.com> wrote:

> I am trying to seperate my data into quartiles, doing it with this code:
>
> sum initial_length, det
> gen initial_length_q=1 if initial_length <=r(p25)
> replace initial_length_q=2 if initial_length >r(p25) & initial_length <= r(p50)
> replace initial_length_q=3 if initial_length >r(p50) & initial_length <= r(p75)
> replace initial_length_q=4 if initial_length >r(p75) & initial_length <= r(p99)
> replace initial_length_q=5 if initial_length >r(p99)
>
> The problem that reveals itself to me is that if there are missing
> observations, those get put in the 99% quartile. Is there any specific
> reason behind it, and does anyone know how I could work around that?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
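Applied to the code in the question, the first fix amounts to giving the top group an explicit upper bound of "< ." and folding the r(p99) split back into the fourth quartile. What follows is a sketch only: mpg in the auto data, with observations 1 to 5 set to missing as in the log above, stands in for the poster's initial_length. Writing !missing(mpg) instead of mpg < . would work equally well.

```stata
* Sketch: mpg in the auto data stands in for initial_length.
sysuse auto, clear
replace mpg = . in 1/5

* summarize ignores missings when computing the percentiles.
summarize mpg, detail

* Missing (.) counts as greater than any number, so the top group
* needs an explicit upper bound of "< ." to keep missings out.
generate mpg_q = 1 if mpg <= r(p25)
replace mpg_q = 2 if mpg > r(p25) & mpg <= r(p50)
replace mpg_q = 3 if mpg > r(p50) & mpg <= r(p75)
replace mpg_q = 4 if mpg > r(p75) & mpg < .

* Missings now stay missing instead of landing in the top group.
tabulate mpg_q, missing
```

Observations with missing mpg fail every condition, so mpg_q stays missing for them and -tabulate, missing- reports them as a separate row rather than as part of the top quartile.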

