Statalist The Stata Listserver



st: RE: IQR


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: IQR
Date   Wed, 6 Jun 2007 23:19:26 +0100

I think you are referring to -iqr- from STB-3. The FAQ
advises 

"Say what command(s) you are using. If they are not part of 
official Stata, say where they come from: the STB/SJ, SSC, or 
other archives."

As you did not follow this advice, I had to puzzle out
which command you were referring to. No doubt various 
other members of the list fell at the first fence. 

Now as to your question: I do not understand what you 
do not understand. The help for -iqr- looks very helpful
to me. It includes these definitions:

============================================================
      IQR (Interquartile Range) = 75th percentile - 25th percentile
      Pseudo standard deviation = IQR/1.349
      10% trim mean             = Average of cases between 10th and
                                  90th percentiles
      Inner fences              = Q(25)-1.5IQR and Q(75)+1.5IQR
      Outer fences              = Q(25)-3IQR   and Q(75)+3IQR
      Mild outlier              = Q(25)-3IQR <= x < Q(25)-1.5IQR  or
                                  Q(75)+1.5IQR < x <= Q(75)+3IQR
      Severe outlier            = x < Q(25)-3IQR  or  x > Q(75)+3IQR
============================================================

Thus a "severe outlier" lies more than 3 IQR away from the nearer 
quartile and a "mild outlier" lies more than 1.5 (but not more than 3) 
IQR away from the nearer quartile. 
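
To make the boundary cases concrete, here is a minimal sketch in Python 
(not Stata, and not the code of -iqr- itself) that applies the 
definitions quoted above. The function names classify() and pseudo_sd() 
are made up for illustration, and the quartiles are passed in as 
arguments, so the sketch assumes nothing about how Stata computes them. 

```python
def classify(x, q25, q75):
    """Label x using the fence definitions quoted from the -iqr- help.

    q25 and q75 are supplied by the caller; this sketch does not
    attempt to reproduce Stata's own quartile calculation.
    """
    iqr = q75 - q25
    inner_lo, inner_hi = q25 - 1.5 * iqr, q75 + 1.5 * iqr
    outer_lo, outer_hi = q25 - 3 * iqr, q75 + 3 * iqr
    # Severe: strictly beyond an outer fence.
    if x < outer_lo or x > outer_hi:
        return "severe"
    # Mild: strictly beyond an inner fence, but on or inside an outer fence.
    if x < inner_lo or x > inner_hi:
        return "mild"
    return "not an outlier"


def pseudo_sd(q25, q75):
    """Pseudo standard deviation as defined in the help: IQR/1.349."""
    return (q75 - q25) / 1.349
```

With illustrative (invented) quartiles of 10 and 20, the IQR is 10, the 
inner fences are -5 and 35, and the outer fences are -20 and 50. A value 
of exactly 35 is then not an outlier, 36 and exactly 50 are mild, and 51 
is severe. 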

These definitions go back to J.W. Tukey. 1977. Exploratory data 
analysis. Reading, MA: Addison-Wesley. The exception is the quartiles
themselves: the definitions Stata uses are documented at [R] summarize. 

These are arbitrary limits. Their main interest is that they are
sometimes used in boxplots to determine which data points should
be shown individually. 

That said, "getting rid" of severe outliers is, in my view, not 
usually a good idea unless there is independent evidence that 
the data are wholly untrustworthy (e.g. a laboratory record that
the experiment was grossly disturbed). Dropping values more than 
3 IQR away from the nearer quartile will in most instances throw
out important information. For example, it would throw away most major
cities when each is compared with the other cities in its country. 

Nick 
[email protected] 

[email protected]
 
> The description of IQR in Stata help is a little confusing. I am
> using this command to get rid of severe outliers but I am not quite
> sure how the iqr command calculates them. The notation is a bit
> confusing. Can someone explain this to me or direct me to other
> sources? I have Statistics with Stata (for version 7), a book
> published by Lawrence Hamilton, the person who wrote the IQR
> program, but I am still a bit baffled.



