Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Re: Re: Outlier detection


From   "Steichen" <[email protected]>
To   <[email protected]>
Subject   st: Re: Re: Outlier detection
Date   Tue, 20 Jan 2004 10:59:43 -0500

Janet Oliver writes:

> Can I detect outliers in Stata. A boxplot of my data shows extreme values
> and I cannot find a transformation to normality. I am unhappy at just
> discarding results because they are "extreme" and was wondering if there
is
> an implimentation of Grubb's or Dixon's test, or indeed any more
> satisfactory test.

I have old,  rough,  unpublished implementations of both Grubb's test and
the Dixon's R10 test, though my Dixon code is limited to  p = .05 and sample
sizes of 3 to 30.

I'll insert the Grubb's code below.  Let me know if the Dixon code is useful
to you.

Tom Steichen

----cut here----------------------------------------------
program define grubbs
*! version 1.0.0  TJS   8may2000
*! syntax: grubbs varlist [if] [in], [ID(varname) P(alpha)]
        version 6
        syntax varlist(numeric min=1) [if] [in] [, ID(varname) P(real .05) ]
        marksample touse, novarlist
        tokenize `varlist'

        if "`id'" != "" { local vallabl : value label `id' }

        cap if "`if'" != "" {ifexp "`if'"}   /* display expanded if */
        if _rc == 0 & "`if'" != "" {
                di
                local i 1
                while `i' <= $S_0 {
                        di in bl "${S_`i'}"
                        local i = `i' + 1
                }
     }
        preserve
        qui drop if !`touse'

* Do calculations
        di
        di in gr "---------+" _dup(67) "-"
        di in gr "variable |  g  >  G(`p')" _col(29) "mean      sd    reject
`id' (case label)"
        di in gr "---------+" _dup(67) "-"

        tempvar sdv esdv emax
        while "`1'" != "" {
                local flag = 1
                while `flag' {
                        qui summ `1'
                        local n = r(N)
                        local mean = r(mean)
                        local sd = sqrt(r(Var))
                        qui gen  `sdv'  = abs(`1' - `mean')/`sd'
                        qui egen `esdv' = max(`sdv')
                        local g = `esdv'
                        local t = (invt(`n'-2,(1-`p'/`n')))^2
                        local G = (`n' - 1) * sqrt(`t'/(`n'- 2 + `t')) /
sqrt(`n')
                        local c = 8 - length("`1'")
                        di in gr _skip(`c') "`1'" _col(10) "| " in ye %5.3f
`g' _c
                        di in ye %7.3f `G' %9.3f `mean' %8.3f `sd' _c
                        if `g' > `G' {
                                sort `sdv'
                                local out = `1'[`n']
                                if "`id'" != "" { local lab = `id'[`n'] }
                                if "`vallabl'" != "" {
                                        local lab : label `vallabl' `lab'
                                }
                                if length("`lab'") > 26 {
                                        local lab = substr("`lab'",1,23) +
"..."
                                }
                                di in wh %10.3f `out' in ye " `lab'"
                                qui replace `1' = . in `n'
                                local n = `n' - 1
                                if `n' < 2 { local flag = 0 }
                        }
                        else {
                                di
                                local flag = 0
                        }
                        qui drop `sdv' `esdv'
                        cap drop `emax'
                } /* end while flag */
                macro shift
                di in gr "---------+" _dup(67) "-"
        } /* end while variables */
end

----cut here----------------------------------------------
.-
 help for ^grubbs^                                        (version 1.0.0)
.-

Grubbs' outlier test
--------------------

   ^grubbs^ varlist [^if^ exp] [^in^ range] [^, id(^varname^) p(^#^)^]


Description
-----------

^grubbs^ implements F. E. Grubbs' (1969) test for outliers in a sample
from a normal distribution.  The test computes the extreme Studentized
deviate (called g in the printout),

        max |X - mean|
 ESD = ---------------- ,
              sd

and compares it to a critical value, G, where Pr(g > G) = p.  If g exceeds
G,
then the X which yielded the max |X-mean| is considered to be an outlier
and is removed from the sample.

The program proceeds sequential, processing the new reduced sample in the
same manner as above, and continues until no outlier is found.

The printout reports the value of the computed ESD (labeled g), the critical
G for the current sample size, and the mean and standard deviation of the
current data.  If an outlier is detected, its value (and observation id, if
option ^id()^ is specified) is also reported.  The program then reprocesses
the reduced sample and provides additional lines of similar output until no
outlier is found.  At that point, the program terminates when only one
variable is specified in the varlist, or processes the additional variables
if varlist contains more than one variable.


Option
------

^id(varname)^ provides the name of a variable containing observation id
data.
It can be a string variable or an unlabeled or value-labeled numeric
variable.
If specified, its value is displayed for observations declared to be
outliers.
The id is truncated if longer than 26 characters.

^p(#)^ specifies the two-tailed probability p.  p=.05 is the default value.


Saved values
------------

 (none)


Examples
--------

 . ^grubbs price, id(make)^

 . ^grubbs price-turn, id(make)^

 . ^grubbs price-turn if foreign, id(make)^


Author
------

Thomas J. Steichen


References
----------

Grubbs, F. E., 1969.  Procedures for detecting outlying observations in
samples.  Technometrics 11:1-21.


Also see
--------
    STB:  STB-3 sed4 (^iqr^)
 Manual:  [R] lv, hadimvo
On-line:  help for @dixon@, @lv@, @hadimvo@, and @iqr@ (if installed)
----cut here----------------------------------------------


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index