Statalist



st: Outlier: Detection


From   <badri.prasad@hrsdc-rhdsc.gc.ca>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Outlier: Detection
Date   Tue, 19 Feb 2008 14:34:00 -0500

Hi Stata users,

I am trying to run a Stata program to detect outliers in my data set. I found two Grubbs programs written in Stata. The programs are here:

Program # 1.
_______________________________program begins_____________________________________________
program define grubbs
* this is a revised version of the original command, it no longer deletes missing obs or outlier
* instead, it sets "tag_grubbs" =1 if it believes the obs to be an outlier
* usage: "grubbs myvar .05 10"
version 8.0
*arguments:
* 1= Name of variable
* 2= Significance level (0.05 or 0.01)
* 3= Max number of iterations
args xvar conf maxit
tempvar dev missx
gen byte tag_grubbs = 0
local i = 1
di "flagging missing values"
* create the temporary variable declared by tempvar, so that gsort can find it
gen byte `missx' = missing(`xvar')
* initial guess for critical value
scalar Gcrit = 10
* start with G > Gcrit (otherwise loop will not begin)
scalar G = Gcrit +1
di "maxit = " `maxit'
di "G= " G
di "Gcrit = " Gcrit
         while G > Gcrit & `i'<= `maxit' {
                   sum `xvar' if tag_grubbs == 0
                   local nobs = r(N)
                   gen `dev' = (abs(`xvar' -r(mean)))/r(sd)
                   gsort -`missx' -tag_grubbs `dev'
                   scalar G = `dev'[_N]
                   local ct = `conf'/(2*`nobs')
                   local ts = invttail(`nobs'-2,`ct')
                   scalar Gcrit = (`nobs'-1)*sqrt(`ts'^2/(`nobs'*(`nobs'-2+`ts'^2)))
                   di "Iteration = " `i' "   Critical G = " Gcrit " Current G = " G
                   if (G > Gcrit) di `xvar'[_N] " is an outlier, so tag_grubbs = 1"
                   replace tag_grubbs = 1 if  `dev' == G & G > Gcrit
                   local i = `i'+1
                   drop `dev'
}

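* report on the test result itself, so a run that stops on its last allowed iteration is not misreported
if (G <= Gcrit) di "Grubbs procedure terminated: no more outliers"
else di "Maximum iterations exceeded: Use larger maxit"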
end
____________________________________end of program__________________________________
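
For anyone checking the code against the textbook test: the critical value computed inside the loop appears to match the usual two-sided Grubbs formula. This is a sketch in my own notation, assuming n is the number of untagged observations (nobs in the code), alpha is the second argument (conf), and t is the value returned by invttail(nobs-2, conf/(2*nobs)):

```latex
G = \frac{\max_i \lvert x_i - \bar{x} \rvert}{s},
\qquad
G_{\mathrm{crit}} = \frac{n-1}{\sqrt{n}}
  \sqrt{\frac{t^{2}}{\,n-2+t^{2}\,}},
\qquad
t = t_{\alpha/(2n),\; n-2}
```

An observation is tagged when G exceeds G_crit, and the mean and standard deviation are then recomputed without it.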


Program #2.
__________________________________________program begins___________________________-
program define grubbs
* this is a revised version of the original command, it no longer deletes missing obs or outlier
* instead, it sets "tag_grubbs" =1 if it believes the obs to be an outlier
* usage: "grubbs myvar .05 10"
version 8.0
*arguments:
* 1= Name of variable
* 2= Significance level (0.05 or 0.01)
* 3= Max number of iterations
args xvar conf maxit
tempvar dev missx
gen byte tag_grubbs = 0
local i = 1
di "flagging missing values"
* create the temporary variable declared by tempvar, so that gsort can find it
gen byte `missx' = missing(`xvar')
* initial guess for critical value
scalar Gcrit = 10
* start with G > Gcrit (otherwise loop will not begin)
scalar G = Gcrit +1
di "maxit = " `maxit'
di "G= " G
di "Gcrit = " Gcrit
         while G > Gcrit & `i'<= `maxit' {
                   sum `xvar' if tag_grubbs == 0
                   local nobs = r(N)
                   gen `dev' = (abs(`xvar' -r(mean)))/r(sd)
                   gsort -`missx' -tag_grubbs `dev'
                   scalar G = `dev'[_N]
                   local ct = `conf'/(2*`nobs')
                   local ts = invttail(`nobs'-2,`ct')
                   scalar Gcrit = (`nobs'-1)*sqrt(`ts'^2/(`nobs'*(`nobs'-2+`ts'^2)))
                   di "Iteration = " `i' "   Critical G = " Gcrit " Current G = " G
                   if (G > Gcrit) di `xvar'[_N] " is an outlier, so tag_grubbs = 1"
                   replace tag_grubbs = 1 if  `dev' == G & G > Gcrit
                   local i = `i'+1
                   drop `dev'
}

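* report on the test result itself, so a run that stops on its last allowed iteration is not misreported
if (G <= Gcrit) di "Grubbs procedure terminated: no more outliers"
else di "Maximum iterations exceeded: Use larger maxit"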
end
___________________________________end of program__________________________________________

Could anyone suggest which program is better to use? I would appreciate it if you could run these programs on the variable price in the auto data as an example.
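
To make the request concrete, here is a minimal sketch of a single pass of the test on price, written out directly so it does not depend on either program. The variable name dev_price and the local alpha are my own placeholders, and I am assuming the two-sided form of the test at the 0.05 level, which is what the programs above seem to use.

```stata
* load the example data shipped with Stata
sysuse auto, clear

* sample size, mean and SD of price
quietly summarize price
local n = r(N)

* absolute deviation from the mean in SD units (dev_price is a placeholder name)
generate double dev_price = abs(price - r(mean)) / r(sd)

* Grubbs statistic: the largest standardized deviation
quietly summarize dev_price
scalar G = r(max)

* two-sided critical value at the assumed significance level
local alpha = 0.05
local t = invttail(`n' - 2, `alpha' / (2 * `n'))
scalar Gcrit = (`n' - 1) / sqrt(`n') * sqrt(`t'^2 / (`n' - 2 + `t'^2))

display "G = " G "   Gcrit = " Gcrit

* show the most extreme observation
list make price if dev_price == G
```

With the programs themselves, the usage comment suggests the equivalent call would be "grubbs price .05 10", followed by "list make price if tag_grubbs == 1".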

Thanks.

Badri Prasad
Policy, Reporting and Data Development
Labour Standards and Workplace Equity
National Labour Operations Directorate
HRSDC
(819) 956 - 8146

