
Re: st: RE: Outlier: Detection

From   "Austin Nichols"
Subject   Re: st: RE: Outlier: Detection
Date   Wed, 20 Feb 2008 13:58:23 -0500

Sergiy <> and

If the issue is that there are more distinct values than -levelsof-
can handle, that is easily resolved, unless I am missing some finer
point:

prog grub2, rclass sortpreserve
 syntax [varlist] [if] [in] [, Level(int 95)]
 marksample touse
 foreach v of local varlist {
  tempvar c lev2 Z
  qui {
   * flag the last observation of each distinct value in the sample
   bys `v' `touse': g `c'=0 if (_N-_n==0)&`touse'
   * n is the number of distinct values, not the number of observations
   count if `c'==0 & `touse'
   local n=r(N)
   * squared upper t quantile at alpha/(2n), where alpha=1-level/100
   local t2=(invttail(`n'-2,(1-`level'/100)/(2*`n')))^2
   local G_cr=((`n'-1)/sqrt(`n'))*sqrt(`t2'/(`n'-2+`t2'))
   g `lev2'=`v'^2
   * totals over distinct values; r(sum) stays correct with negative
   * values, where the maximum of a running sum need not be the total
   su `v' if `c'<., meanonly
   loc lsum=r(sum)
   su `lev2' if `c'<., meanonly
   loc ssum=r(sum)
   local mean=`lsum'/`n'
   local levsdev=sqrt(`ssum'/`n'-(`mean')^2)
   g `Z'=abs(`mean'-`v')/`levsdev' if `c'<.
   levelsof `v' if `Z'>`G_cr'&`c'<., local(outliers)
  }
  di as txt "Outliers in `v': " as res "`outliers'"
  * copy the macro directly so a long list is not truncated
  return local outliers `outliers'
 }
end

clear
range n 1 190717 190717
g x=invnorm(uniform())
replace x=6 in 1
grub2 x

No doubt the above program could be cleaned up a bit...
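
For reference, the cutoff built from `t2' and `G_cr' in the program has the form of the usual two-sided Grubbs critical value. A sketch in LaTeX notation follows. The symbols are my own labels, not from the thread: n is the number of distinct values the program counts, alpha = 1 - level/100 is assumed to be the intended significance level, and t is the upper-tail critical value of the t distribution with n - 2 degrees of freedom, which is what -invttail- returns.

```latex
\[
  G_{\mathrm{cr}}
    = \frac{n-1}{\sqrt{n}}
      \sqrt{\frac{t^{2}}{\,n-2+t^{2}\,}},
  \qquad
  t = t_{\alpha/(2n),\; n-2}
\]
\[
  Z_i = \frac{\lvert \bar{v} - v_i \rvert}{s},
  \qquad
  \bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i,
  \qquad
  s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} v_i^{2} - \bar{v}^{2}}
\]
```

A distinct value v_i is listed as an outlier when Z_i > G_cr. Note that s here divides by n, as `levsdev' does in the program, rather than by n - 1. With n near 190717 the difference is negligible.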

On Feb 20, 2008 12:48 PM, Sergiy Radyakin wrote:
> On 2/20/08,
> <> wrote:
> > Hi Austin,
> > I ran your program with my data set of 190717 observations and found the following result.
> >
> > . Grubbs2 lnwage, lev(95)
> > macro length exceeded
> > r(1000);
> That is the answer to Austin's question regarding why we need to limit
> the number of unique values.