Statalist



Re: st: RE: Outlier: Detection


From   "Sergiy Radyakin" <[email protected]>
To   [email protected]
Subject   Re: st: RE: Outlier: Detection
Date   Wed, 20 Feb 2008 12:48:19 -0500

On 2/20/08, [email protected]
<[email protected]> wrote:
> Hi Austin,
> I ran your program with my data set of 190717 observations and found the following result.
>
> . Grubbs2 lnwage, lev(95)
> macro length exceeded
> r(1000);

That is the answer to Austin's question regarding why we need to limit
the number of unique values.
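
One possible way around the limit, sketched here as an illustration and not
taken from the thread: -levelsof- stores every distinct value in a local
macro, so a variable with very many distinct values overflows the maximum
macro length (display c(macrolen) shows the limit of the running Stata). The
sketch below keeps the data values out of macros by taking the mean and
standard deviation from -summarize- and writing the flags to a new variable.
The program name grubbsflag and its generate() option are hypothetical. It
also differs from the quoted Grubbs2 in several assumed ways: it uses all
observations instead of distinct values, it uses the sample standard
deviation, it takes alpha as 1 - level/100, and it flags every observation
beyond the critical value in a single pass, which simplifies a test designed
for one outlier at a time.

```stata
* Hypothetical program: the name and options are illustrative, not from the thread
capture program drop grubbsflag
program define grubbsflag, rclass
    version 10
    syntax varname(numeric) [if] [in], GENerate(name) [Level(real 95)]
    marksample touse
    confirm new variable `generate'

    * Mean and SD come from -summarize-, so no data values pass through a macro
    quietly summarize `varlist' if `touse'
    local n    = r(N)
    local mean = r(mean)
    local sd   = r(sd)
    if `n' < 3 error 2001
    if `sd' == 0 {
        display as err "`varlist' does not vary"
        exit 459
    }

    * Two-sided critical value, assuming alpha = 1 - level/100
    local alpha = 1 - `level'/100
    local t2    = invttail(`n' - 2, `alpha'/(2*`n'))^2
    local Gcr   = ((`n' - 1)/sqrt(`n'))*sqrt(`t2'/(`n' - 2 + `t2'))

    * Flag observations whose standardized distance exceeds the critical value
    quietly generate byte `generate' = abs(`varlist' - `mean')/`sd' > `Gcr' if `touse'
    quietly count if `generate' == 1
    local nout = r(N)
    display as txt "Flagged in `varlist': " as res `nout' ///
        as txt " (critical value " as res %6.3f `Gcr' as txt ")"

    return scalar N_out = `nout'
    return scalar G_cr  = `Gcr'
end

* Usage on a shipped data set
sysuse auto, clear
grubbsflag price, generate(out_price) level(99)
list make price if out_price == 1
```

Because the flags live in a variable, the number of observations is limited
only by memory, not by macro length.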

Badri might want to explain what he (or she) wants to achieve. When
working with continuous variables, it makes more sense to drop, say, the
top 1% of earners. Is that something you want?
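
A minimal sketch of that kind of trimming, assuming the variable is lnwage
as in the quoted message; the flag name top1 is hypothetical. It marks the
observations above the 99th percentile so they can be inspected before
anything is dropped.

```stata
* lnwage is the variable from the quoted message; top1 is a hypothetical name
_pctile lnwage, percentiles(99)

* r(r1) holds the 99th percentile. Missing values sort above every number
* in Stata, so they are excluded explicitly.
generate byte top1 = lnwage > r(r1) & !missing(lnwage)
count if top1

* drop if top1      // run only after inspecting the flagged observations
```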

Regards,
   Sergiy Radyakin
>
> The variable lnwage is of type float. What is the maximum macro length that this program can use? How can I use the program with 190717 or more observations in the data set?
>
> With regards.
>
> Badri Prasad
> Policy, Reporting and Data Development
> Labour Standards and Workplace Equity
> National Labour Operations Directorate
> HRSDC
> (819) 956 - 8146
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Austin Nichols
> Sent: 2008-02-20 11:48 AM
> To: [email protected]
> Subject: Re: st: RE: Outlier: Detection
>
>
> Sergiy, is there a reason to limit n to 90, or to use -inspect-
> (necessarily limiting n to 99)?  Would this version accomplish the
> same goal?
>
> program Grubbs2, rclass sortpreserve
>  syntax [varlist] [if] [in] [, Level(int 95)]
>  marksample touse
>  foreach v of local varlist {
>  tempvar c
>  qui bys `v' `touse': g `c'=_N-_n if `touse'
>  qui count if `c'==0 & `touse'
>  local n=r(N)
>  local t2=(invttail(`n'-2,(1-`level'/100)/(2*`n')))^2
>  local G_cr=((`n'-1)/sqrt(`n'))*sqrt(`t2'/(`n'-2+`t2'))
>  quietly levelsof `v' if `touse', local(levs)
>  if `: word count `levs''!=`n' error 198
>  loc levsum=0
>  loc sqsum=0
>  foreach lev of local levs {
>   local levsum=`levsum'+`lev'
>   local sqsum=`sqsum'+`lev'*`lev'
>  }
>  local mean=`levsum'/`n'
>  local levsdev=sqrt(`sqsum'/`n'-`mean'*`mean')
>  local outliers
>  foreach lev of local levs {
>   local Z=abs(`mean'-`lev')/`levsdev'
>   if `Z'>`G_cr' local outliers "`outliers' `lev'"
>  }
>  di as txt "Outliers in `v': " as res "`outliers'"
>  }
>  return local outliers="`outliers'"
> end
>
> sysuse auto
> Grubbs2 pr-gear, lev(99)
> Grubbs2 pr-gear if for==1, lev(99)
> Grubbs2 pr-gear if for==0, lev(99)
>
> (Disclaimer: I have not read the Grubbs article, but I share Maarten's
> skepticism about the utility of this approach.)
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


