



Re: st: Removing outliers from my dataset


From   Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Removing outliers from my dataset
Date   Wed, 17 Apr 2013 14:43:48 +0000

There has been a running thread initiated by a request for assistance in removing outliers from a data set.  The gist of the thread has been that removing outliers is, in general, a dangerous approach except in limited circumstances.  And the point has been made in the thread that bogus data are often inliers.

In this particular case, the extreme values that the original poster sought to remove result from anomalies in accounting rules as applied to special circumstances, so the values are not meaningful.  It would therefore make sense to remove them, but he should also remove any non-outlier values produced by those same anomalies in the accounting rules.  Those values, though not extreme, are equally meaningless and, if included in the analysis, introduce error and may lead to an incorrectly specified or estimated model.

Removing the data that are known to be spuriously generated by these accounting rule anomalies, irrespective of their values, has an additional advantage.  If you just remove outliers, your model has no prospectively defined population to which it generalizes: applicability is conditional on the outcome.  By contrast, removing those items that were spuriously generated gives a prospective definition of the population to which the model generalizes.
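
As a minimal sketch of the distinction (in Python rather than Stata, purely for illustration), suppose each record carries an indicator for whether the accounting-rule anomaly applied to it.  The variable names (firm, ratio, rule_anomaly), the data values, and the cutoff of 5.0 are all hypothetical and do not come from the original poster's data.  The sketch compares dropping by value with dropping by cause:

```python
# Hypothetical records: "rule_anomaly" marks values produced by the
# accounting-rule anomaly, regardless of how extreme they look.
records = [
    {"firm": "A", "ratio": 0.8,  "rule_anomaly": False},
    {"firm": "B", "ratio": 1.1,  "rule_anomaly": False},
    {"firm": "C", "ratio": 0.9,  "rule_anomaly": True},   # spurious, but an inlier
    {"firm": "D", "ratio": 57.0, "rule_anomaly": True},   # spurious and extreme
    {"firm": "E", "ratio": 1.3,  "rule_anomaly": False},
    {"firm": "F", "ratio": 9.5,  "rule_anomaly": False},  # genuine, but extreme
]


def drop_by_value(rows, cutoff):
    """Keep rows whose absolute ratio is at most the cutoff (outlier removal)."""
    return [r for r in rows if abs(r["ratio"]) <= cutoff]


def drop_by_cause(rows):
    """Keep rows not generated by the anomaly, whatever their value."""
    return [r for r in rows if not r["rule_anomaly"]]


by_value = drop_by_value(records, cutoff=5.0)  # arbitrary illustrative cutoff
by_cause = drop_by_cause(records)

print("kept after value-based removal:", [r["firm"] for r in by_value])
print("kept after cause-based removal:", [r["firm"] for r in by_cause])
```

In this toy data, value-based removal keeps the spurious inlier C and discards the genuine extreme F, whereas cause-based removal does the reverse.  Only the second rule defines the retained population without reference to the outcome.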

Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA


