Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Removing outliers from my dataset


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Removing outliers from my dataset
Date   Wed, 17 Apr 2013 16:33:13 +0100

I think I agree with this, not that there can't be differing views on
a tricky area!

In essence, there should be a story that says why some data points
don't belong. That's more a subject-matter judgment call relating a
decision to research objectives than a statistical decision.

Nick
[email protected]


On 17 April 2013 15:43, Clyde B Schechter
<[email protected]> wrote:
> There has been a running thread initiated by a request for assistance in removing outliers from a data set.  The gist of the thread has been that removing outliers is, in general, a dangerous approach except in limited circumstances.  And the point has been made in the thread that bogus data are often inliers.
>
> In the particular circumstance, the extreme values that the original poster sought to remove are the result of anomalies in accounting rules as applied to particular special circumstances, so that the values are not meaningful.  In that case, while it would make sense to remove them, he should also be removing any non-outlier values that are also the result of the application of those same anomalies in the accounting rules. Those, though not extreme in value, are also not meaningful and, if included in the analysis, introduce error and may lead to an incorrectly specified or estimated model.
>
> Removing the data that are known to be spuriously generated by these accounting rule anomalies, irrespective of their values, has an additional advantage.  If you just remove outliers, your model has no prospectively defined population to which it generalizes: applicability is conditional on the outcome.  By contrast, removing those items that were spuriously generated gives a prospective definition of the population to which the model generalizes.
>
> Clyde Schechter
> Dept. of Family & Social Medicine
> Albert Einstein College of Medicine
> Bronx, NY, USA
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
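
To make the distinction in the quoted message concrete, here is a minimal sketch (in Python rather than Stata, purely for illustration) contrasting trimming by value with removal by a stated rule. The records, the `rule_anomaly` flag, the function names, and the median/MAD cutoff are all hypothetical; nothing here is taken from the original poster's data.

```python
from statistics import median

# Hypothetical records: `value` is the outcome; `rule_anomaly` is an assumed
# flag marking values known to come from the accounting-rule anomaly.
records = [
    {"id": 1, "value": 10.0, "rule_anomaly": False},
    {"id": 2, "value": 12.0, "rule_anomaly": False},
    {"id": 3, "value": 11.0, "rule_anomaly": True},   # spurious, but an inlier
    {"id": 4, "value": 9.0, "rule_anomaly": False},
    {"id": 5, "value": 250.0, "rule_anomaly": True},  # spurious and extreme
    {"id": 6, "value": 13.0, "rule_anomaly": False},
]


def drop_by_value(rows, k=3.0):
    """Outcome-conditional trimming: keep rows within k MADs of the median."""
    values = [r["value"] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [r for r in rows if abs(r["value"] - med) <= k * mad]


def drop_by_rule(rows):
    """Prospective exclusion: drop every flagged row, whatever its value."""
    return [r for r in rows if not r["rule_anomaly"]]


trimmed_ids = [r["id"] for r in drop_by_value(records)]
rule_ids = [r["id"] for r in drop_by_rule(records)]
print("kept after value trimming:", trimmed_ids)
print("kept after rule exclusion:", rule_ids)
```

In this toy data the value-based trim keeps record 3, which is spurious but not extreme, while the rule-based exclusion drops it along with record 5. The rule-based sample is also defined without reference to the outcome, which is the point about generalizability made above.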


