Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point


From   Maarten Buis <maartenlbuis@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point
Date   Thu, 19 Jul 2012 10:06:15 +0200

On Thu, Jul 19, 2012 at 9:39 AM, Lucia R.Latino wrote:
> I dropped all the observations greater than 10,000 because I considered them
> outliers. However, even without dropping the observations, q-q plots show
> the same pattern. Also the use of the weights does not make so much
> difference, as you said.
>
> I know that the distribution is not lognormal (that is exactly what I was
> trying to show); my concern is about the plots. As I mentioned before,
> the points are close to the 45-degree line (in the case of the GB2
> and Singh-Maddala, the points on the q-q plot fall exactly on the straight
> line) up to approximately the value 9,000. After that, the points depart
> significantly from the 45-degree line and run parallel to the
> x-axis; furthermore, while the sample distribution reaches 10,000, the
> theoretical one reaches approximately 20,000.
>
> I think that this is a "weird" behavior of the plots or I am simply missing
> something important about the q-q plots.

The "weirdness" is probably not in the plot but in your data: the tail
of your observed variable does not fit the tail of your theoretical
distribution. To be exact: this graph tells you that if the
theoretical distribution were correct, then the largest values of your
observed variable should have been a lot larger. So, if you have
reason to believe that any of these models should be reasonable, then
the values larger than 10,000 are in all likelihood not outliers and
you should not have dropped them. Including those values does not
guarantee that these distributions fit, but leaving them out is almost
certainly inconsistent with your models.
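
To see the pattern for yourself, here is a minimal sketch in Python (standard library only). It is not your data or your models: the lognormal parameters, the sample size, the seed and the plotting positions (i + 0.5)/n are all assumptions made for illustration. It simulates a lognormal variable, drops everything above 10,000, refits a lognormal to what is left, and then compares the largest observed value with the largest theoretical quantile.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def lognormal_quantile(p, mu, sigma):
    """Quantile of a lognormal(mu, sigma) distribution at probability p."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

def qq_pairs(sample, mu, sigma):
    """Return (theoretical, observed) quantile pairs, smallest to largest."""
    xs = sorted(sample)
    n = len(xs)
    # Plotting positions (i + 0.5) / n are one common convention (an assumption).
    return [(lognormal_quantile((i + 0.5) / n, mu, sigma), x)
            for i, x in enumerate(xs)]

random.seed(12345)                      # hypothetical seed, for reproducibility
MU, SIGMA, CUTOFF = 7.5, 1.0, 10_000    # hypothetical parameters and cutoff

full = [random.lognormvariate(MU, SIGMA) for _ in range(5000)]
truncated = [x for x in full if x <= CUTOFF]   # "outliers" dropped

# Refit the lognormal to the truncated sample via the mean and sd of log(x).
logs = [math.log(x) for x in truncated]
mu_hat, sigma_hat = fmean(logs), stdev(logs)

pairs = qq_pairs(truncated, mu_hat, sigma_hat)
top_theoretical, top_observed = pairs[-1]

print(f"dropped observations:         {len(full) - len(truncated)}")
print(f"largest observed value:       {top_observed:,.0f}")
print(f"largest theoretical quantile: {top_theoretical:,.0f}")
```

The largest observed value cannot exceed the cutoff, while the fitted distribution still expects much larger values in its upper tail. Plotted as a q-q plot, the top points therefore bend away from the 45-degree line toward the axis of the theoretical quantiles, which is the pattern you describe.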

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany


http://www.maartenbuis.nl
--------------------------