Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point

From	Maarten Buis <[email protected]>
To	[email protected]
Subject	Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point
Date	Thu, 19 Jul 2012 10:06:15 +0200

On Thu, Jul 19, 2012 at 9:39 AM, Lucia R.Latino wrote:
> I dropped all the observations greater than 10,000 because I considered them
> outliers. However, even without dropping the observations, q-q plots show
> the same pattern. Also the use of the weights does not make so much
> difference, as you said.
>
> I know that the distribution is not lognormal (that is exactly what I was
> trying to show); my concern was about the plots. As I mentioned before,
> the points are close to the 45-degree line (in the case of the GB2
> and Singh-Maddala, the points on the q-q plot fall exactly on the straight
> line) up to approximately the value 9,000. After that, the points depart
> significantly from the 45-degree line and become a line parallel to the
> x-axis; furthermore, while the sample distribution reaches the value 10,000,
> the theoretical one reaches approximately 20,000.
>
> I think that this is a "weird" behavior of the plots or I am simply missing
> something important about the q-q plots.

The "weirdness" is probably not in the plot but in your data: the tail
of your observed variable does not fit the tail of your theoretical
distribution. To be exact: this graph tells you that if the
theoretical distribution were correct, then the largest values of your
observed variable should have been a lot larger. So, if you have
reason to believe that any of these models should be reasonable, then
the values larger than 10,000 are in all likelihood not outliers and
you should not have dropped them. Including those values does not
guarantee that these distributions fit, but leaving them out is almost
certainly inconsistent with your models.
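
As an illustration only (this is not from the original thread, and it uses
Python rather than Stata), the effect can be reproduced with simulated data.
The sketch below assumes a lognormal model with made-up parameters (mu = 7.5,
sigma = 1.0), a hypothetical helper qq_pairs, and the plotting position
(i - 0.5)/n. It drops every draw above 10,000 and then pairs the remaining
observed quantiles with the quantiles of the untruncated model.

```python
import math
import random
from statistics import NormalDist


def qq_pairs(sample, mu, sigma):
    """Return (theoretical, observed) quantile pairs for lognormal(mu, sigma).

    Hypothetical helper for this sketch; uses plotting position (i - 0.5) / n.
    """
    observed = sorted(sample)
    n = len(observed)
    std_normal = NormalDist()
    pairs = []
    for i, x in enumerate(observed, start=1):
        p = (i - 0.5) / n
        theoretical = math.exp(mu + sigma * std_normal.inv_cdf(p))
        pairs.append((theoretical, x))
    return pairs


# Assumed, made-up parameters: the "true" model is lognormal(7.5, 1.0).
rng = random.Random(12345)
mu, sigma = 7.5, 1.0
full = [rng.lognormvariate(mu, sigma) for _ in range(5000)]

# Drop everything above the cutoff, as was done with the "outliers".
cutoff = 10_000
kept = [x for x in full if x <= cutoff]

pairs = qq_pairs(kept, mu, sigma)
largest_theoretical, largest_observed = pairs[-1]

print("observations dropped:", len(full) - len(kept))
print("largest observed value:", round(largest_observed))
print("largest theoretical quantile:", round(largest_theoretical))
```

With the theoretical quantiles on the x-axis, the observed values cannot exceed
the cutoff while the theoretical quantiles keep growing, so the upper end of the
plot flattens into a line roughly parallel to the x-axis. The numbers here come
from simulated data and say nothing about the actual data discussed in the
thread.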

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------