

From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point
Date: Thu, 19 Jul 2012 10:06:15 +0200

On Thu, Jul 19, 2012 at 9:39 AM, Lucia R.Latino wrote:
> I dropped all the observations greater than 10,000 because I considered them
> outliers. However, even without dropping the observations, q-q plots show
> the same pattern. Also the use of the weights does not make so much
> difference, as you said.
>
> I know that the distribution is not lognormal (it is what I was trying
> exactly to show), my concern was about the plots. As I mentioned before,
> the points are close enough to the 45 line degree (in the case of the GB2
> and Singh-Maddala, the points on the q-q plot fall exactly on the straight
> line) till approximately the value 9,000. After that, the points depart
> significantly from the 45 line degree, they become a parallel line to the
> x-axis; furthermore, while the sample distribution reaches value 10,000, the
> theoretical one reaches approximately value 20,000.
>
> I think that this is a "weird" behavior of the plots or I am simply missing
> something important about the q-q plots.

The "weirdness" is probably not in the plot but in your data: the tail of your observed variable does not fit the tail of your theoretical distribution. To be exact, this graph tells you that if the theoretical distribution were correct, then the largest values of your observed variable should have been a lot larger.

So, if you have reason to believe that any of these models should be reasonable, then the values larger than 10,000 are in all likelihood not outliers and you should not have dropped them. Including those values does not guarantee that these distributions fit, but leaving them out is almost certainly inconsistent with your models.

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
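
The pattern described in this reply can be reproduced with a small simulation. What follows is only a sketch in Python, not the Stata workflow used in the thread, and every number in it is an assumption chosen for illustration: a lognormal population with log-mean 8 and log-sd 1, a sample of 5,000, and a cutoff at 10,000. It draws the sample, drops everything above the cutoff, fits a lognormal to what is left, and compares the largest observed value with the largest theoretical quantile.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def qq_pairs(sample):
    """Return (theoretical, empirical) quantiles for a lognormal fitted to sample."""
    logs = [math.log(x) for x in sample]
    # Fit the lognormal by estimating a normal distribution on the log scale.
    fitted = NormalDist(mean(logs), stdev(logs))
    n = len(sample)
    empirical = sorted(sample)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [math.exp(fitted.inv_cdf((i + 0.5) / n)) for i in range(n)]
    return theoretical, empirical


random.seed(12345)   # arbitrary seed, for reproducibility only
CUTOFF = 10_000      # assumed cutoff, mirroring the thread

# Assumed population: lognormal with log-mean 8 and log-sd 1.
full = [random.lognormvariate(8.0, 1.0) for _ in range(5000)]
kept = [x for x in full if x <= CUTOFF]   # drop the supposed "outliers"

theoretical, empirical = qq_pairs(kept)

print("observations kept:           ", len(kept))
print("largest observed value:      ", round(empirical[-1]))
print("largest theoretical quantile:", round(theoretical[-1]))
```

The observed quantiles cannot exceed the cutoff, while the fitted distribution has no upper bound, so the top theoretical quantiles keep growing. On a q-q plot that shows up as the points flattening into a line parallel to the axis carrying the theoretical quantiles.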

