Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Lucia R.Latino" <[email protected]>
To: <[email protected]>
Subject: R: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point
Date: Thu, 19 Jul 2012 09:39:43 +0200

Dear Nick,

I dropped all the observations greater than 10,000 because I considered them outliers. However, even without dropping those observations, the q-q plots show the same pattern. Using the weights also makes little difference, as you said.

I know that the distribution is not lognormal (that is exactly what I was trying to show); my concern was about the plots. As I mentioned before, the points stay close to the 45-degree line (for the GB2 and Singh-Maddala, the points on the q-q plot fall exactly on the straight line) up to a value of approximately 9,000. After that, the points depart significantly from the 45-degree line and become a line parallel to the x-axis; furthermore, while the sample distribution reaches 10,000, the theoretical one reaches approximately 20,000.

I think that this is "weird" behavior of the plots, or I am simply missing something important about q-q plots.

Best,
Lucia

-----Original message-----
From: [email protected] [mailto:[email protected]] On behalf of Nick Cox
Sent: Wednesday, 18 July 2012 20:30
To: [email protected]
Subject: Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point

You have several values clumped up near 10,000. That alone does not seem appropriate for any distribution that in principle is unbounded above. How were these numbers calculated?

In addition, some scrutiny of your quantiles and a few quick calculations suggest that your distribution is a fair way from lognormal. It is not skewed or long-tailed enough given its other parameters. I haven't tried the other distributions named, but I suspect a similar story. (I can't tell how much difference pweights make to this, but I guess not much.)

Nick

On Wed, Jul 18, 2012 at 6:23 PM, Lucia R.Latino <[email protected]> wrote:

> Dear Nick,
>
> Thanks for your answer. Here are the details of my variable. I
> hope they make it easier to give me some feedback.
>
> -su dec_ae, d-
>
> -------------------------------------------------------------
>       Percentiles      Smallest
>  1%    838.9864       11.1115
>  5%    1402.251       38.77081
> 10%    1733.309       112.4597       Obs               11183
> 25%    2352.013       116.3163       Sum of Wgt.       11183
>
> 50%    3209.503                      Mean            3518.48
>                        Largest       Std. Dev.      1648.996
> 75%     4355.16       9948.422
> 90%    5793.742       9952.207       Variance        2719189
> 95%    6790.232         9981.6       Skewness       1.017932
> 99%    8768.935       9992.487       Kurtosis       4.138746
> -------------------------------------------------------------
>
> Thanks,
> Lucia
>
> -----Original message-----
> From: [email protected] [mailto:[email protected]] On behalf of Nick Cox
> Sent: Wednesday, 18 July 2012 18:49
> To: [email protected]
> Subject: Re: st: q-q plots, theoretical distribution with values
> higher than the sample's cutoff point
>
> These programs are in package -qpfit- on SSC.
>
> The word "problem" here is ambiguous. My bias is to guess that your
> data don't follow any of these distributions very well and the graphs
> are telling you that. -su dec_ae, detail- would tell us a bit more.
>
> Nick
>
> On Wed, Jul 18, 2012 at 5:13 PM, Lucia Latino <[email protected]> wrote:
>
>> I am having some problems with the q-q plots for Dagum, gb2,
>> lognormal and Singh-Maddala distributions using programs written by
>> Nick Cox.
>>
>> After fitting the distribution (e.g. lognfit dec_ae, svy), I run
>> the command for the q-q plot (e.g. qlogn dec_ae [pweight=iwght]).
>>
>> I repeat the same procedure for the other distributions (Dagum, gb2
>> and Singh-Maddala). All the plots show a strange behavior: in all the
>> q-q plots, the points follow a strongly nonlinear pattern. At the
>> beginning they follow the 45-degree line, then they depart
>> significantly from the 45-degree line and become flat around the
>> value 10,000, which is the max value for dec_ae.
>>
>> What does it mean? Why does the theoretical distribution take values
>> higher than 10,000?
>>
>> I hope I was clear enough. I wish I could show you the plots, but I
>> understood I cannot attach them.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
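A quick back-of-the-envelope check may help with the question of why the fitted distribution takes values above 10,000, and with the remark that the data are not skewed enough for a lognormal. The sketch below is only an illustration under stated assumptions, not a reproduction of what -lognfit- or -qlogn- compute: it matches a lognormal to the mean and SD reported in the -su dec_ae, d- output by the method of moments (an assumption; -lognfit- uses its own fit), and it uses (i - 0.5)/n as the plotting position (also an assumption). All function and variable names are made up for this example.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Values copied from the -su dec_ae, d- output quoted in this thread.
MEAN = 3518.48
SD = 1648.996
N_OBS = 11183
SAMPLE_SKEWNESS = 1.017932
CUTOFF = 10_000.0  # observations above this value were dropped


def lognormal_from_moments(mean, sd):
    """Return (mu, sigma) of log X for a lognormal with this mean and SD.

    Assumption: method-of-moments matching, not the fit from -lognfit-.
    """
    w = 1.0 + (sd / mean) ** 2  # w equals exp(sigma^2) for a lognormal
    sigma = sqrt(log(w))
    mu = log(mean) - 0.5 * sigma ** 2
    return mu, sigma


def lognormal_quantile(p, mu, sigma):
    """Quantile function of the lognormal at probability p."""
    return exp(mu + sigma * NormalDist().inv_cdf(p))


def lognormal_skewness(sigma):
    """Skewness implied by a lognormal with log-scale SD sigma."""
    w = exp(sigma ** 2)
    return (w + 2.0) * sqrt(w - 1.0)


mu, sigma = lognormal_from_moments(MEAN, SD)

# Assumed plotting position for the largest of n values: (n - 0.5) / n.
p_top = (N_OBS - 0.5) / N_OBS
top_theoretical = lognormal_quantile(p_top, mu, sigma)

# Share of the matched lognormal that lies above the cutoff.
share_above = 1.0 - NormalDist(mu, sigma).cdf(log(CUTOFF))

implied_skewness = lognormal_skewness(sigma)

print(f"largest theoretical quantile: {top_theoretical:.0f}")
print(f"share of fitted distribution above cutoff: {share_above:.4f}")
print(f"implied skewness: {implied_skewness:.2f} (sample: {SAMPLE_SKEWNESS})")
```

Under these assumptions the largest theoretical quantile comes out well above the 10,000 cutoff, which is in line with the fitted side of the plot running on toward roughly 20,000 while the sample side stops at 10,000: a distribution that is unbounded above always puts some probability beyond any finite sample maximum. The implied skewness is also noticeably larger than the sample value of 1.017932, consistent with the comment that the data are not skewed enough to be lognormal given their mean and SD.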

**Follow-Ups**:**Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point***From:*David Hoaglin <[email protected]>

**Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point***From:*Maarten Buis <[email protected]>

**Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point***From:*Nick Cox <[email protected]>

**References**:**st: q-q plots, theoretical distribution with values higher than the sample's cutoff point***From:*Lucia Latino <[email protected]>

**Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point***From:*Nick Cox <[email protected]>

**R: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point***From:*"Lucia R.Latino" <[email protected]>

**Re: st: q-q plots, theoretical distribution with values higher than the sample's cutoff point***From:*Nick Cox <[email protected]>
