Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From |
Maarten buis <[email protected]> |

To |
[email protected] |

Subject |
Re: st: Goodness of fit tests for continuous data using Stata |

Date |
Tue, 5 Oct 2010 07:51:47 +0100 (BST) |

--- On Mon, 4/10/10, Earley, Joseph wrote:
> Does Stata have a module which allows for testing whether
> or not a variable follows distributions such as the
> uniform, exponential, weibull etc.
>
> In particular, I would like to test whether or not a
> variable follows a uniform probability distribution using
> Stata.

Tests exist, but they are not very powerful, so you are unlikely to detect deviations from your theoretical distribution when you should. This is a limitation of statistics, not of Stata.

The preferred method is not to test but to graph. Two graphs can be particularly useful here:

First, the hanging rootogram as implemented in -hangroot-, as it allows you to include confidence intervals. That way you can still have something resembling a test.

Second, the quantile plot as implemented in -quantile-. This gives you a very direct view of the data. This can, for example, be useful for spotting ties, which are often the reason for deviations from a uniform distribution.

-hangroot- is a user-written program and can be downloaded by typing in Stata -ssc install hangroot-. -quantile- is part of official Stata.

I like to use the -aspect(1)- option for -quantile-, as the logic of this graph is that the observations should lie on the 45 degree line. By forcing the aspect ratio of the graph to be 1, the 45 degree line is really a 45 degree line. Leaving this option out is not wrong, but I think adding it leads to a visually clearer picture.

As I said before, we can do a test, but this test is not very powerful. In Stata we use the -ksmirnov- command for that. For that we need the cumulative distribution function (CDF) of our theoretical distribution. The CDF of a uniformly distributed variable is (x - a)/(b - a) if it ranges between a and b. In the example below we test percentile rank scores of the variable, so a = 0 and b = 1, and the CDF of x is x.
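For a variable that is not already scaled to lie between 0 and 1, the same test might be sketched as follows. The bounds a = 10 and b = 45 are hypothetical values chosen only for illustration; they are assumed to be known in advance. If you instead estimate them from the data (for example with the observed minimum and maximum), the p-value reported by -ksmirnov- is no longer reliable, because the test assumes a fully specified theoretical distribution.

```stata
sysuse auto, clear

// hypothetical bounds of the uniform distribution,
// assumed known in advance (not estimated from the data)
local a = 10
local b = 45

// one-sample Kolmogorov-Smirnov test of mpg against
// the uniform CDF (x - a)/(b - a)
ksmirnov mpg = (mpg - `a') / (`b' - `a')
```

With a = 0 and b = 1 the expression reduces to the variable itself, which is the case used in the example.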
*------------------- begin example -------------------
sysuse auto, clear

// create percentile rank score of mpg
// this should be uniformly distributed
// unless there are too severe ties
egen n = count(mpg)
egen i = rank(mpg)
gen hazen = (i - 0.5) / n
drop n i

// a suspended rootogram with confidence intervals
hangroot hazen , dist(uniform) susp notheor ci ///
    name(hangroot, replace)

// a quantile plot, no confidence interval but
// better for spotting ties
quantile hazen , aspect(1) name(quantile, replace)

// test, but not very powerful
ksmirnov hazen = hazen
*------------------ end example -----------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**: **st: Goodness of fit tests for continuous data using Stata** *From:* "Earley, Joseph" <[email protected]>
