

From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Goodness of fit tests for continuous data using Stata
Date: Tue, 5 Oct 2010 13:00:33 +0100

I support Maarten's general stance here. In addition:

1. Even the specification "uniform" leaves open whether the distribution is discrete or continuous, and whether the limits are known in advance or must be estimated from the data. Appropriate tests differ.

2. For this specific problem, note that -qplot- from SJ is in effect a superset of -quantile- and has the ability to show two or more sets of data simultaneously (which I don't think is true of -hangroot-).

3. Support for fitting other distributions in Stata is generally specific to the distribution. A simple trick for several distributions is to pretend that data are survival times and use -streg-. Also, for example, see -qexp-, -pexp-, -weibullfit-, -pweibull-, -qweibull- on SSC. There is no guarantee that any of the commands uses your preferred parameterisation....

4. With a graphical approach it is often helpful to generate a portfolio of graphs showing random samples with the same size as yours from the specified distribution. This gives a kind of informal significance testing.

Nick
n.j.cox@durham.ac.uk

Maarten buis

--- On Mon, 4/10/10, Earley, Joseph wrote:
> Does Stata have a module which allows for testing whether
> or not a variable follows distributions such as the
> uniform, exponential, weibull etc.
>
> In particular, I would like to test whether or not a
> variable follows a uniform probability distribution using
> Stata.

Tests exist, but they are not very powerful. So you are unlikely to detect deviations from your theoretical distribution when you should. This is a limitation of statistics, not of Stata. The preferred method is not to test but to graph. Two graphs can be particularly useful here:

First, the hanging rootogram as implemented in -hangroot-, as it allows you to include confidence intervals. That way you can still have something resembling a test.

Second, the quantile plot as implemented in -quantile-. This gives you a very direct view of the data.
This can, for example, be useful for spotting ties, which are often the reason for deviation from a uniform distribution.

-hangroot- is a user-written program, and can be downloaded by typing in Stata -ssc install hangroot-. -quantile- is part of official Stata. I like to use the -aspect(1)- option for -quantile- as the logic of this graph is that the observations should lie on the 45 degree line. By forcing the aspect ratio of the graph to be 1, the 45 degree line is really a 45 degree line. Leaving this option out is not wrong, but I think adding it leads to a visually clearer picture.

As I said before, we can do a test, but this test is not very powerful. In Stata we use the -ksmirnov- command for that. For that we need the cumulative distribution function (CDF) of our theoretical distribution. The CDF of a uniformly distributed variable is (x - a)/(b - a) if it ranges between a and b. In the example below we test percentile rank scores of the variable, so a = 0 and b = 1, and the CDF of x is x.

*------------------- begin example -------------------
sysuse auto, clear

// create percentile rank score of mpg
// this should be uniformly distributed
// unless the ties are too severe
egen n = count(mpg)
egen i = rank(mpg)
gen hazen = (i - 0.5) / n
drop n i

// a suspended rootogram with confidence intervals
hangroot hazen , dist(uniform) susp notheor ci ///
    name(hangroot, replace)

// a quantile plot, no confidence interval but
// better for spotting ties
quantile hazen , aspect(1) name(quantile, replace)

// test, but not very powerful
ksmirnov hazen = hazen
*------------------ end example -----------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )
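Nick's point 4 (a portfolio of graphs from random samples of the same size) could be sketched roughly as follows. This is only an illustration under assumptions, not something either poster wrote: the seed, the choice of 8 simulated samples, and the names u1-u8, observed and sim1-sim8 are all made up for the example, and it reuses the hazen variable from Maarten's example.

```stata
sysuse auto, clear

// percentile rank score of mpg, as in Maarten's example
egen n = count(mpg)
egen i = rank(mpg)
gen hazen = (i - 0.5) / n
drop n i

// quantile plot of the observed data (built but not yet drawn)
quantile hazen, aspect(1) title("observed") ///
    name(observed, replace) nodraw

// quantile plots of simulated uniform samples of the same size
// (seed and number of samples are arbitrary assumptions)
set seed 12345
local graphs observed
forvalues j = 1/8 {
    gen u`j' = runiform() if !missing(hazen)
    quantile u`j', aspect(1) title("simulated `j'") ///
        name(sim`j', replace) nodraw
    local graphs `graphs' sim`j'
}

// show the observed plot alongside the simulated ones
graph combine `graphs', cols(3)
```

If the observed panel does not stand out among the simulated panels, the data look compatible with the uniform; if it is easy to pick out, that is informal evidence of a deviation.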
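The -streg- trick in Nick's point 3 might look something like the sketch below. It assumes a strictly positive variable (mpg from the auto data is used purely for illustration) and fits constant-only exponential and Weibull models; as Nick warns, the parameterisation reported may not be the one you prefer, so check it against the -streg- documentation before interpreting the estimates.

```stata
sysuse auto, clear

// pretend mpg is a survival time; with no failure variable
// specified, -stset- treats every observation as a failure
stset mpg

// constant-only parametric fits (no covariates)
streg, distribution(exponential) nolog
streg, distribution(weibull) nolog
```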

**References**:

- **st: Goodness of fit tests for continuous data using Stata** *From:* "Earley, Joseph" <Joseph.Earley@lmu.edu>
- **Re: st: Goodness of fit tests for continuous data using Stata** *From:* Maarten buis <maartenbuis@yahoo.co.uk>
