Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Goodness of fit tests for continuous data using Stata

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: Goodness of fit tests for continuous data using Stata
Date	Tue, 5 Oct 2010 13:00:33 +0100

I support Maarten's general stance here. In addition:

1. Even the specification "uniform" leaves open whether the distribution is discrete or continuous, and whether the limits are known in advance or must be estimated from the data. Appropriate tests differ. 

2. For this specific problem, note that -qplot- from the Stata Journal (SJ) is in effect a superset of -quantile- and can show two or more sets of data simultaneously (which I don't think is true of -hangroot-). 

3. Support for fitting other distributions in Stata is generally specific to the distribution. A simple trick for several distributions is to pretend that the data are survival times and use -streg-. See also, for example, -qexp-, -pexp-, -weibullfit-, -pweibull-, and -qweibull- on SSC. There is no guarantee that any of these commands uses your preferred parameterisation. 

4. With a graphical approach it is often helpful to generate a portfolio of graphs showing random samples with the same size as yours from the specified distribution. This gives a kind of informal significance testing. 
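A minimal sketch of the -streg- trick in point 3, under illustrative assumptions (simulated data, an arbitrary seed, and the exponential model chosen as the example): treat a positive variable as complete, uncensored survival times and fit the model. With the -time- option the constant is on the log-time scale, so exp(_b[_cons]) estimates the mean; in the default hazard metric the same constant would be the log of the rate, which is exactly the kind of parameterisation difference point 3 warns about.

```stata
*------------------- begin example -------------------
clear
set seed 2010
set obs 500

// hypothetical data: exponential with mean 2 (rexponential() needs Stata 14+)
generate double t = rexponential(2)

// without failure(), every observation is treated as an observed failure
stset t

// exponential fit in the accelerated failure-time metric:
// exp(_cons) is then the estimated mean
streg, distribution(exponential) time
display "estimated mean = " exp(_b[_cons])

// the same approach works with, e.g., distribution(weibull)
*------------------ end example -----------------------
```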
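Point 4 could be sketched as follows, recomputing the percentile rank variable from Maarten's example further down and using -quantile- as the display. The number of simulated panels (eight), the seed, and the graph names are arbitrary choices for illustration. Each simulated panel is a uniform sample of the same size as the observed data, so the observed plot can be judged against the irregularity that chance alone produces.

```stata
*------------------- begin example -------------------
sysuse auto, clear
set seed 2010

// observed data: percentile rank score of mpg, as in Maarten's example
egen n = count(mpg)
egen i = rank(mpg)
generate double hazen = (i - 0.5) / n
local N = n[1]
drop n i
quantile hazen, aspect(1) title("Observed") nodraw name(obs, replace)

// eight samples of the same size from the uniform distribution
local sims
forvalues j = 1/8 {
    preserve
    clear
    quietly set obs `N'
    generate double u = runiform()
    quantile u, aspect(1) title("Simulated `j'") nodraw name(sim`j', replace)
    restore
    local sims `sims' sim`j'
}

// show the observed plot alongside the simulated ones
graph combine obs `sims', rows(3) name(portfolio, replace)
*------------------ end example -----------------------
```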

Nick 
[email protected] 

Maarten buis

--- On Mon, 4/10/10, Earley, Joseph wrote:
> Does Stata have a module which allows for testing whether
> or not a variable follows distributions such as the
> uniform, exponential, Weibull, etc.?
> 
> In particular, I would like to test whether or not a
> variable follows a uniform probability distribution using
> Stata.

Tests exist, but they are not very powerful, so they will
often fail to detect deviations from your theoretical
distribution even when such deviations are present. This
is a limitation of statistics, not of Stata. 
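To illustrate the power point, here is a small Monte Carlo sketch. The Beta(1.3, 1.3) alternative, the sample size of 74, the 200 replications, and the 5% level are all arbitrary assumptions, not taken from the thread. The code draws samples from a distribution that is mildly hump-shaped rather than uniform and records how often -ksmirnov- rejects uniformity; that share estimates the power of the test against this particular alternative.

```stata
*------------------- begin example -------------------
clear all
set seed 12345
local reps 200
local n 74
local rejected 0

forvalues r = 1/`reps' {
    quietly {
        clear
        set obs `n'
        // hypothetical alternative: Beta(1.3, 1.3), close to but not uniform
        generate double x = rbeta(1.3, 1.3)
        // one-sample test against the uniform(0,1) CDF, which is x itself
        ksmirnov x = x
    }
    if r(p) < 0.05 local ++rejected
}

display "share of samples rejected at the 5% level: " `rejected' / `reps'
*------------------ end example -----------------------
```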

The preferred method is not to test but to graph. Two 
graphs can be particularly useful here. First, the 
hanging rootogram as implemented in -hangroot-, because it 
allows you to include confidence intervals; that way 
you still have something resembling a test. Second,
the quantile plot as implemented in -quantile-. This
gives you a very direct view of the data. It is, for 
example, useful for spotting ties, which are often the 
reason for deviations from a uniform distribution.
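A small sketch of why the quantile plot helps with ties, using hypothetical simulated data: rounding a genuinely uniform variable to one decimal creates heavy ties, which should show up as flat runs of points in the quantile plot, while the unrounded variable should stay close to the reference line. -duplicates report- gives a numerical check on the same thing.

```stata
*------------------- begin example -------------------
clear
set seed 2010
set obs 200

// hypothetical data: a uniform variable and a heavily rounded copy
generate double u = runiform()
generate double u_rounded = round(u, 0.1)

// the rounded copy shows flat steps (ties) in the quantile plot
quantile u, aspect(1) name(q_smooth, replace)
quantile u_rounded, aspect(1) name(q_ties, replace)

// how many observations share their value with other observations
duplicates report u_rounded
*------------------ end example -----------------------
```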

-hangroot- is a user-written program and can be downloaded
by typing -ssc install hangroot- in Stata. -quantile- is
part of official Stata. I like to use the -aspect(1)-
option with -quantile-, as the logic of this graph is that 
the observations should lie on the 45-degree line. By
forcing the aspect ratio of the graph to be 1, the 45-degree
line really is drawn at 45 degrees. Leaving this option
out is not wrong, but I think adding it gives a visually
clearer picture.
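To make the 45-degree logic concrete, the sketch below rebuilds a quantile-type plot by hand: each ordered value of the percentile rank score is plotted against its plotting position (i - 0.5)/N, and the line y = x is drawn explicitly. Treating (i - 0.5)/N as the positions that -quantile- itself uses is an assumption of this sketch. Without ties the points fall exactly on the line; tied values pull them off it. With -aspect(1)- the line y = x is drawn at a true 45-degree angle.

```stata
*------------------- begin example -------------------
sysuse auto, clear

// percentile rank score of mpg, as in the example further down
egen n = count(mpg)
egen i = rank(mpg)
generate double hazen = (i - 0.5) / n
drop n i

// plotting position of each ordered value (assumed (i - 0.5)/N)
sort hazen
generate double p = (_n - 0.5) / _N

// ordered values against plotting positions, plus the line y = x
twoway (scatter hazen p) (function y = x, range(0 1)), ///
    aspect(1) legend(off) xtitle("Fraction of the data") ///
    ytitle("hazen") name(byhand, replace)
*------------------ end example -----------------------
```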

As I said before, we can do a test, but it is not 
very powerful. In Stata we use the -ksmirnov- command 
(Kolmogorov-Smirnov test) for that. It needs the cumulative
distribution function (CDF) of the theoretical distribution.
The CDF of a variable that is uniformly distributed between
a and b is (x - a)/(b - a) for a <= x <= b. In the example
below we test the percentile rank score of the variable, so
a = 0 and b = 1, and the CDF of x is simply x.
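The same recipe works for any interval. In this sketch the data are simulated on [2, 5] (a hypothetical choice), so the limits are known in advance and the CDF expression is (x - 2)/(5 - 2). If instead the limits are estimated from the sample, as in the second call, the usual -ksmirnov- p-value no longer has its nominal meaning, because the reference distribution was fitted to the same data; this is the distinction Nick raises in his point 1.

```stata
*------------------- begin example -------------------
clear
set seed 2010
set obs 100

// hypothetical data: uniform on [2, 5] by construction
generate double x = 2 + 3 * runiform()

// limits known in advance: CDF is (x - a)/(b - a) with a = 2, b = 5
local a = 2
local b = 5
ksmirnov x = (x - `a') / (`b' - `a')

// limits estimated from the data: treat the p-value as a rough guide only
summarize x, meanonly
local lo = r(min)
local hi = r(max)
ksmirnov x = (x - `lo') / (`hi' - `lo')
*------------------ end example -----------------------
```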

*------------------- begin example -------------------
sysuse auto, clear

// create the percentile rank score of mpg
// (the Hazen plotting position); this should be
// roughly uniformly distributed unless ties are severe
egen n = count(mpg)
egen i = rank(mpg)
gen hazen = (i - 0.5) / n
drop n i

// a suspended rootogram with confidence intervals
// (-hangroot- must first be installed: ssc install hangroot)
hangroot hazen , dist(uniform) susp notheor ci ///
                 name(hangroot, replace)

// a quantile plot, no confidence interval but 
// better for spotting ties
quantile hazen , aspect(1) name(quantile, replace)

// test, but not very powerful
ksmirnov hazen = hazen
*------------------ end example -----------------------
(For more on the examples I sent to Statalist, see: 
http://www.maartenbuis.nl/example_faq )
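Since the original question also mentioned the exponential distribution, here is the same -ksmirnov- recipe with a different CDF. The sketch assumes the mean is known in advance (2, a hypothetical value), in which case the CDF is 1 - exp(-x/2) for x >= 0. As with the uniform limits, estimating the mean from the same data would make the p-value only approximate.

```stata
*------------------- begin example -------------------
clear
set seed 2010
set obs 200

// hypothetical data: exponential with mean 2 (rexponential() needs Stata 14+)
generate double t = rexponential(2)

// CDF of an exponential distribution with known mean 2
ksmirnov t = 1 - exp(-t / 2)
*------------------ end example -----------------------
```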

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: Goodness of fit tests for continuous data using Stata
  - From: "Earley, Joseph" <[email protected]>
- Re: st: Goodness of fit tests for continuous data using Stata
  - From: Maarten buis <[email protected]>
