
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Goodness of fit tests for continuous data using Stata


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   Re: st: Goodness of fit tests for continuous data using Stata
Date   Tue, 5 Oct 2010 07:51:47 +0100 (BST)

--- On Mon, 4/10/10, Earley, Joseph wrote:
> Does Stata have a module which allows for testing whether
> or not a variable follows distributions such as the
> uniform, exponential, weibull etc.
> 
> In particular,  I would like to test whether or not a
> variable follows a uniform probability distribution using
> Stata.

Tests exist, but they are not very powerful. So you are 
unlikely to detect deviations from your theoretical
distribution when you should. This is a limitation of
statistics, not of Stata. 

The preferred method is not to test but to graph. Two 
graphs can be particularly useful here: First, the 
hanging rootogram as implemented in -hangroot-, as it 
allows you to include confidence intervals. That way 
you can still have something resembling a test. Second,
the quantile plot as implemented in -quantile-. This
gives you a very direct view of the data. This
can, for example, be useful for spotting ties, which 
are often the reason for deviation from a uniform 
distribution.
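
Since ties matter here, it can help to count them before 
graphing. The lines below are only a minimal sketch, assuming 
the auto data and the variable mpg used in the example further 
down; the variable name ntied is made up for this illustration.

```stata
sysuse auto, clear

// how many observations share their mpg value with another car
duplicates report mpg

// ntied = number of other observations with the same mpg value
duplicates tag mpg, generate(ntied)

// list the tied observations, grouped by mpg
sort mpg
list make mpg if ntied > 0, sepby(mpg)
```

Tied observations get the same rank and thus the same 
percentile rank score, which shows up as horizontal runs of 
points in the quantile plot.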

-hangroot- is a user-written program, and can be downloaded
by typing in Stata -ssc install hangroot-. -quantile- is
part of official Stata. I like to use the -aspect(1)-
option for -quantile- as the logic of this graph is that 
the observations should lie on the 45 degree line. By
forcing the aspect ratio of the graph to be 1, the 45 
degree line is really a 45 degree line. Leaving this option
out is not wrong, but I think adding it leads to a visually
clearer picture.

As I said before, we can do a test, but this test is not 
very powerful. In Stata we use the -ksmirnov- command for 
that. For that we need the cumulative distribution function 
(CDF) of our theoretical distribution. The CDF of a 
uniformly distributed variable is (x - a)/(b - a) if it 
ranges between a and b. In the example below we test 
percentile rank scores of the variable, so a = 0 and b = 1, 
and the CDF of x is x.

*------------------- begin example -------------------
sysuse auto, clear

// create percentile rank score of mpg
// this should be uniformly distributed 
// unless the ties are too severe
egen n = count(mpg)
egen i = rank(mpg)
gen hazen = (i - 0.5) / n
drop n i

// a suspended rootogram with confidence intervals
hangroot hazen , dist(uniform) susp notheor ci ///
                 name(hangroot, replace)

// a quantile plot, no confidence interval but 
// better for spotting ties
quantile hazen , aspect(1) name(quantile, replace)

// test, but not very powerful
// the expression after = is the theoretical CDF, here F(x) = x
ksmirnov hazen = hazen
*------------------ end example -----------------------
(For more on examples I sent to the Statalist see: 
http://www.maartenbuis.nl/example_faq )
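
The same pattern carries over to a uniform distribution on 
other bounds, and to other distributions with a closed-form 
CDF, such as the exponential. What follows is a sketch on 
simulated data, not part of the example above: the bounds 
a = 2 and b = 5, the mean mu = 3, the seed, and the variable 
names x and y are all assumptions made for illustration. The 
parameters are treated as known in advance; if you estimate 
them from the same data, the p-values reported by -ksmirnov- 
are no longer valid.

```stata
clear
set seed 12345
set obs 200

// hypothetical bounds, assumed known rather than estimated
local a = 2
local b = 5

// simulate a variable that is uniform between a and b
gen x = `a' + (`b' - `a') * runiform()

// test against the uniform CDF (x - a)/(b - a)
ksmirnov x = (x - `a') / (`b' - `a')

// same idea for an exponential distribution with known mean mu
local mu = 3
gen y = -`mu' * ln(runiform())

// the exponential CDF is 1 - exp(-y/mu)
ksmirnov y = 1 - exp(-y / `mu')
```

Because the data are simulated from the distributions being 
tested, neither test should normally reject here.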

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


      
