Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: goodness of fit for t-distribution

From   "Nick Cox" <>
To   <>
Subject   st: RE: goodness of fit for t-distribution
Date   Thu, 9 Sep 2004 16:30:12 +0100

I doubt very much that these calculations have been 
previously programmed anywhere using Stata. In practice, people 
don't often seem to fit t distributions to _data_, 
even though a common refrain is that real data are 
commonly fatter-tailed than the Gaussian. A good 
reason for that would be that real data are often 
skewed too. 

Be that as it may, one answer is: consider a graph 
rather than a test. 

Let's suppose that you think that your data look 
like some t distribution with some particular df. How 
you got that df I don't know. But knowing the df is 
necessary, unless you have some smart way of estimating 
it from the data or some smart theory that tells you 
what the df should be. (Some education theory tells us that
Students should be allowed to behave with infinite degrees
of freedom, but is that a normal expectation?) [Warning: 
some wordplay there.] 

By far the best tool is then going to be a dedicated plot 
that lets you look at e.g. the quantiles of a t distribution 
and your data. Official Stata has -qnorm-, -qchi-, ... 
and stops there. But all quantile-quantile plots are the 
same, really. So you can make a -qt- just by hacking at 
e.g. -qnorm-. Here's mine. I didn't bother with implementing 
a -grid-. 

*! 1.0.0  NJC 9 Sept 2004 
program qt, sort
	version 8 
	syntax varname [if] [in] , df(numlist int >0) [ * ] 

	_get_gropts , graphopts(`options') getallowed(rlopts plot)
	local options `"`s(graphopts)'"'
	local rlopts `"`s(rlopts)'"'
	local plot `"`s(plot)'"'
	_check4gropts rlopts, opt(`rlopts')

	tempvar touse Z Psubi
	quietly {
		gen byte `touse' = !missing(`varlist') `if' `in'
		sort `varlist'
		gen float `Psubi' = sum(`touse')
		replace `Psubi' = cond(`touse'==.,.,`Psubi'/(`Psubi'[_N]+1))
		sum `varlist' if `touse'==1, detail
		gen float `Z' = invttail(`df', 1 - `Psubi') *r(sd) + r(mean)
		label var `Z' "Inverse t, `df' df"
		local xttl : var label `Z'
		local fmt : format `varlist'
		format `fmt' `Z'
	local yttl : var label `varlist'
	if `"`yttl'"' == "" {
		local yttl `varlist'
	if `"`plot'"' == "" {
		local legend legend(nodraw)
	version 8: graph twoway			///
	(scatter `varlist' `Z',			///
		sort				///
		ytitle(`"`yttl'"')		///
		xtitle(`"`xttl'"')		///
		`legend'			///
		ylabels(, nogrid)		///
		xlabels(, nogrid)		///
		`yl'				///
		`xl'				///
		note(`"`note'"')		///
		`options'			///
	)					///
	(function y=x,	 	 		///
		range(`Z')			///
		n(2)				///
		clstyle(refline)		///
		yvarlabel("Reference")		///
		yvarformat(`fmt')		///
		`rlopts'			///
	)					///
	|| `plot'				///
	// blank

Then you can 

(1) qt myvar, df(#) 

(2) derive a portfolio of plots to get 
an idea of what samples from genuine t distributions 
should look like using -rndt- from the Hilbe and friend -rnd- 

sysuse auto 

forval i = 1/20 { 
	qui rndt 74 10 
	qt xt, df(10) t1(simulation `i') 

There is, admittedly, still the option of using some 
classical goodness-of-fit test like chi-square 
or Kolmogorov-Smirnov, but you need to do most of the 
calculations yourself, as you fear. 


Michael Stobernack

> if I need a test for normality (for one variable) I use
> sktest, swilk or sfrancia. But what if I need a test for 
> t-distribution?
> There is a pull down menue:
> Statistics/Summaries, tables, & tests/Nonparametric tests of 
> hypotheses/
> One-sample Kolmogorov-Smirnov test/Expression
> But I don't know what to put in the expression field.
> Is there any ado-file to run a goodness of fit test for t-distribution
> without knowing the formula for the cdf of t-distribution?

*   For searches and help try:

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index