  # st: RE: goodness of fit for t-distribution

 From "Nick Cox" To Subject st: RE: goodness of fit for t-distribution Date Thu, 9 Sep 2004 16:30:12 +0100

```I doubt very much that these calculations have been
previously programmed anywhere using Stata. In practice, people
don't often seem to fit t distributions to _data_,
even though a common refrain is that real data are
commonly fatter-tailed than the Gaussian. A good
reason for that would be that real data are often
skewed too.

Be that as it may, one answer is: consider a graph
rather than a test.

Let's suppose that you think that your data look
like some t distribution with some particular df. How
you got that df I don't know. But knowing the df is
necessary, unless you have some smart way of estimating
it from the data or some smart theory that tells you
what the df should be. (Some education theory tells us that
Students should be allowed to behave with infinite degrees
of freedom, but is that a normal expectation?) [Warning:
some wordplay there.]

By far the best tool is then going to be a dedicated plot
that lets you look at e.g. the quantiles of a t distribution
and your data. Official Stata has -qnorm-, -qchi-, ...
and stops there. But all quantile-quantile plots are the
same, really. So you can make a -qt- just by hacking at
e.g. -qnorm-. Here's mine. I didn't bother with implementing
a -grid-.

*! 1.0.0  NJC 9 Sept 2004
program qt, sort
version 8
syntax varname [if] [in] , df(numlist int >0) [ * ]

_get_gropts , graphopts(`options') getallowed(rlopts plot)
local options `"`s(graphopts)'"'
local rlopts `"`s(rlopts)'"'
local plot `"`s(plot)'"'
_check4gropts rlopts, opt(`rlopts')

tempvar touse Z Psubi
quietly {
gen byte `touse' = !missing(`varlist') `if' `in'
sort `varlist'
gen float `Psubi' = sum(`touse')
replace `Psubi' = cond(`touse'==.,.,`Psubi'/(`Psubi'[_N]+1))
sum `varlist' if `touse'==1, detail
gen float `Z' = invttail(`df', 1 - `Psubi') *r(sd) + r(mean)
label var `Z' "Inverse t, `df' df"
local xttl : var label `Z'
local fmt : format `varlist'
format `fmt' `Z'
}

local yttl : var label `varlist'
if `"`yttl'"' == "" {
local yttl `varlist'
}
if `"`plot'"' == "" {
local legend legend(nodraw)
}
version 8: graph twoway			///
(scatter `varlist' `Z',			///
sort				///
ytitle(`"`yttl'"')		///
xtitle(`"`xttl'"')		///
`legend'			///
ylabels(, nogrid)		///
xlabels(, nogrid)		///
`yl'				///
`xl'				///
note(`"`note'"')		///
`options'			///
)					///
(function y=x,	 	 		///
range(`Z')			///
n(2)				///
clstyle(refline)		///
yvarlabel("Reference")		///
yvarformat(`fmt')		///
`rlopts'			///
)					///
|| `plot'				///
// blank
end

Then you can

(1) qt myvar, df(#)

(2) derive a portfolio of plots to get
an idea of what samples from genuine t distributions
should look like using -rndt- from the Hilbe and friend -rnd-
package:

sysuse auto

forval i = 1/20 {
qui rndt 74 10
qt xt, df(10) t1(simulation `i')
more
}

There is, admittedly, still the option of using some
classical goodness-of-fit test like chi-square
or Kolmogorov-Smirnov, but you need to do most of the
calculations yourself, as you fear.

Nick
n.j.cox@durham.ac.uk

Michael Stobernack

> if I need a test for normality (for one variable) I use
> sktest, swilk or sfrancia. But what if I need a test for
> t-distribution?
> There is a pull down menue:
> Statistics/Summaries, tables, & tests/Nonparametric tests of
> hypotheses/
> One-sample Kolmogorov-Smirnov test/Expression
>
> But I don't know what to put in the expression field.
> Is there any ado-file to run a goodness of fit test for t-distribution
> without knowing the formula for the cdf of t-distribution?
>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```