# RE: st: p-value after cluster option

From: "Maarten Buis" | Subject: RE: st: p-value after cluster option | Date: Tue, 29 Aug 2006 09:39:01 +0200

```
Sara:
Rich appears to be right: the problem does indeed appear to be the
degrees of freedom. The p-values are calculated using a
t-distribution, not a normal distribution, and the number of
degrees of freedom is N - number of variables - 1. If the number
of degrees of freedom is large, the t is well approximated by a
normal, and 1627 degrees of freedom (1647 - 19 - 1) is more than
enough to make this approximation work extremely well. However,
with the cluster option the degrees of freedom are determined by
the number of clusters (Stata uses the number of clusters minus 1),
since the clusters, not the observations, are now the independent
bits of information. So if you have few clusters the normal
approximation may no longer work. Stata stores the appropriate
degrees of freedom in e(df_r), so you can use that to recreate the
p-values, as in the example below:

HTH,
Maarten

*-----------------begin example----------------
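sysuse auto, clear
* cluster on rep78, so the df come from the number of clusters
reg price mpg foreign, cluster(rep78)
di "t = " abs(_b[mpg]/_se[mpg])
di "df = " e(df_r)
* p-value from the t-distribution, as reported by Stata
di "p = " 2*ttail(e(df_r),abs(_b[mpg]/_se[mpg]))
* p-value from the standard normal, for comparison
* (normal() is the current name of the older norm() function)
di "p is not " 2*normal(-abs(_b[mpg]/_se[mpg]))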
*-----------------end example-------------------

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

---sara borelli wrote:
> In my regression I have N=1647 and 19 variables.
> To calculate the P-values I used the tables of the
> standard normal distribution as in the standard way to
> calculate the P-values.

--- Richard Goldstein wrote:
> > sounds like a problem with your getting the
> > degrees of freedom correct

--- sara borelli wrote:
> > > But when I use cluster, the p-values look "wrong",
> > > that is, if I calculate the p-value using the
> > > "new-clustered" t-statistics and the statistical
> > > tables I get a result that is different from the
> > > Stata output

```
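
As a rough companion to the example in the post, the sketch below checks where e(df_r) comes from and how much the choice of reference distribution matters. It assumes a reasonably recent Stata, in which vce(cluster ...) is the documented spelling of the cluster option and regress stores the number of clusters in e(N_clust). The data set and cluster variable simply mirror the example in the post.

```stata
* Sketch only: same data and model as the example in the post
sysuse auto, clear
regress price mpg foreign, vce(cluster rep78)

* residual degrees of freedom should be the number of clusters minus 1
display "clusters = " e(N_clust)
display "df_r     = " e(df_r)

* two-sided 5% critical values under the t and the standard normal
display "t critical value = " invttail(e(df_r), 0.025)
display "z critical value = " invnormal(0.975)
```

With only five repair-record categories in rep78, the t critical value should come out near 2.78 rather than 1.96, which is why p-values read from a normal table look too small in this setting.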