**[R] hausman** -- Hausman specification test

__Syntax__

**hausman** *name-consistent* [*name-efficient*] [**,** *options*]

*options*                      Description
-------------------------------------------------------------------------
Main
  __c__**onstant**               include estimated intercepts in comparison;
                                 default is to exclude
  __a__**lleqs**                 use all equations to perform test; default is
                                 first equation only
  __sk__**ipeqs(***eqlist***)**        skip specified equations when performing test
  __eq__**uations(***matchlist***)**   associate/compare the specified (by number)
                                 pairs of equations
  **force**                      force performance of test, even though
                                 assumptions are not met
  **df(***#***)**                      use *#* degrees of freedom
  __sig__**mamore**              base both (co)variance matrices on disturbance
                                 variance estimate from efficient estimator
  __sigmal__**ess**              base both (co)variance matrices on disturbance
                                 variance estimate from consistent estimator

Advanced
  __tcon__**sistent(***string***)**    consistent estimator column header
  __teff__**icient(***string***)**     efficient estimator column header
-------------------------------------------------------------------------

where *name-consistent* and *name-efficient* are names under which estimation
results were stored via **estimates store**.
A period (**.**) may be used to refer to the last estimation results, even if
these were not already stored.
Not specifying *name-efficient* is equivalent to specifying the last
estimation results as "**.**".

__Menu__

**Statistics > Postestimation**

__Description__

**hausman** performs Hausman's specification test.

__Options__

    +------+
----+ Main +-------------------------------------------------------------

**constant** specifies that the estimated intercept(s) be included in the
model comparison; by default, they are excluded. The default
behavior is appropriate for models in which the constant does not
have a common interpretation across the two models.

**alleqs** specifies that all the equations in the models be used to perform
the Hausman test; by default, only the first equation is used.

**skipeqs(***eqlist***)** specifies in *eqlist* the names of equations to be excluded
from the test. Equation numbers are not allowed in this context,
because the equation names, along with the variable names, are used
to identify common coefficients.

**equations(***matchlist***)** specifies, by number, the pairs of equations that
are to be compared.

The *matchlist* in **equations()** should follow the syntax

*#c***:***#e* [**,***#c***:***#e*[**,** *...*]]

where *#c* (*#e*) is an equation number of the always-consistent
(efficient under H0) estimator. Examples are **equations(1:1)**,
**equations(1:1, 2:2)**, and **equations(1:2)**.

If **equations()** is not specified, then equations are matched on
equation names.

**equations()** handles the situation in which one estimator uses
equation names and the other does not. For instance, **equations(1:2)**
means that equation 1 of the always-consistent estimator is to be
tested against equation 2 of the efficient estimator. **equations(1:1,**
**2:2)** means that equation 1 is to be tested against equation 1 and
that equation 2 is to be tested against equation 2. If **equations()**
is specified, the **alleqs** and **skipeqs** options are ignored.

**force** specifies that the Hausman test be performed, even though the
assumptions of the Hausman test seem not to be met, for example,
because the estimators were **pweight**ed or the data were clustered.

**df(***#***)** specifies the degrees of freedom for the Hausman test. The default
is the matrix rank of the variance of the difference between the
coefficients of the two estimators.

**sigmamore** and **sigmaless** specify that the two covariance matrices used in
the test be based on a common estimate of disturbance variance
(sigma^2).

**sigmamore** specifies that the covariance matrices be based on the
estimated disturbance variance from the efficient estimator.
This option provides a proper estimate of the contrast variance
for so-called tests of exogeneity and overidentification in
instrumental-variables regression.

**sigmaless** specifies that the covariance matrices be based on the
estimated disturbance variance from the consistent estimator.

These options can be specified only when both estimators store
**e(sigma)** or **e(rmse)**, or with the **xtreg** command. **e(sigma_e)** is stored
after the **xtreg** command with the **fe** or **mle** option. **e(rmse)** is stored
after the **xtreg** command with the **re** option.

**sigmamore** or **sigmaless** are recommended when comparing fixed-effects
and random-effects linear regression because they are much less
likely to produce a non-positive-definite-differenced covariance
matrix (although the tests are asymptotically equivalent whether or
not one of the options is specified).

    +----------+
----+ Advanced +---------------------------------------------------------

**tconsistent(***string***)** and **tefficient(***string***)** are formatting options. They
allow you to specify the headers of the columns of coefficients that
default to the names of the models. These options will be of
interest primarily to programmers.
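
As a minimal sketch of the two formatting options, the commands below relabel the coefficient columns of the test output. The header strings FE and RE are illustrative choices, not defaults; the models and dataset are the ones used in the first example of this entry.

```stata
* Sketch only: FE and RE are arbitrary header strings chosen for illustration
webuse nlswork4, clear
quietly xtreg ln_wage age msp ttl_exp, fe
estimates store fixed
quietly xtreg ln_wage age msp ttl_exp, re
* Label the consistent column "FE" and the efficient column "RE"
hausman fixed ., sigmamore tconsistent(FE) tefficient(RE)
```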

__Remarks__

The assumption that one of the estimators is efficient (that is, has
minimal asymptotic variance) is a demanding one. It is violated, for
instance, if your observations are clustered or pweighted, or if your
model is somehow misspecified. Moreover, even if the assumption is
satisfied, there may be a "small sample" problem with the Hausman test.
Hausman's test is based on estimating the variance var(b-B) of the
difference of the estimators by the difference var(b)-var(B) of the
variances. Under assumptions (1) and (3) in the steps below, var(b)-var(B) is
a consistent estimator of var(b-B), but it is not necessarily positive
definite "in finite samples", that is, in your application. If it is not
positive definite, the Hausman test is undefined. Unfortunately, this is not a
rare event. Stata supports a generalized Hausman test that overcomes
both of these problems. See **[R] suest** for details.
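
For orientation, the test statistic described above can be written in its standard textbook form. This is a sketch in conventional notation rather than a statement of the exact computation Stata performs; here b is the always-consistent estimator, B the estimator that is efficient under H0, and V_b and V_B denote the estimates of var(b) and var(B).

```latex
H = (b - B)' \, (V_b - V_B)^{-1} \, (b - B)
\qquad
H \;\overset{a}{\sim}\; \chi^2(k) \ \text{under } H_0,
\quad k = \operatorname{rank}(V_b - V_B)
```

The quadratic form is a proper chi-squared statistic only when V_b - V_B is positive semidefinite, which is why a non-positive-definite difference leaves the test undefined. The degrees of freedom k correspond to the default described under the **df()** option.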

To use **hausman**, perform the following steps.

(1) obtain an estimator that is consistent whether or not the
hypothesis is true;
(2) store the estimation results under *name-consistent* by using
**estimates store**;
(3) obtain an estimator that is **efficient** (and **consistent**) under the
hypothesis that you are testing, but **inconsistent** otherwise;
(4) store the estimation results under *name-efficient* by using
**estimates store**;
(5) use **hausman** to perform the test

**hausman** *name-consistent* *name-efficient* [**,** *options*]

The order of computing the two estimators may be reversed. You have to be
careful, though, to specify to **hausman** the models in the order "always
consistent" first and "efficient under H0" second. It is possible to skip
storing the second model and refer to the last estimation results by a
period (**.**).
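
The point about ordering can be sketched as follows. The efficient (random-effects) model is fit first here, yet the always-consistent (fixed-effects) results are still named first on the **hausman** command line. The names fixed and random are illustrative; any valid names accepted by **estimates store** would do.

```stata
* Sketch: estimation order reversed, argument order to hausman unchanged
webuse nlswork4, clear
quietly xtreg ln_wage age msp ttl_exp, re
estimates store random
quietly xtreg ln_wage age msp ttl_exp, fe
estimates store fixed
* Always-consistent results first, efficient-under-H0 results second
hausman fixed random, sigmamore
```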

**hausman** may be used in any context. The order in which you specify the
regressors in each model does not matter, but you must ensure that the
estimators and models are comparable and that they satisfy the
theoretical conditions (see (1) and (3) above).

__Examples__

---------------------------------------------------------------------------
Setup
**. webuse nlswork4**
**. xtreg ln_wage age msp ttl_exp, fe**
**. estimates store fixed**
**. xtreg ln_wage age msp ttl_exp, re**

Test the appropriateness of the random-effects estimator (**xtreg, re**)
**. hausman fixed ., sigmamore**

---------------------------------------------------------------------------
Setup
**. webuse sysdsn3**
**. mlogit insure age male**
**. estimates store all**
**. mlogit insure age male if insure != "Uninsure":insure**
**. estimates store partial**

Perform Hausman test for independence of irrelevant alternatives
**. hausman partial all, alleqs constant**

---------------------------------------------------------------------------
Setup
**. sysuse auto**
**. regress mpg price**
**. estimates store reg**
**. heckman mpg price, select(foreign=weight)**

Specify **equations()** option to force comparison when one estimator uses
equation names and the other does not
**. hausman reg ., equation(1:1)**

Setup
**. probit foreign weight**
**. estimates store probit_for**
**. heckman mpg price, select(foreign=weight)**

Compare probit model and selection equation of heckman model
**. hausman probit_for ., equation(1:2)**

---------------------------------------------------------------------------

__Stored results__

**hausman** stores the following in **r()**:

Scalars
  **r(chi2)**    chi-squared
  **r(df)**      degrees of freedom for the statistic
  **r(p)**       p-value for the chi-squared
  **r(rank)**    rank of **(V_b-V_B)^(-1)**
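
A short sketch of how the stored results might be inspected after the test, reusing the first example from this entry. The display formats are arbitrary choices.

```stata
* Sketch: run the test, then read the scalars that hausman leaves in r()
webuse nlswork4, clear
quietly xtreg ln_wage age msp ttl_exp, fe
estimates store fixed
quietly xtreg ln_wage age msp ttl_exp, re
hausman fixed ., sigmamore
* List everything stored in r()
return list
* Report the statistic, its degrees of freedom, and the p-value
display "chi2(" r(df) ") = " %9.3f r(chi2) "   p = " %6.4f r(p)
```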