**[R] regress postestimation** -- Postestimation tools for regress

__Postestimation commands__

The following postestimation commands are of special interest after
**regress**:

Command Description
-------------------------------------------------------------------------
**dfbeta** DFBETA influence statistics
**estat hettest** tests for heteroskedasticity
**estat imtest** information matrix test
**estat ovtest** Ramsey regression specification-error test for omitted variables
**estat szroeter** Szroeter's rank test for heteroskedasticity
**estat vif** variance inflation factors for the independent variables
**estat esize** eta-squared, epsilon-squared, and omega-squared effect sizes
**estat moran** Moran test of residual correlation with nearby residuals
-------------------------------------------------------------------------
These commands are not appropriate after the **svy** prefix.

The following standard postestimation commands are also available:

Command Description
-------------------------------------------------------------------------
**contrast** contrasts and ANOVA-style joint tests of estimates
**estat ic** Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
**estat summarize** summary statistics for the estimation sample
**estat vce** variance-covariance matrix of the estimators (VCE)
**estat** (svy) postestimation statistics for survey data
**estimates** cataloging estimation results
* **forecast** dynamic forecasts and simulations
* **hausman** Hausman's specification test
**lincom** point estimates, standard errors, testing, and inference for linear combinations of coefficients
**linktest** link test for model specification
* **lrtest** likelihood-ratio test
**margins** marginal means, predictive margins, marginal effects, and average marginal effects
**marginsplot** graph the results from margins (profile plots, interaction plots, etc.)
**nlcom** point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
**predict** predictions, residuals, influence statistics, and other diagnostic measures
**predictnl** point estimates, standard errors, testing, and inference for generalized predictions
**pwcompare** pairwise comparisons of estimates
**suest** seemingly unrelated estimation
**test** Wald tests of simple and composite linear hypotheses
**testnl** Wald tests of nonlinear hypotheses
-------------------------------------------------------------------------
* **forecast**, **hausman**, and **lrtest** are not appropriate with **svy** estimation
results. **forecast** is also not appropriate with **mi** estimation results.

__Predictions__

__Syntax for predict__

**predict** [*type*] *newvar* [*if*] [*in*] [**,** *statistic*]

*statistic* Description
-------------------------------------------------------------------------
Main
**xb** linear prediction; the default
__r__**esiduals** residuals
__sc__**ore** score; equivalent to **residuals**
__rsta__**ndard** standardized residuals
__rstu__**dent** Studentized (jackknifed) residuals
__c__**ooksd** Cook's distance
__l__**everage** | __h__**at** leverage (diagonal elements of hat matrix)
__p__**r(***a***,***b***)** Pr(*a* < y < *b*)
**e(***a***,***b***)** *E*(y | *a* < y < *b*)
__ys__**tar(***a***,***b***)** *E*(y*), y* = max**(***a*,min(y,*b*)**)**
* __dfb__**eta(***varname***)** DFBETA for *varname*
**stdp** standard error of the linear prediction
**stdf** standard error of the forecast
**stdr** standard error of the residual
* __cov__**ratio** COVRATIO
* __dfi__**ts** DFITS
* __w__**elsch** Welsch distance
-------------------------------------------------------------------------
Unstarred statistics are available both in and out of sample; type
**predict ... if e(sample) ...** if you want them only for the estimation sample.
Starred statistics are calculated only for the estimation sample, even
when **if** **e(sample)** is not specified.
**rstandard**, **rstudent**, **cooksd**, **leverage**, **dfbeta()**, **stdf**, **stdr**, **covratio**,
**dfits**, and **welsch** are not available if any **vce()** other than **vce(ols)**
was specified with **regress**.
**xb**, **residuals**, **score**, and **stdp** are the only options allowed with **svy**
estimation results.

where *a* and *b* may be numbers or variables; *a* missing (*a* ≥ **.**) means minus
infinity, and *b* missing (*b* ≥ **.**) means plus infinity; see missing.

__Menu for predict__

**Statistics > Postestimation**

__Description for predict__

**predict** creates a new variable containing predictions such as linear
predictions, residuals, standardized residuals, Studentized residuals,
Cook's distance, leverage, probabilities, expected values, DFBETAs for
*varname*, standard errors, COVRATIOs, DFITS, and Welsch distances.

__Options for predict__

Main

**xb**, the default, calculates the linear prediction.

**residuals** calculates the residuals.

**score** is equivalent to **residuals** in linear regression.

**rstandard** calculates the standardized residuals.

**rstudent** calculates the Studentized (jackknifed) residuals.

**cooksd** calculates Cook's D influence statistic (Cook 1977).

**leverage** or **hat** calculates the diagonal elements of the projection
("hat") matrix.
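For the special case of one regressor plus a constant, the residual-based statistics above have simple closed forms. The Python sketch below illustrates them under that assumption. It is not Stata's implementation, and the helper name `simple_diagnostics` is hypothetical. It uses the textbook definitions:

* leverage: h_i = 1/n + (x_i - xbar)^2/Sxx
* standardized residual: r_i = e_i/(s*sqrt(1 - h_i))
* Studentized residual: t_i = e_i/(s_(i)*sqrt(1 - h_i)), where s_(i) is the root MSE after deleting observation i
* Cook's D: r_i^2*h_i/(p*(1 - h_i)), with p = 2 coefficients

```python
import math


def simple_diagnostics(x, y):
    """Hypothetical helper: residual diagnostics for y = b0 + b1*x fitted by OLS."""
    n = len(x)
    p = 2  # estimated coefficients: constant and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - p)  # MSE
    rows = []
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)                      # residual
        h = 1.0 / n + (xi - xbar) ** 2 / sxx         # leverage (hat diagonal)
        r = e / math.sqrt(s2 * (1.0 - h))            # standardized residual
        # residual variance with observation i deleted, without refitting
        s2_i = ((n - p) * s2 - e * e / (1.0 - h)) / (n - p - 1)
        t = e / math.sqrt(s2_i * (1.0 - h))          # Studentized (jackknifed) residual
        d = r * r * h / (p * (1.0 - h))              # Cook's distance
        rows.append({"resid": e, "hat": h, "rstandard": r,
                     "rstudent": t, "cooksd": d})
    return rows
```

The leverages always sum to p, the number of estimated coefficients, which gives a quick sanity check on any implementation.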

**pr(***a***,***b***)** calculates Pr(*a* < xb + u < *b*), the probability that y|x would be
observed in the interval (*a*,*b*).

*a* and *b* may be specified as numbers or variable names. In the
examples below, *lb* and *ub* are variable names:
**pr(20,30)** calculates Pr(20 < xb + u < 30);
**pr(***lb***,***ub***)** calculates Pr(*lb* < xb + u < *ub*); and
**pr(20,***ub***)** calculates Pr(20 < xb + u < *ub*).

*a* missing (*a* ≥ .) means minus infinity; **pr(.,30)** calculates
Pr(-infinity < xb + u < 30), and **pr(***lb***,30)** calculates
Pr(-infinity < xb + u < 30) in observations for which *lb* ≥ . and
Pr(*lb* < xb + u < 30) elsewhere.

*b* missing (*b* ≥ .) means plus infinity; **pr(20,.)** calculates
Pr(+infinity > xb + u > 20), and **pr(20,***ub***)** calculates
Pr(+infinity > xb + u > 20) in observations for which *ub* ≥ . and
Pr(20 < xb + u < *ub*) elsewhere.

**e(***a***,***b***)** calculates *E*(xb+u | *a* < xb+u < *b*), the expected value of y|x
conditional on y|x being in the interval (*a*,*b*), meaning that y|x is
truncated. *a* and *b* are specified as they are for **pr()**.

**ystar(***a***,***b***)** calculates *E*(y*), where y* = *a* if xb+u ≤ *a*, y* = *b* if
xb+u ≥ *b*, and y* = xb+u otherwise, meaning that y* is censored. *a*
and *b* are specified as they are for **pr()**.
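If u is normal with standard deviation sigma, **pr()**, **e()**, and **ystar()** reduce to standard normal-distribution formulas. Let alpha = (a - xb)/sigma and beta = (b - xb)/sigma. Then:

* pr = Phi(beta) - Phi(alpha)
* e = xb + sigma*(phi(alpha) - phi(beta))/pr
* ystar = a*Phi(alpha) + b*(1 - Phi(beta)) + pr*e

The sketch below makes three assumptions. First, sigma is taken to be the root mean squared error of the fitted model; this is an assumption about the implementation, so check *Methods and formulas*. Second, Python's `None` stands in for Stata's missing value and denotes an infinite bound. Third, the function name `interval_stats` is hypothetical.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_stats(xb, sigma, a=None, b=None):
    """Return (pr, e, ystar) for y = xb + u with u ~ N(0, sigma^2).

    a=None means minus infinity and b=None means plus infinity,
    mirroring how a missing a or b is treated by predict.
    """
    # Lower-bound terms: Phi(alpha) and phi(alpha) are both 0 at minus infinity
    cdf_a = 0.0 if a is None else _cdf((a - xb) / sigma)
    pdf_a = 0.0 if a is None else _pdf((a - xb) / sigma)
    # Upper-bound terms: Phi(beta) is 1 and phi(beta) is 0 at plus infinity
    cdf_b = 1.0 if b is None else _cdf((b - xb) / sigma)
    pdf_b = 0.0 if b is None else _pdf((b - xb) / sigma)

    pr = cdf_b - cdf_a                       # pr(a,b)
    e = xb + sigma * (pdf_a - pdf_b) / pr    # e(a,b): mean of the truncated normal
    ystar = pr * e                           # ystar(a,b): mean of the censored variable
    if a is not None:
        ystar += a * cdf_a                   # probability mass piled up at a
    if b is not None:
        ystar += b * (1.0 - cdf_b)           # probability mass piled up at b
    return pr, e, ystar
```

With both bounds missing, the three statistics collapse to 1, xb, and xb, which is a useful check on the limiting behavior.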

**dfbeta(***varname***)** calculates the DFBETA for *varname*, the difference between
the regression coefficient when the jth observation is included and
excluded, scaled by the estimated standard error of the coefficient.
*varname* must have been included among the regressors in the
previously fitted model. The calculation is automatically restricted
to the estimation subsample.
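A brute-force way to see what **dfbeta()** measures is to refit the model with each observation deleted. The Python sketch below does this for the slope of a one-regressor model. It scales each change by s_(i)/sqrt(Sxx), that is, the deleted-observation root MSE times the square root of the full-sample (X'X)^-1 element for the slope. This follows the DFBETAS scaling of Belsley, Kuh, and Welsch (1980); we assume, without having verified it, that **dfbeta()** uses the same scaling. The helper names are hypothetical.

```python
def fit_line(x, y):
    """OLS fit of y = b0 + b1*x; returns (b0, b1, residual sum of squares)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, rss


def dfbeta_slope(x, y):
    """Scaled change in the slope when each observation is deleted (DFBETAS form)."""
    n = len(x)
    _, b1, _ = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)  # slope element of (X'X)^-1 is 1/sxx
    out = []
    for i in range(n):
        xd = x[:i] + x[i + 1:]
        yd = y[:i] + y[i + 1:]
        _, b1_i, rss_i = fit_line(xd, yd)        # refit without observation i
        s_i = (rss_i / (len(xd) - 2)) ** 0.5     # root MSE without observation i
        out.append((b1 - b1_i) / (s_i / sxx ** 0.5))
    return out
```

The loop costs one refit per observation. The same values follow in closed form from (x_i - xbar)*e_i/(sqrt(Sxx)*(1 - h_i)*s_(i)), which is why the statistic is cheap to obtain from a single fit.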

**stdp** calculates the standard error of the prediction, which can be
thought of as the standard error of the predicted expected value or
mean for the observation's covariate pattern. The standard error of
the prediction is also referred to as the standard error of the
fitted value.

**stdf** calculates the standard error of the forecast, which is the standard
error of the point prediction for 1 observation. It is commonly
referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by **stdf** are always
larger than those produced by **stdp**; see *Methods and formulas* in **[R]**
**regress postestimation**.
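For a one-regressor model, the three standard errors differ only in how the leverage-type term h = 1/n + (x0 - xbar)^2/Sxx enters. With s denoting the root MSE:

* stdp = s*sqrt(h)
* stdf = s*sqrt(1 + h)
* stdr = s*sqrt(1 - h)

The Python sketch below (hypothetical function `prediction_ses`) makes the ordering stdf > stdp explicit. stdr is meaningful only for observations in the estimation sample.

```python
import math


def prediction_ses(x, y, x0):
    """Hypothetical helper: stdp, stdf, stdr analogues at x0 for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)  # MSE
    h = 1.0 / n + (x0 - xbar) ** 2 / sxx   # equals the leverage when x0 is in the sample
    stdp = math.sqrt(s2 * h)               # SE of the fitted (mean) value
    stdf = math.sqrt(s2 * (1.0 + h))       # SE of a single future observation
    # SE of the residual; only defined for in-sample points, where h < 1
    stdr = math.sqrt(s2 * (1.0 - h)) if h < 1.0 else float("nan")
    return stdp, stdf, stdr
```

Because stdf^2 = s^2 + stdp^2, the forecast standard error can never fall below the root MSE, however many observations are used.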

**stdr** calculates the standard error of the residuals.

**covratio** calculates COVRATIO (Belsley, Kuh, and Welsch 1980), a measure
of the influence of the jth observation based on considering the
effect on the variance-covariance matrix of the estimates. The
calculation is automatically restricted to the estimation subsample.

**dfits** calculates DFITS (Welsch and Kuh 1977) and attempts to summarize
the information in the leverage versus residual-squared plot into one
statistic. The calculation is automatically restricted to the
estimation subsample.

**welsch** calculates Welsch distance (Welsch 1982) and is a variation on
**dfits**. The calculation is automatically restricted to the estimation
subsample.
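The three influence measures above can all be written in terms of the leverage h_i and the Studentized residual t_i. The sketch below assumes a one-regressor model with p = 2 coefficients and uses these forms:

* DFITS_i = t_i*sqrt(h_i/(1 - h_i))
* Welsch distance W_i = t_i*sqrt((n - 1)*h_i)/(1 - h_i)
* COVRATIO_i = (s_(i)^2/s^2)^p/(1 - h_i)

These are the textbook definitions associated with the cited references; that they reproduce Stata's output exactly is an assumption we have not verified. The helper name `influence_measures` is hypothetical.

```python
import math


def influence_measures(x, y):
    """Hypothetical helper: DFITS, Welsch distance, and COVRATIO for y = b0 + b1*x."""
    n = len(x)
    p = 2  # constant and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - p)
    rows = []
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)
        h = 1.0 / n + (xi - xbar) ** 2 / sxx                    # leverage
        s2_i = ((n - p) * s2 - e * e / (1.0 - h)) / (n - p - 1)  # MSE without obs i
        t = e / math.sqrt(s2_i * (1.0 - h))                      # Studentized residual
        dfits = t * math.sqrt(h / (1.0 - h))
        welsch = t * math.sqrt((n - 1) * h) / (1.0 - h)
        # ratio of det Var(b) without and with observation i
        covratio = (s2_i / s2) ** p / (1.0 - h)
        rows.append({"dfits": dfits, "welsch": welsch, "covratio": covratio})
    return rows
```

Written this way, Welsch distance is DFITS multiplied by sqrt((n - 1)/(1 - h_i)), so it up-weights high-leverage points relative to DFITS.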

__Margins__

__Syntax for margins__

**margins** [*marginlist*] [**,** *options*]

**margins** [*marginlist*] **,** __pr__**edict(***statistic* ...**)** [*options*]

*statistic* Description
-------------------------------------------------------------------------
**xb** linear prediction; the default
__p__**r(***a***,***b***)** not allowed with **margins**
**e(***a***,***b***)** not allowed with **margins**
__ys__**tar(***a***,***b***)** not allowed with **margins**
__r__**esiduals** not allowed with **margins**
__sc__**ore** not allowed with **margins**
__rsta__**ndard** not allowed with **margins**
__rstu__**dent** not allowed with **margins**
__c__**ooksd** not allowed with **margins**
__l__**everage** | __h__**at** not allowed with **margins**
__dfb__**eta(***varname***)** not allowed with **margins**
**stdp** not allowed with **margins**
**stdf** not allowed with **margins**
**stdr** not allowed with **margins**
__cov__**ratio** not allowed with **margins**
__dfi__**ts** not allowed with **margins**
__w__**elsch** not allowed with **margins**
-------------------------------------------------------------------------

Statistics not allowed with **margins** are functions of stochastic
quantities other than **e(b)**.

For the full syntax, see **[R] margins**.

__Menu for margins__

**Statistics > Postestimation**

__Description for margins__

**margins** estimates margins of response for linear predictions.
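With the default linear prediction, a predictive margin for a factor level is the average of xb over the estimation sample, computed after setting that factor to the level in every observation while leaving the other covariates as observed. The Python sketch below illustrates this for a hypothetical model y = b0 + b1*g + b2*x + b3*g*x with a 0/1 indicator g. The coefficient values and helper names are made up for illustration.

```python
def linear_prediction(b, g, x):
    """xb for the hypothetical model y = b0 + b1*g + b2*x + b3*g*x."""
    return b[0] + b[1] * g + b[2] * x + b[3] * g * x


def predictive_margin(b, xs, level):
    """Average xb with g fixed at `level` and x kept at its observed values."""
    return sum(linear_prediction(b, level, x) for x in xs) / len(xs)


b = [1.0, 2.0, 0.5, 0.25]   # made-up coefficients b0..b3
xs = [0.0, 2.0, 4.0]        # observed covariate values in the sample
m0 = predictive_margin(b, xs, 0)
m1 = predictive_margin(b, xs, 1)
print(m0, m1, m1 - m0)      # 2.0 4.5 2.5
```

Because the model contains an interaction, the difference between the two margins equals b1 + b3 times the sample mean of x rather than b1 alone.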

__DFBETA influence statistics__

__Syntax for dfbeta__

**dfbeta** [*indepvar* [*indepvar* [...]]] [**,** **stub(***name***)**]

__Menu for dfbeta__

**Statistics > Linear models and related > Regression diagnostics >** **DFBETAs**

__Description for dfbeta__

**dfbeta** will calculate one, more than one, or all the DFBETAs after
**regress**. Although **predict** will also calculate DFBETAs, **predict** can do
this for only one variable at a time. **dfbeta** is a convenience tool for
those who want to calculate DFBETAs for multiple variables. The names
for the new variables created are chosen automatically and begin with the
letters **_dfbeta_**.

__Option for dfbeta__

**stub(***name***)** specifies the leading characters **dfbeta** uses to name the new
variables to be generated. The default is **stub(_dfbeta_)**.

__Tests for violation of assumptions__

__Syntax for estat hettest__

**estat** __hett__**est** [*varlist*] [**,** __r__**hs** [__no__**rmal** | __ii__**d** | __fs__**tat**] __m__**test**[**(***spec***)**]]

__Menu for estat__

**Statistics > Postestimation**

__Description for estat hettest__

**estat hettest** performs three versions of the Breusch-Pagan (1979) and
Cook-Weisberg (1983) test for heteroskedasticity. All three versions of
this test present evidence against the null hypothesis that t=0 in
Var(e)=sigma^2 exp(zt). In the **normal** version, performed by default, the
null hypothesis also includes the assumption that the regression
disturbances are independent-normal draws with variance sigma^2. The
normality assumption is dropped from the null hypothesis in the **iid** and
**fstat** versions, which respectively produce the score and F tests
discussed in *Methods and formulas* in **[R] regress postestimation**. If
*varlist* is not specified, the fitted values are used for z. If *varlist*
or the **rhs** option is specified, the variables specified are used for z.
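The **normal** and **iid** versions can be sketched for a single variance regressor z:

* **normal**: following Breusch and Pagan (1979), half the explained sum of squares from regressing e_i^2/sigma_hat^2 on z, where sigma_hat^2 = sum(e_i^2)/N.
* **iid**: N*R^2 from regressing e_i^2 on z.

With one z variable, both are referred to a chi-squared distribution with 1 degree of freedom. The Python function below is a hypothetical illustration that takes the residuals and z directly. By default, **estat hettest** would use the fitted values as z.

```python
def breusch_pagan(resid, z):
    """Hypothetical helper: (normal, iid) heteroskedasticity statistics for one z."""
    n = len(resid)
    zbar = sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)

    def regress_on_z(u):
        # explained and total sums of squares from regressing u on a constant and z
        ubar = sum(u) / n
        szu = sum((zi - zbar) * (ui - ubar) for zi, ui in zip(z, u))
        ess = szu * szu / szz
        tss = sum((ui - ubar) ** 2 for ui in u)
        return ess, tss

    e2 = [e * e for e in resid]
    sigma2 = sum(e2) / n                          # ML estimate of Var(e)
    ess_g, _ = regress_on_z([v / sigma2 for v in e2])
    normal = ess_g / 2.0                          # "normal" version
    ess, tss = regress_on_z(e2)
    r2 = ess / tss if tss > 0 else 0.0
    iid = n * r2                                  # "iid" version: N*R^2
    return normal, iid
```

If the squared residuals do not vary with z at all, both statistics are 0. If they are exactly linear in z, the **iid** statistic reaches its maximum value of N.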

__Options for estat hettest__

**rhs** specifies that tests for heteroskedasticity be performed for the
right-hand-side (explanatory) variables of the fitted regression
model. The **rhs** option may be combined with a *varlist*, in which case
the variables in *varlist* are included in the model for the
variance along with the explanatory variables.

**normal**, the default, causes **estat hettest** to compute the original
Breusch-Pagan/Cook-Weisberg test, which assumes that the regression
disturbances are normally distributed.

**iid** causes **estat hettest** to compute the N*R^2 version of the score test
that drops the normality assumption.

**fstat** causes **estat hettest** to compute the F-statistic version that drops
the normality assumption.

**mtest**[**(***spec***)**] specifies that multiple testing be performed. The argument
specifies how p-values are adjusted. The following specifications,
*spec*, are supported:

__b__**onferroni** Bonferroni's multiple testing adjustment
__h__**olm** Holm's multiple testing adjustment
__s__**idak** Sidak's multiple testing adjustment
__noadj__**ust** no adjustment is made for multiple testing

**mtest** may be specified without an argument. This is equivalent to
specifying **mtest(noadjust)**; that is, tests for the individual
variables should be performed with unadjusted p-values. By default,
**estat hettest** does not perform multiple testing. **mtest** may not be
specified with **iid** or **fstat**.
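The adjustments listed above follow standard definitions for m tests:

* Bonferroni multiplies each p-value by m.
* Sidak uses 1 - (1 - p)^m.
* Holm's step-down method multiplies the jth smallest p-value by m - j + 1 and enforces monotonicity.

Adjusted p-values are capped at 1. The Python sketch below implements these definitions; the function name is hypothetical. We assume, but have not verified, that **mtest()** uses exactly these formulas.

```python
def adjust_pvalues(p, method="noadjust"):
    """Hypothetical helper: adjust a list of p-values for m simultaneous tests."""
    m = len(p)
    if method == "noadjust":
        return list(p)
    if method == "bonferroni":
        return [min(1.0, m * pi) for pi in p]
    if method == "sidak":
        return [1.0 - (1.0 - pi) ** m for pi in p]
    if method == "holm":
        order = sorted(range(m), key=lambda i: p[i])  # smallest p-value first
        adjusted = [0.0] * m
        running_max = 0.0
        for rank, i in enumerate(order):
            # the (rank+1)th smallest p-value is multiplied by m - rank;
            # the running maximum keeps adjusted p-values monotone
            running_max = max(running_max, min(1.0, (m - rank) * p[i]))
            adjusted[i] = running_max
        return adjusted
    raise ValueError("unknown method: " + method)
```

Holm's method is never more conservative than Bonferroni's, which is why it is often preferred when the tests are not independent.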

__Syntax for estat imtest__

**estat** __imt__**est** [**,** __p__**reserve** __wh__**ite**]

__Menu for estat__

**Statistics > Postestimation**

__Description for estat imtest__

**estat imtest** performs an information matrix test for the regression model
and an orthogonal decomposition into tests for heteroskedasticity,
skewness, and kurtosis due to Cameron and Trivedi (1990); White's (1980)
test for homoskedasticity against unrestricted forms of heteroskedasticity
is available as an option. White's test is usually similar to the
first term of the Cameron-Trivedi decomposition.

__Options for estat imtest__

**preserve** specifies that the data in memory be preserved, all variables
and cases that are not needed in the calculations be dropped, and at
the conclusion the original data be restored. This option is costly
for large datasets. However, because **estat imtest** has to perform an
auxiliary regression on k(k+1)/2 temporary variables, where k is the
number of regressors, it may not be able to perform the test
otherwise.

**white** specifies that White's original heteroskedasticity test also be
performed.

__Syntax for estat ovtest__

**estat** __ovt__**est** [**,** __r__**hs**]

__Menu for estat__

**Statistics > Postestimation**

__Description for estat ovtest__

**estat ovtest** performs two versions of the Ramsey (1969) regression
specification-error test (RESET) for omitted variables. This test
amounts to fitting y=xb+zt+u and then testing t=0. If the **rhs** option is
not specified, powers of the fitted values are used for z. If **rhs** is
specified, powers of the individual elements of x are used.
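The Python sketch below implements the fitted-values version of RESET under two assumptions:

* z consists of the second, third, and fourth powers of the fitted values. We believe this is what **estat ovtest** uses, given its F test with 3 numerator degrees of freedom, but treat it as an assumption.
* The fitted values are standardized before taking powers. Because the constant and x are already in the model, this rescaling leaves the F statistic unchanged and only improves numerical conditioning.

Helper names are hypothetical. OLS is solved naively through the normal equations, which is adequate for a toy example but not for production use.

```python
def _solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _ols(y, cols):
    """OLS of y on a constant plus the given columns; returns (rss, fitted, k)."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return rss, fitted, k


def reset_test(y, x_cols):
    """RESET-style F test of t = 0 in y = xb + zt + u, z = powers 2-4 of fitted values."""
    rss_r, yhat, _ = _ols(y, x_cols)              # restricted model: y on x
    n = len(y)
    mu = sum(yhat) / n
    sd = (sum((v - mu) ** 2 for v in yhat) / (n - 1)) ** 0.5
    zhat = [(v - mu) / sd for v in yhat]          # rescaling does not change F
    powers = [[v ** p for v in zhat] for p in (2, 3, 4)]
    rss_u, _, k_u = _ols(y, x_cols + powers)      # unrestricted model: y on x and z
    q = len(powers)
    df_r = n - k_u
    F = ((rss_r - rss_u) / q) / (rss_u / df_r)
    return F, q, df_r
```

For example, if y is really quadratic in x but the model is linear, the squared fitted values pick up the curvature and the F statistic becomes very large.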

__Option for estat ovtest__

**rhs** specifies that powers of the right-hand-side (explanatory) variables
be used in the test rather than powers of the fitted values.

__Syntax for estat szroeter__

**estat** __szr__**oeter** [*varlist*] [**,** __r__**hs** __m__**test(***spec***)**]

Either *varlist* or **rhs** must be specified.

__Menu for estat__

**Statistics > Postestimation**

__Description for estat szroeter__

**estat szroeter** performs Szroeter's rank test for heteroskedasticity for
each of the variables in *varlist* or for the explanatory variables of the
regression if **rhs** is specified.

__Options for estat szroeter__

**rhs** specifies that tests for heteroskedasticity be performed for the
right-hand-side (explanatory) variables of the fitted regression
model. The **rhs** option may be combined with a *varlist*.

**mtest(***spec***)** specifies that multiple testing be performed. The argument
specifies how p-values are adjusted. The following specifications,
*spec*, are supported:

__b__**onferroni** Bonferroni's multiple testing adjustment
__h__**olm** Holm's multiple testing adjustment
__s__**idak** Sidak's multiple testing adjustment
__noadj__**ust** no adjustment is made for multiple testing

**estat szroeter** always performs multiple testing. By default, it does
not adjust the p-values.

__Variance inflation factors__

__Syntax for estat vif__

**estat vif** [**,** __unc__**entered**]

__Menu for estat__

**Statistics > Postestimation**

__Description for estat vif__

**estat vif** calculates the centered or uncentered variance inflation
factors (VIFs) for the independent variables specified in a linear
regression model.

__Option for estat vif__

**uncentered** requests the computation of the uncentered variance
inflation factors. This option is often used to detect the
collinearity of the regressors with the constant. **estat vif,**
**uncentered** may be used after regression models fit without the
constant term.
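The centered VIF for regressor j is 1/(1 - R_j^2), where R_j^2 comes from regressing x_j on the other regressors and a constant. With exactly two regressors, R_j^2 is the squared correlation between them, so both regressors share the same VIF. The Python sketch below covers only that two-regressor case (hypothetical helper `vif_two`); the uncentered variant is not shown.

```python
def vif_two(x1, x2):
    """Hypothetical helper: centered VIF shared by x1 and x2 in y = b0 + b1*x1 + b2*x2."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 * s12 / (s11 * s22)   # R^2 of x1 on x2 (and of x2 on x1)
    return 1.0 / (1.0 - r2)
```

Uncorrelated regressors give a VIF of exactly 1. The VIF grows without bound as the correlation between the two regressors approaches plus or minus 1.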

__Measures of effect size__

__Syntax for estat esize__

**estat esize** [**,** __eps__**ilon** __om__**ega** __l__**evel(***#***)**]

__Menu for estat__

**Statistics > Postestimation**

__Description for estat esize__

**estat** **esize** calculates effect sizes for linear models after **regress** or
**anova**. By default, **estat** **esize** reports eta-squared estimates (Kerlinger
1964), which are equivalent to R-squared estimates. If the option
**epsilon** is specified, **estat** **esize** reports epsilon-squared estimates
(Grissom and Kim 2012). If the option **omega** is specified, **estat** **esize**
reports omega-squared estimates (Grissom and Kim 2012). Both
epsilon-squared and omega-squared are adjusted R-squared estimates.
Confidence intervals for eta-squared estimates are estimated by using the
noncentral F distribution (Smithson 2001). See Kline (2013) or Thompson
(2006) for further information.
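Effect sizes for a term can be expressed through its F statistic and degrees of freedom. The sketch below uses these partial forms:

* eta^2 = F*df1/(F*df1 + df2)
* epsilon^2 = (F - 1)*df1/(F*df1 + df2)
* omega^2 = (F - 1)*df1/(F*df1 + df2 + 1)

We believe these match *Methods and formulas* in **[R] regress postestimation**, but verify them there before relying on them. The noncentral-F confidence intervals are not shown, and the function name is hypothetical. For the overall model F test, the eta-squared formula recovers R^2.

```python
def effect_sizes(F, df_num, df_den):
    """Hypothetical helper: partial eta-, epsilon-, and omega-squared from an F statistic."""
    denom = F * df_num + df_den
    eta2 = F * df_num / denom
    eps2 = (F - 1.0) * df_num / denom
    omega2 = (F - 1.0) * df_num / (denom + 1.0)
    return eta2, eps2, omega2


# Overall model: R^2 = 0.3 with 3 model and 50 residual degrees of freedom
r2, df_m, df_r = 0.3, 3, 50
F = (r2 / df_m) / ((1.0 - r2) / df_r)
print(effect_sizes(F, df_m, df_r)[0])  # approximately 0.3
```

Epsilon-squared and omega-squared subtract the expected chance contribution, so they can be negative when F < 1. Eta-squared cannot.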

__Options for estat esize__

**epsilon** specifies that the epsilon-squared estimates of effect size be
reported. The default is eta-squared estimates.

**omega** specifies that the omega-squared estimates of effect size be
reported. The default is eta-squared estimates.

**level(***#***)** specifies the confidence level, as a percentage, for confidence
intervals. The default is **level(95)** or as set by **set level**.

__Examples__

---------------------------------------------------------------------------
Setup
**. sysuse auto**
**. regress mpg weight foreign**

Obtain predicted values
**. predict pmpg**
**. summarize pmpg mpg**

---------------------------------------------------------------------------
Setup
**. webuse newautos, clear**

Obtain out-of-sample prediction
**. predict mpg**

Obtain standard error of the forecast
**. predict se_mpg, stdf**

---------------------------------------------------------------------------
Setup
**. sysuse auto, clear**
**. regress mpg weight c.weight#c.weight foreign**

Diagonal elements of projection matrix
**. predict xdist, hat**

---------------------------------------------------------------------------
Setup
**. sysuse auto, clear**
**. regress price weight foreign##c.mpg**

Leverage-versus-residual-squared plot
**. lvr2plot**

Standardized residuals
**. predict esta if e(sample), rstandard**

Studentized residuals
**. predict estu if e(sample), rstudent**

---------------------------------------------------------------------------
Setup
**. sysuse auto, clear**
**. regress price weight foreign##c.mpg**

DFITS influence measure
**. predict dfits, dfits**

Cook's distance
**. predict cooksd if e(sample), cooksd**

Welsch distance
**. predict wd, welsch**

COVRATIO influence measure
**. predict covr, covratio**

DFBETAs influence measure
**. sort foreign make**
**. predict dfor, dfbeta(1.foreign)**

DFBETAs for all variables in regression
**. dfbeta**

Ramsey's test for omitted variables
**. estat ovtest**

Test for heteroskedasticity
**. estat hettest**
**. estat hettest weight foreign##c.mpg, mtest(b)**

Rank test for heteroskedasticity
**. estat szroeter, rhs mtest(holm)**

Tests for heteroskedasticity, skewness, and kurtosis
**. estat imtest**

---------------------------------------------------------------------------
Setup
**. webuse bodyfat, clear**
**. regress bodyfat tricep thigh midarm**

Variance inflation factors
**. estat vif**

---------------------------------------------------------------------------
Setup
**. webuse nhanes2**

Regress systolic blood pressure on age group, sex, and their interaction
**. regress bpsystol agegrp##sex**

Predictive margins of blood pressure for age groups
**. margins agegrp**

Profile plot of margins
**. marginsplot**

Margins for interaction between age group and sex
**. margins agegrp#sex**

Interaction plot
**. marginsplot**

Estimate for each age group a contrast comparing men and women
**. margins r.sex@agegrp**

Plot contrasts and confidence intervals against age group
**. marginsplot**

---------------------------------------------------------------------------
Setup
**. webuse lbw**

Effect size for linear models after **regress**
**. regress bwt smoke i.race**
**. estat esize**
**. estat esize, level(90)**
**. estat esize, omega**

---------------------------------------------------------------------------

__Stored results__

**estat hettest** stores the following results for the (multivariate) score
test in **r()**:

Scalars
**r(chi2)** chi-squared test statistic
**r(df)** #df for the asymptotic chi-squared distribution under
H_0
**r(p)** p-value

**estat hettest, fstat** stores the results for the (multivariate) score test
in **r()**:

Scalars
**r(F)** test statistic
**r(df_m)** #df of the test for the F distribution under H_0
**r(df_r)** #df of the residuals for the F distribution under H_0
**r(p)** p-value

**estat hettest** (if **mtest** is specified) and **estat szroeter** store the
following in **r()**:

Matrices
**r(mtest)** a matrix of test results, with rows corresponding to the
univariate tests

**mtest[.,1]** chi-squared test statistic
**mtest[.,2]** #df
**mtest[.,3]** unadjusted p-value
**mtest[.,4]** adjusted p-value (if an **mtest()** adjustment
method is specified)

Macros
**r(mtmethod)** adjustment method for p-value

**estat imtest** stores the following in **r()**:

Scalars
**r(chi2_t)** IM-test statistic (= **r(chi2_h)** + **r(chi2_s)** + **r(chi2_k)**)
**r(df_t)** df for limiting chi-squared distribution under H_0 (=
**r(df_h)** + **r(df_s)** + **r(df_k)**)
**r(chi2_h)** heteroskedasticity test statistic
**r(df_h)** df for limiting chi-squared distribution under H_0
**r(chi2_s)** skewness test statistic
**r(df_s)** df for limiting chi-squared distribution under H_0
**r(chi2_k)** kurtosis test statistic
**r(df_k)** df for limiting chi-squared distribution under H_0
**r(chi2_w)** White's heteroskedasticity test (if **white** specified)
**r(df_w)** df for limiting chi-squared distribution under H_0

**estat ovtest** stores the following in **r()**:

Scalars
**r(p)** two-sided p-value
**r(F)** F statistic
**r(df)** degrees of freedom
**r(df_r)** residual degrees of freedom

**estat esize** stores the following in **r()**:

Scalars
**r(level)** confidence level

Matrices
**r(esize)** a matrix of effect sizes, confidence intervals, degrees
of freedom, and F statistics with rows corresponding
to each term in the model

**esize[.,1]** eta-squared
**esize[.,2]** lower confidence bound for eta-squared
**esize[.,3]** upper confidence bound for eta-squared
**esize[.,4]** epsilon-squared
**esize[.,5]** omega-squared
**esize[.,6]** numerator degrees of freedom
**esize[.,7]** denominator degrees of freedom
**esize[.,8]** F statistic

__References__

Belsley, D. A., E. Kuh, and R. E. Welsch. 1980. *Regression Diagnostics:*
*Identifying Influential Data and Sources of Collinearity*. New York:
Wiley.

Breusch, T. S., and A. R. Pagan. 1979. A simple test for
heteroscedasticity and random coefficient variation. *Econometrica* 47:
1287-1294.

Cameron, A. C., and P. K. Trivedi. 1990. The information matrix test and
its applied alternative hypotheses. Working Paper 372, University of
California-Davis, Institute of Governmental Affairs.

Cook, R. D. 1977. Detection of influential observations in linear
regression. *Technometrics* 19: 15-18.

Cook, R. D., and S. Weisberg. 1983. Diagnostics for heteroscedasticity
in regression. *Biometrika* 70: 1-10.

Grissom, R. J., and J. J. Kim. 2012. *Effect Sizes for Research:*
*Univariate and Multivariate Applications.* 2nd ed. New York:
Routledge.

Kerlinger, F. N. 1964. *Foundations of Behavioral Research*. New York:
Holt, Rinehart & Winston.

Kline, R. B. 2013. *Beyond Significance Testing: Statistics Reform in the*
*Behavioral Sciences*. 2nd ed. Washington, DC: American Psychological
Association.

Mallows, C. L. 1986. Augmented partial residuals. *Technometrics* 28:
313-319.

Ramsey, J. B. 1969. Tests for specification errors in classical linear
least-squares regression analysis. *Journal of the Royal Statistical*
*Society, Series B* 31: 350-371.

Smithson, M. 2001. Correct confidence intervals for various regression
effect sizes and parameters: The importance of noncentral
distributions in computing intervals. *Educational and Psychological*
*Measurement* 61: 605-632.

Thompson, B. 2006. *Foundations of Behavioral Statistics: An*
*Insight-Based Approach*. New York: Guilford Press.

Welsch, R. E. 1982. Influence functions and regression diagnostics. In
*Modern Data Analysis*, ed. R. L. Launer and A. F. Siegel, 149-169.
New York: Academic Press.

Welsch, R. E., and E. Kuh. 1977. Linear regression diagnostics.
Technical Report 923-77, Massachusetts Institute of Technology,
Cambridge, MA.

White, H. 1980. A heteroskedasticity-consistent covariance matrix
estimator and a direct test for heteroskedasticity. *Econometrica* 48:
817-838.