Stata 15 help for regress postestimation

[R] regress postestimation -- Postestimation tools for regress

Postestimation commands

The following postestimation commands are of special interest after regress:

Command            Description
-------------------------------------------------------------------------
dfbeta             DFBETA influence statistics
estat hettest      tests for heteroskedasticity
estat imtest       information matrix test
estat ovtest       Ramsey regression specification-error test for omitted variables
estat szroeter     Szroeter's rank test for heteroskedasticity
estat vif          variance inflation factors for the independent variables
estat esize        eta-squared, epsilon-squared, and omega-squared effect sizes
estat moran        Moran test of residual correlation with nearby residuals
-------------------------------------------------------------------------
These commands are not appropriate after the svy prefix.

The following standard postestimation commands are also available:

Command            Description
-------------------------------------------------------------------------
contrast           contrasts and ANOVA-style joint tests of estimates
estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance-covariance matrix of the estimators (VCE)
estat (svy)        postestimation statistics for survey data
estimates          cataloging estimation results
* forecast         dynamic forecasts and simulations
* hausman          Hausman's specification test
lincom             point estimates, standard errors, testing, and inference for linear combinations of coefficients
linktest           link test for model specification
* lrtest           likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot        graph the results from margins (profile plots, interaction plots, etc.)
nlcom              point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic measures
predictnl          point estimates, standard errors, testing, and inference for generalized predictions
pwcompare          pairwise comparisons of estimates
suest              seemingly unrelated estimation
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
-------------------------------------------------------------------------
* forecast, hausman, and lrtest are not appropriate with svy estimation results. forecast is also not appropriate with mi estimation results.

Predictions

Syntax for predict

predict [type] newvar [if] [in] [, statistic]

statistic          Description
-------------------------------------------------------------------------
Main
  xb               linear prediction; the default
  residuals        residuals
  score            score; equivalent to residuals
  rstandard        standardized residuals
  rstudent         Studentized (jackknifed) residuals
  cooksd           Cook's distance
  leverage | hat   leverage (diagonal elements of hat matrix)
  pr(a,b)          Pr(y | a < y < b)
  e(a,b)           E(y | a < y < b)
  ystar(a,b)       E(y*), y* = max(a,min(y,b))
* dfbeta(varname)  DFBETA for varname
  stdp             standard error of the linear prediction
  stdf             standard error of the forecast
  stdr             standard error of the residual
* covratio         COVRATIO
* dfits            DFITS
* welsch           Welsch distance
-------------------------------------------------------------------------
Unstarred statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample. Starred statistics are calculated only for the estimation sample, even when if e(sample) is not specified.

rstandard, rstudent, cooksd, leverage, dfbeta(), stdf, stdr, covratio, dfits, and welsch are not available if any vce() other than vce(ols) was specified with regress.

xb, residuals, score, and stdp are the only options allowed with svy estimation results.

where a and b may be numbers or variables; a missing (a >= .) means minus infinity, and b missing (b >= .) means plus infinity; see help missing.

Menu for predict

Statistics > Postestimation

Description for predict

predict creates a new variable containing predictions such as linear predictions, residuals, standardized residuals, Studentized residuals, Cook's distance, leverage, probabilities, expected values, DFBETAs for varname, standard errors, COVRATIOs, DFITS, and Welsch distances.

Options for predict

Main

xb, the default, calculates the linear prediction.

residuals calculates the residuals.

score is equivalent to residuals in linear regression.

rstandard calculates the standardized residuals.

rstudent calculates the Studentized (jackknifed) residuals.

cooksd calculates the Cook's D influence statistic (Cook 1977).

leverage or hat calculates the diagonal elements of the projection ("hat") matrix.
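As a cross-check, the residual-based statistics above can be rebuilt from the residuals, the leverage, and the root MSE. The sketch below assumes the textbook formulas r = e/(s*sqrt(1-h)) for the standardized residual, t = r*sqrt((n-k-1)/(n-k-r^2)) for the Studentized residual, and D = r^2*h/(k*(1-h)) for Cook's D, where s is e(rmse), k is the number of coefficients including the constant, and n-k is e(df_r). The auto dataset and all new variable names are illustrative.

```stata
* Sketch (assumed textbook formulas): rebuild rstandard, rstudent, and cooksd
sysuse auto, clear
regress mpg weight foreign

predict double e, residuals
predict double h, leverage
predict double rsta, rstandard
predict double rstu, rstudent
predict double d, cooksd

* k = number of coefficients, including the constant
local k = e(df_m) + 1

* Standardized residual: e / (s * sqrt(1 - h)), with s = root MSE
generate double rsta_m = e / (e(rmse) * sqrt(1 - h))

* Studentized (jackknifed) residual, written in terms of the standardized one
generate double rstu_m = rsta_m * sqrt((e(df_r) - 1) / (e(df_r) - rsta_m^2))

* Cook's D: r^2 * h / (k * (1 - h))
generate double d_m = rsta_m^2 * h / (`k' * (1 - h))

summarize rsta rsta_m rstu rstu_m d d_m
```

Each pair of variables should agree up to rounding error.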

pr(a,b) calculates Pr(a < xb + u < b), the probability that y|x would be observed in the interval (a,b).

a and b may be specified as numbers or variable names. For example, if lb and ub are variable names, pr(20,30) calculates Pr(20 < xb + u < 30); pr(lb,ub) calculates Pr(lb < xb + u < ub); and pr(20,ub) calculates Pr(20 < xb + u < ub).

a missing (a >= .) means minus infinity; pr(.,30) calculates Pr(-infinity < xb + u < 30); pr(lb,30) calculates Pr(-infinity < xb + u < 30) in observations for which lb >= . and calculates Pr(lb < xb + u < 30) elsewhere.

b missing (b >= .) means plus infinity; pr(20,.) calculates Pr(+infinity > xb + u > 20); pr(20,ub) calculates Pr(+infinity > xb + u > 20) in observations for which ub >= . and calculates Pr(20 < xb + u < ub) elsewhere.

e(a,b) calculates E(xb+u | a < xb+u < b), the expected value of y|x conditional on y|x being in the interval (a,b), meaning that y|x is truncated. a and b are specified as they are for pr().

ystar(a,b) calculates E(y*), where y* = a if xb+u < a, y* = b if xb+u > b, and y* = xb+u otherwise, meaning that y* is censored. a and b are specified as they are for pr().
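Under the normal linear model, pr(), e(), and ystar() have closed forms in terms of the linear prediction and sigma. The sketch below assumes sigma is estimated by the root MSE, e(rmse), and uses the interval (20,30) purely for illustration; with alpha = (a-xb)/s and beta = (b-xb)/s, it applies the standard normal-probability, truncated-normal-mean, and censored-mean formulas.

```stata
* Sketch: closed forms for pr(a,b), e(a,b), and ystar(a,b) with a = 20, b = 30
* Assumption: sigma is estimated by the root MSE, e(rmse)
sysuse auto, clear
regress mpg weight foreign

predict double xb, xb
predict double p, pr(20,30)
predict double ce, e(20,30)
predict double ys, ystar(20,30)

* Standardized interval endpoints
generate double alpha = (20 - xb) / e(rmse)
generate double beta  = (30 - xb) / e(rmse)

* Pr(a < y < b)
generate double p_m = normal(beta) - normal(alpha)

* E(y | a < y < b): mean of a truncated normal
generate double ce_m = xb + e(rmse) * (normalden(alpha) - normalden(beta)) / p_m

* E(y*): a below the interval, b above it, y inside it (censored mean)
generate double ys_m = 20*normal(alpha) + 30*normal(-beta) + xb*p_m + e(rmse)*(normalden(alpha) - normalden(beta))

summarize p p_m ce ce_m ys ys_m
```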

dfbeta(varname) calculates the DFBETA for varname: the difference between the regression coefficient when the jth observation is included and when it is excluded, scaled by the estimated standard error of the coefficient. varname must have been included among the regressors in the previously fitted model. The calculation is automatically restricted to the estimation subsample.
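The DFBETA definition can be checked by brute force: refit without one observation and scale the change in the coefficient. The sketch assumes the Belsley, Kuh, and Welsch (1980) scaling, s_(j) times the square root of the relevant diagonal element of the full-sample inverse of X'X, where s_(j) is the root MSE with observation j deleted. Dropping observation 1 is arbitrary, and the scalar names are illustrative.

```stata
* Sketch: DFBETA for weight at observation 1, by a leave-one-out refit
* Assumed scaling: s_(j) * sqrt([inv(X'X)]_kk), with X'X from the full sample
sysuse auto, clear
regress mpg weight foreign
predict double dfw, dfbeta(weight)

scalar b_full  = _b[weight]
* sqrt of the (weight,weight) element of inv(X'X) = _se[weight] / root MSE
scalar sqrt_xx = _se[weight] / e(rmse)

* Refit without observation 1
quietly regress mpg weight foreign in 2/l
scalar b_drop = _b[weight]
scalar s_drop = e(rmse)

display "predict: " dfw[1] "   by hand: " (b_full - b_drop) / (s_drop * sqrt_xx)
```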

stdp calculates the standard error of the prediction, which can be thought of as the standard error of the predicted expected value or mean for the observation's covariate pattern. The standard error of the prediction is also referred to as the standard error of the fitted value.

stdf calculates the standard error of the forecast, which is the standard error of the point prediction for 1 observation. It is commonly referred to as the standard error of the future or forecast value. By construction, the standard errors produced by stdf are always larger than those produced by stdp; see Methods and formulas in [R] regress postestimation.

stdr calculates the standard error of the residuals.
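For in-sample observations, the three standard errors differ only in how the residual variance enters: stdp = s*sqrt(h), stdf = s*sqrt(1+h), and stdr = s*sqrt(1-h), with s the root MSE and h the leverage. That is why stdf always exceeds stdp. A minimal sketch on the auto data (variable names are illustrative):

```stata
* Sketch: stdf and stdr expressed through stdp and the root MSE
sysuse auto, clear
regress mpg weight foreign

predict double sp, stdp
predict double sf, stdf
predict double sr, stdr

* Forecast variance = variance of the fitted value + residual variance
generate double sf_m = sqrt(sp^2 + e(rmse)^2)
* Residual variance = residual variance - variance of the fitted value
generate double sr_m = sqrt(e(rmse)^2 - sp^2)

summarize sp sf sf_m sr sr_m
```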

covratio calculates COVRATIO (Belsley, Kuh, and Welsch 1980), a measure of the influence of the jth observation based on considering the effect on the variance-covariance matrix of the estimates. The calculation is automatically restricted to the estimation subsample.

dfits calculates DFITS (Welsch and Kuh 1977) and attempts to summarize the information in the leverage versus residual-squared plot into one statistic. The calculation is automatically restricted to the estimation subsample.

welsch calculates Welsch distance (Welsch 1982) and is a variation on dfits. The calculation is automatically restricted to the estimation subsample.
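DFITS and COVRATIO can also be written in terms of the leverage h and the residuals. The sketch assumes the Belsley, Kuh, and Welsch forms DFITS = t*sqrt(h/(1-h)), with t the Studentized residual, and COVRATIO = (1/(1-h)) * ((n-k-r^2)/(n-k-1))^k, with r the standardized residual, k the number of coefficients, and n-k = e(df_r). Welsch distance is omitted here; variable names are illustrative.

```stata
* Sketch (assumed formulas): DFITS and COVRATIO from leverage and residuals
sysuse auto, clear
regress mpg weight foreign

predict double h, leverage
predict double rsta, rstandard
predict double rstu, rstudent
predict double dfi, dfits
predict double cvr, covratio

local k = e(df_m) + 1

* DFITS: Studentized residual times sqrt(h / (1 - h))
generate double dfi_m = rstu * sqrt(h / (1 - h))

* COVRATIO: (1/(1-h)) * ((n-k-r^2)/(n-k-1))^k
generate double cvr_m = (1 / (1 - h)) * ((e(df_r) - rsta^2) / (e(df_r) - 1))^`k'

summarize dfi dfi_m cvr cvr_m
```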

Margins

Syntax for margins

margins [marginlist] [, options]

margins [marginlist] , predict(statistic ...) [options]

statistic          Description
-------------------------------------------------------------------------
xb                 linear prediction; the default
pr(a,b)            not allowed with margins
e(a,b)             not allowed with margins
ystar(a,b)         not allowed with margins
residuals          not allowed with margins
score              not allowed with margins
rstandard          not allowed with margins
rstudent           not allowed with margins
cooksd             not allowed with margins
leverage | hat     not allowed with margins
dfbeta(varname)    not allowed with margins
stdp               not allowed with margins
stdf               not allowed with margins
stdr               not allowed with margins
covratio           not allowed with margins
dfits              not allowed with margins
welsch             not allowed with margins
-------------------------------------------------------------------------

Statistics not allowed with margins are functions of stochastic quantities other than e(b).

For the full syntax, see [R] margins.

Menu for margins

Statistics > Postestimation

Description for margins

margins estimates margins of response for linear predictions.
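Because only the linear prediction is allowed, margins with no marginlist simply averages xb over the estimation sample; in a model with a constant, that average equals the sample mean of the dependent variable. A minimal sketch on the auto data (the matrix name MB is illustrative):

```stata
* Sketch: with no marginlist, margins reports the average linear prediction
sysuse auto, clear
regress mpg weight foreign
margins
matrix MB = r(b)

* With a constant in the model, the mean fitted value equals the mean of mpg
quietly summarize mpg if e(sample)
display "margins: " MB[1,1] "   mean of mpg: " r(mean)
```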

DFBETA influence statistics

Syntax for dfbeta

dfbeta [indepvar [indepvar [...]]] [, stub(name)]

Menu for dfbeta

Statistics > Linear models and related > Regression diagnostics > DFBETAs

Description for dfbeta

dfbeta will calculate one, more than one, or all the DFBETAs after regress. Although predict will also calculate DFBETAs, predict can do this for only one variable at a time. dfbeta is a convenience tool for those who want to calculate DFBETAs for multiple variables. The names for the new variables created are chosen automatically and begin with the letters _dfbeta_.

Option for dfbeta

stub(name) specifies the leading characters dfbeta uses to name the new variables to be generated. The default is stub(_dfbeta_).
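The sketch below computes DFBETAs for all regressors at once and compares one of them with predict. It assumes the new variables are named by appending the regressor's position to the stub (db1 for weight, db2 for foreign); describe shows the actual names, and the stub db is illustrative.

```stata
* Sketch: dfbeta for every regressor, compared with predict for one of them
sysuse auto, clear
quietly regress mpg weight foreign

* Assumed naming: stub followed by position (db1 = weight, db2 = foreign)
dfbeta, stub(db)
describe db*

predict double dfw, dfbeta(weight)
summarize db1 dfw
```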

Tests for violation of assumptions

Syntax for estat hettest

estat hettest [varlist] [, rhs [normal | iid | fstat] mtest[(spec)]]

Menu for estat

Statistics > Postestimation

Description for estat hettest

estat hettest performs three versions of the Breusch-Pagan (1979) and Cook-Weisberg (1983) test for heteroskedasticity. All three versions of this test present evidence against the null hypothesis that t=0 in Var(e)=sigma^2 exp(zt). In the normal version, performed by default, the null hypothesis also includes the assumption that the regression disturbances are independent-normal draws with variance sigma^2. The normality assumption is dropped from the null hypothesis in the iid and fstat versions, which respectively produce the score and F tests discussed in Methods and formulas in [R] regress postestimation. If varlist is not specified, the fitted values are used for z. If varlist or the rhs option is specified, the variables specified are used for z.

Options for estat hettest

rhs specifies that tests for heteroskedasticity be performed for the right-hand-side (explanatory) variables of the fitted regression model. The rhs option may be combined with a varlist, in which case the variables in varlist are included in the model for the variance along with the explanatory variables.

normal, the default, causes estat hettest to compute the original Breusch-Pagan/Cook-Weisberg test, which assumes that the regression disturbances are normally distributed.

iid causes estat hettest to compute the N*R2 version of the score test that drops the normality assumption.

fstat causes estat hettest to compute the F-statistic version that drops the normality assumption.
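The normal and iid versions can be reproduced with an auxiliary regression, here with z equal to the fitted values (the default when no varlist is given). The sketch assumes the standard constructions: the normal statistic is half the model sum of squares from regressing e^2/(RSS/N) on z, and the iid statistic is N times the R-squared from regressing e^2 on z. Variable and scalar names are illustrative.

```stata
* Sketch: reproduce estat hettest (normal and iid) with z = fitted values
sysuse auto, clear
quietly regress mpg weight foreign
estat hettest
scalar bp_normal = r(chi2)
estat hettest, iid
scalar bp_iid = r(chi2)

predict double yhat, xb
predict double e, residuals
generate double e2 = e^2
quietly summarize e2, meanonly
* Scale squared residuals by the ML variance estimate RSS/N
generate double g = e2 / r(mean)

* normal version (assumed): half the model sum of squares of g on z
quietly regress g yhat
scalar bp_normal_m = e(mss) / 2

* iid version (assumed): N times R-squared of e^2 on z
quietly regress e2 yhat
scalar bp_iid_m = e(N) * e(r2)

display bp_normal "  " bp_normal_m "  " bp_iid "  " bp_iid_m
```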

mtest[(spec)] specifies that multiple testing be performed. The argument specifies how p-values are adjusted. The following specifications, spec, are supported:

    bonferroni    Bonferroni's multiple testing adjustment
    holm          Holm's multiple testing adjustment
    sidak         Sidak's multiple testing adjustment
    noadjust      no adjustment is made for multiple testing

mtest may be specified without an argument. This is equivalent to specifying mtest(noadjust); that is, tests for the individual variables should be performed with unadjusted p-values. By default, estat hettest does not perform multiple testing. mtest may not be specified with iid or fstat.

Syntax for estat imtest

estat imtest [, preserve white]

Menu for estat

Statistics > Postestimation

Description for estat imtest

estat imtest performs an information matrix test for the regression model and an orthogonal decomposition into tests for heteroskedasticity, skewness, and kurtosis due to Cameron and Trivedi (1990); White's (1980) test for homoskedasticity against unrestricted forms of heteroskedasticity is available as an option. White's test is usually similar to the first term of the Cameron-Trivedi decomposition.

Options for estat imtest

preserve specifies that the data in memory be preserved, all variables and cases that are not needed in the calculations be dropped, and at the conclusion the original data be restored. This option is costly for large datasets. However, because estat imtest has to perform an auxiliary regression on k(k+1)/2 temporary variables, where k is the number of regressors, it may not be able to perform the test otherwise.

white specifies that White's original heteroskedasticity test also be performed.
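White's statistic is N times the R-squared from regressing the squared residuals on the regressors, their squares, and their cross-products. The sketch below uses two continuous regressors, because for a 0/1 variable such as foreign the square equals the level and would be dropped as collinear. Model choice and names are illustrative.

```stata
* Sketch: White's test as N*R2 from an auxiliary regression
sysuse auto, clear
quietly regress mpg weight length
estat imtest, white
scalar w_stata = r(chi2_w)

predict double e, residuals
generate double e2 = e^2

* Levels, squares, and the cross-product of weight and length
quietly regress e2 c.weight##c.weight c.weight#c.length c.length##c.length
scalar w_manual = e(N) * e(r2)

display "estat imtest: " w_stata "   by hand: " w_manual
```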

Syntax for estat ovtest

estat ovtest [, rhs]

Menu for estat

Statistics > Postestimation

Description for estat ovtest

estat ovtest performs two versions of the Ramsey (1969) regression specification-error test (RESET) for omitted variables. This test amounts to fitting y=xb+zt+u and then testing t=0. If the rhs option is not specified, powers of the fitted values are used for z. If rhs is specified, powers of the individual elements of x are used.

Option for estat ovtest

rhs specifies that powers of the right-hand-side (explanatory) variables be used in the test rather than powers of the fitted values.
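The default RESET can be reproduced by fitting y = xb + zt + u with z holding powers of the fitted values and testing t = 0. The sketch assumes the second through fourth powers are used. It standardizes the fitted values first; because yhat is already a linear function of the regressors, this leaves the F statistic unchanged while improving conditioning. Names are illustrative.

```stata
* Sketch: RESET with powers 2-4 (assumed) of the standardized fitted values
sysuse auto, clear
quietly regress mpg weight foreign
estat ovtest
scalar F_stata = r(F)

predict double yhat, xb
quietly summarize yhat
generate double z  = (yhat - r(mean)) / r(sd)
generate double z2 = z^2
generate double z3 = z^3
generate double z4 = z^4

* Augmented regression, then a joint test that the added terms are zero
quietly regress mpg weight foreign z2 z3 z4
test z2 z3 z4
scalar F_manual = r(F)

display "estat ovtest: " F_stata "   by hand: " F_manual
```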

Syntax for estat szroeter

estat szroeter [varlist] [, rhs mtest(spec)]

Either varlist or rhs must be specified.

Menu for estat

Statistics > Postestimation

Description for estat szroeter

estat szroeter performs Szroeter's rank test for heteroskedasticity for each of the variables in varlist or for the explanatory variables of the regression if rhs is specified.

Options for estat szroeter

rhs specifies that tests for heteroskedasticity be performed for the right-hand-side (explanatory) variables of the fitted regression model. The rhs option may be combined with a varlist.

mtest(spec) specifies that multiple testing be performed. The argument specifies how p-values are adjusted. The following specifications, spec, are supported:

    bonferroni    Bonferroni's multiple testing adjustment
    holm          Holm's multiple testing adjustment
    sidak         Sidak's multiple testing adjustment
    noadjust      no adjustment is made for multiple testing

estat szroeter always performs multiple testing. By default, it does not adjust the p-values.

Variance inflation factors

Syntax for estat vif

estat vif [, uncentered]

Menu for estat

Statistics > Postestimation

Description for estat vif

estat vif calculates the centered or uncentered variance inflation factors (VIFs) for the independent variables specified in a linear regression model.

Option for estat vif

uncentered requests the computation of the uncentered variance inflation factors. This option is often used to detect the collinearity of the regressors with the constant. estat vif, uncentered may be used after regression models fit without the constant term.
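The centered VIF for a regressor is 1/(1 - R2), where R2 comes from regressing that regressor on the other regressors. The same number can be recovered from the coefficient's variance through Var(b) = s^2 * VIF / ((N - 1) * Var(x)). The sketch checks both routes for weight; the model and scalar names are illustrative, and the result should match the weight row reported by estat vif.

```stata
* Sketch: centered VIF for weight, computed two ways
sysuse auto, clear
quietly regress mpg weight length foreign
estat vif
scalar se_w = _se[weight]
scalar s2   = e(rmse)^2

* VIF = 1 / (1 - R2) from regressing weight on the other regressors
quietly regress weight length foreign
scalar vif_w = 1 / (1 - e(r2))

* Same quantity from the variance of the coefficient on weight
quietly summarize weight
scalar vif_w_alt = se_w^2 * (r(N) - 1) * r(Var) / s2

display "auxiliary R2: " vif_w "   from Var(b): " vif_w_alt
```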

Measures of effect size

Syntax for estat esize

estat esize [, epsilon omega level(#)]

Menu for estat

Statistics > Postestimation

Description for estat esize

estat esize calculates effect sizes for linear models after regress or anova. By default, estat esize reports eta-squared estimates (Kerlinger 1964), which are equivalent to R-squared estimates. If the option epsilon is specified, estat esize reports epsilon-squared estimates (Grissom and Kim 2012). If the option omega is specified, estat esize reports omega-squared estimates (Grissom and Kim 2012). Both epsilon-squared and omega-squared are adjusted R-squared estimates. Confidence intervals for eta-squared estimates are estimated by using the noncentral F distribution (Smithson 2001). See Kline (2013) or Thompson (2006) for further information.

Options for estat esize

epsilon specifies that the epsilon-squared estimates of effect size be reported. The default is eta-squared estimates.

omega specifies that the omega-squared estimates of effect size be reported. The default is eta-squared estimates.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.
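The effect sizes can be expressed through the F statistic and its degrees of freedom. The sketch assumes the formulas eta2 = F*dn/(F*dn + dd), epsilon2 = (F-1)*dn/(F*dn + dd), and omega2 = (F-1)*dn/(F*dn + dd + 1), and assumes row 1 of r(esize) is the overall model, using the column layout listed under Stored results. For the whole model, eta-squared should equal R-squared and epsilon-squared should equal adjusted R-squared.

```stata
* Sketch (assumed formulas): model-level effect sizes from F and its df
sysuse auto, clear
quietly regress mpg weight foreign
estat esize
matrix ES = r(esize)

* Row 1 assumed to be the model; columns 6-8 hold numerator df, denominator df, F
scalar Fm = ES[1,8]
scalar dn = ES[1,6]
scalar dd = ES[1,7]

display "eta2: " ES[1,1] "   from F: " Fm*dn/(Fm*dn + dd) "   R2: " e(r2)
display "epsilon2: " (Fm - 1)*dn/(Fm*dn + dd) "   adjusted R2: " e(r2_a)
display "omega2: " (Fm - 1)*dn/(Fm*dn + dd + 1)
```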

Examples

---------------------------------------------------------------------------
Setup
    . sysuse auto
    . regress mpg weight foreign

Obtain predicted values
    . predict pmpg
    . summarize pmpg mpg

---------------------------------------------------------------------------
Setup
    . webuse newautos, clear

Obtain out-of-sample prediction
    . predict mpg

Obtain standard error of the forecast
    . predict se_mpg, stdf

---------------------------------------------------------------------------
Setup
    . sysuse auto, clear
    . regress mpg weight c.weight#c.weight foreign

Diagonal elements of projection matrix
    . predict xdist, hat

---------------------------------------------------------------------------
Setup
    . sysuse auto, clear
    . regress price weight foreign##c.mpg

Leverage-versus-residual-squared plot
    . lvr2plot

Standardized residuals
    . predict esta if e(sample), rstandard

Studentized residuals
    . predict estu if e(sample), rstudent

---------------------------------------------------------------------------
Setup
    . sysuse auto, clear
    . regress price weight foreign##c.mpg

DFITS influence measure
    . predict dfits, dfits

Cook's distance
    . predict cooksd if e(sample), cooksd

Welsch distance
    . predict wd, welsch

COVRATIO influence measure
    . predict covr, covratio

DFBETAs influence measure
    . sort foreign make
    . predict dfor, dfbeta(1.foreign)

DFBETAs for all variables in regression
    . dfbeta

Ramsey's test for omitted variables
    . estat ovtest

Test for heteroskedasticity
    . estat hettest
    . estat hettest weight foreign##c.mpg, mtest(b)

Rank test for heteroskedasticity
    . estat szroeter, rhs mtest(holm)

Tests for heteroskedasticity, skewness, and kurtosis
    . estat imtest

---------------------------------------------------------------------------
Setup
    . webuse bodyfat, clear
    . regress bodyfat tricep thigh midarm

Variance inflation factors
    . estat vif

---------------------------------------------------------------------------
Setup
    . webuse nhanes2

Regress systolic blood pressure on age group, sex, and their interaction
    . regress bpsystol agegrp##sex

Predictive margins of blood pressure for age groups
    . margins agegrp

Profile plot of margins
    . marginsplot

Margins for interaction between age group and sex
    . margins agegrp#sex

Interaction plot
    . marginsplot

Estimate for each age group a contrast comparing men and women
    . margins r.sex@agegrp

Plot contrasts and confidence intervals against age group
    . marginsplot

---------------------------------------------------------------------------
Setup
    . webuse lbw

Effect size for linear models after regress
    . regress bwt smoke i.race
    . estat esize
    . estat esize, level(90)
    . estat esize, omega

---------------------------------------------------------------------------

Stored results

estat hettest stores the following results for the (multivariate) score test in r():

Scalars
  r(chi2)      chi-squared test statistic
  r(df)        #df for the asymptotic chi-squared distribution under H_0
  r(p)         p-value

estat hettest, fstat stores the following results for the (multivariate) F test in r():

Scalars
  r(F)         test statistic
  r(df_m)      #df of the test for the F distribution under H_0
  r(df_r)      #df of the residuals for the F distribution under H_0
  r(p)         p-value

estat hettest (if mtest is specified) and estat szroeter store the following in r():

Matrices
  r(mtest)     a matrix of test results, with rows corresponding to the univariate tests

                 mtest[.,1]  chi-squared test statistic
                 mtest[.,2]  #df
                 mtest[.,3]  unadjusted p-value
                 mtest[.,4]  adjusted p-value (if an mtest() adjustment method is specified)

Macros
  r(mtmethod)  adjustment method for p-value
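A minimal sketch of reading these results after estat hettest with a Bonferroni adjustment. With two univariate tests, the Bonferroni-adjusted p-value in column 4 should be min(1, 2 times the unadjusted p-value in column 3); the model, matrix name M, and local name method are illustrative.

```stata
* Sketch: inspect r(mtest) and r(mtmethod) after a multiple-testing hettest
sysuse auto, clear
quietly regress mpg weight foreign
estat hettest weight foreign, mtest(bonferroni)

* Save the macro before other commands can overwrite r()
local method "`r(mtmethod)'"
matrix M = r(mtest)
matrix list M
display "adjustment method: `method'"
```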

estat imtest stores the following in r():

Scalars
  r(chi2_t)    IM-test statistic (= r(chi2_h) + r(chi2_s) + r(chi2_k))
  r(df_t)      df for limiting chi-squared distribution under H_0 (= r(df_h) + r(df_s) + r(df_k))
  r(chi2_h)    heteroskedasticity test statistic
  r(df_h)      df for limiting chi-squared distribution under H_0
  r(chi2_s)    skewness test statistic
  r(df_s)      df for limiting chi-squared distribution under H_0
  r(chi2_k)    kurtosis test statistic
  r(df_k)      df for limiting chi-squared distribution under H_0
  r(chi2_w)    White's heteroskedasticity test (if white specified)
  r(df_w)      df for limiting chi-squared distribution under H_0

estat ovtest stores the following in r():

Scalars
  r(p)         two-sided p-value
  r(F)         F statistic
  r(df)        degrees of freedom
  r(df_r)      residual degrees of freedom

estat esize stores the following in r():

Scalars
  r(level)     confidence level

Matrices
  r(esize)     a matrix of effect sizes, confidence intervals, degrees of freedom, and F statistics with rows corresponding to each term in the model

                 esize[.,1]  eta-squared
                 esize[.,2]  lower confidence bound for eta-squared
                 esize[.,3]  upper confidence bound for eta-squared
                 esize[.,4]  epsilon-squared
                 esize[.,5]  omega-squared
                 esize[.,6]  numerator degrees of freedom
                 esize[.,7]  denominator degrees of freedom
                 esize[.,8]  F statistic

References

Belsley, D. A., E. Kuh, and R. E. Welsch. 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley.

Breusch, T. S., and A. R. Pagan. 1979. A simple test for heteroscedasticity and random coefficient variation. Econometrica 47: 1287-1294.

Cameron, A. C., and P. K. Trivedi. 1990. The information matrix test and its applied alternative hypotheses. Working Paper 372, University of California-Davis, Institute of Governmental Affairs.

Cook, R. D. 1977. Detection of influential observations in linear regression. Technometrics 19: 15-18.

Cook, R. D., and S. Weisberg. 1983. Diagnostics for heteroscedasticity in regression. Biometrika 70: 1-10.

Grissom, R. J., and J. J. Kim. 2012. Effect Sizes for Research: Univariate and Multivariate Applications. 2nd ed. New York: Routledge.

Kerlinger, F. N. 1964. Foundations of Behavioral Research. New York: Holt, Rinehart & Winston.

Kline, R. B. 2013. Beyond Significance Testing: Statistics Reform in the Behavioral Sciences. 2nd ed. Washington, DC: American Psychological Association.

Mallows, C. L. 1986. Augmented partial residuals. Technometrics 28: 313-319.

Ramsey, J. B. 1969. Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society, Series B 31: 350-371.

Smithson, M. 2001. Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement 61: 605-632.

Thompson, B. 2006. Foundations of Behavioral Statistics: An Insight-Based Approach. New York: Guilford Press.

Welsch, R. E. 1982. Influence functions and regression diagnostics. In Modern Data Analysis, ed. R. L. Launer and A. F. Siegel, 149-169. New York: Academic Press.

Welsch, R. E., and E. Kuh. 1977. Linear Regression Diagnostics. Technical Report 923-77, Massachusetts Institute of Technology, Cambridge, MA.

White, H. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817-838.


© Copyright 1996–2018 StataCorp LLC