Stata 15 help for regress

[R] regress -- Linear regression


regress depvar [indepvars] [if] [in] [weight] [, options]

options Description ------------------------------------------------------------------------- Model noconstant suppress constant term hascons has user-supplied constant tsscons compute total sum of squares with constant; seldom used

SE/Robust vce(vcetype) vcetype may be ols, robust, cluster clustvar, bootstrap, jackknife, hc2, or hc3

Reporting level(#) set confidence level; default is level(95) beta report standardized beta coefficients eform(string) report exponentiated coefficients and label as string depname(varname) substitute dependent variable name; programmer's option display_options control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling

noheader suppress output header notable suppress coefficient table plus make table extendable mse1 force mean squared error to 1 coeflegend display legend instead of statistics ------------------------------------------------------------------------- indepvars may contain factor variables; see fvvarlist. depvar and indepvars may contain time-series operators; see tsvarlist. bayes, bootstrap, by, fmm, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed; see prefix. For more details, see [BAYES] bayes: regress and [FMM] fmm: regress. vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix. Weights are not allowed with the bootstrap prefix. aweights are not allowed with the jackknife prefix. hascons, tsscons, vce(), beta, noheader, notable, plus, depname(), mse1, and weights are not allowed with the svy prefix. aweights, fweights, iweights, and pweights are allowed; see weight. noheader, notable, plus, mse1, and coeflegend do not appear in the dialog box. See [R] regress postestimation for features available after estimation.


Statistics > Linear models and related > Linear regression


regress performs ordinary least-squares linear regression. regress can also perform weighted estimation, compute robust and cluster-robust standard errors, and adjust results for complex survey designs.


+-------+ ----+ Model +------------------------------------------------------------

noconstant; see [R] estimation options.

hascons indicates that a user-defined constant or its equivalent is specified among the independent variables in indepvars. Some caution is recommended when specifying this option, as resulting estimates may not be as accurate as they otherwise would be. Use of this option requires "sweeping" the constant last, so the moment matrix must be accumulated in absolute rather than deviation form. This option may be safely specified when the means of the dependent and independent variables are all reasonable and there is not much collinearity between the independent variables. The best procedure is to view hascons as a reporting option -- estimate with and without hascons and verify that the coefficients and standard errors of the variables not affected by the identity of the constant are unchanged.

tsscons forces the total sum of squares to be computed as though the model has a constant, that is, as deviations from the mean of the dependent variable. This is a rarely used option that has an effect only when specified with noconstant. It affects the total sum of squares and all results derived from the total sum of squares.

+-----------+ ----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (ols), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.

vce(ols), the default, uses the standard variance estimator for ordinary least-squares regression.

regress also allows the following:

vce(hc2) and vce(hc3) specify an alternative bias correction for the robust variance calculation. vce(hc2) and vce(hc3) may not be specified with the svy prefix. In the unclustered case, vce(robust) uses (sigma-hat_j)^2={n/(n-k)}(u_j)^2 as an estimate of the variance of the jth observation, where u_j is the calculated residual and n/(n-k) is included to improve the overall estimate's small-sample properties.

vce(hc2) instead uses u_j^2/(1-h_jj) as the observation's variance estimate, where h_jj is the diagonal element of the hat (projection) matrix. This estimate is unbiased if the model really is homoskedastic. vce(hc2) tends to produce slightly more conservative confidence intervals.

vce(hc3) uses u_j^2/(1-h_jj)^2 as suggested by Davidson and MacKinnon (1993), who report that this method tends to produce better results when the model really is heteroskedastic. vce(hc3) produces confidence intervals that tend to be even more conservative.

See Davidson and MacKinnon (1993, 554-556) and Angrist and Pischke (2009, 294-308) for more discussion on these two bias corrections.

+-----------+ ----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

beta asks that standardized beta coefficients be reported instead of confidence intervals. The beta coefficients are the regression coefficients obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1. beta may not be specified with vce(cluster clustvar) or the svy prefix.

eform(string) is used only in programs and ado-files that use regress to fit models other than linear regression. eform() specifies that the coefficient table be displayed in exponentiated form as defined in [R] maximize and that string be used to label the exponentiated coefficients in the table.

depname(varname) is used only in programs and ado-files that use regress to fit models other than linear regression. depname() may be specified only at estimation time. varname is recorded as the identity of the dependent variable, even though the estimates are calculated using depvar. This method affects the labeling of the output -- not the results calculated -- but could affect subsequent calculations made by predict, where the residual would be calculated as deviations from varname rather than depvar. depname() is most typically used when depvar is a temporary variable (see [P] macro) used as a proxy for varname.

depname() is not allowed with the svy prefix.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

The following options are available with regress but are not shown in the dialog box:

noheader suppresses the display of the ANOVA table and summary statistics at the top of the output; only the coefficient table is displayed. This option is often used in programs and ado-files.

notable suppresses display of the coefficient table.

plus specifies that the output table be made extendable. This option is often used in programs and ado-files.

mse1 is used only in programs and ado-files that use regress to fit models other than linear regression and is not allowed with the svy prefix. mse1 sets the mean squared error to 1, forcing the variance-covariance matrix of the estimators to be (X'X)^-1 (see Methods and formulas in [R] regress) and affecting calculated standard errors. Degrees of freedom for t statistics is calculated as n rather than n-k.

coeflegend; see [R] estimation options.

Examples: linear regression

Setup . sysuse auto

Fit a linear regression . regress mpg weight foreign

Fit a better linear regression, from a physics standpoint . gen gp100m = 100/mpg . regress gp100m weight foreign

Obtain beta coefficients without refitting model . regress, beta

Suppress intercept term . regress weight length, noconstant

Model already has constant . regress weight length bn.foreign, hascons

Examples: regression with robust standard errors

----------------------------------------------------------------------- . sysuse auto, clear . generate gpmw = ((1/mpg)/weight)*100*1000 . regress gpmw foreign . regress gpmw foreign, vce(robust) . regress gpmw foreign, vce(hc2) . regress gpmw foreign, vce(hc3) ----------------------------------------------------------------------- . webuse regsmpl, clear . regress ln_wage age c.age#c.age tenure, vce(cluster id) -----------------------------------------------------------------------

Example: weighted regression

. sysuse census . regress death medage i.region [aw=pop]

Examples: linear regression with survey data

Setup . webuse highschool

Perform linear regression using survey data . svy: regress weight height

Setup . generate male = sex == 1 if !missing(sex)

Perform linear regression using survey data for a subpopulation . svy, subpop(male): regress weight height

Video example

Simple linear regression in Stata

Stored results

regress stores the following in e():

Scalars e(N) number of observations e(mss) model sum of squares e(df_m) model degrees of freedom e(rss) residual sum of squares e(df_r) residual degrees of freedom e(r2) R-squared e(r2_a) adjusted R-squared e(F) F statistic e(rmse) root mean squared error e(ll) log likelihood under additional assumption of i.i.d. normal errors e(ll_0) log likelihood, constant-only model e(N_clust) number of clusters e(rank) rank of e(V)

Macros e(cmd) regress e(cmdline) command as typed e(depvar) name of dependent variable e(wtype) weight type e(wexp) weight expression e(model) ols e(title) title in estimation output when vce() is not ols e(clustvar) name of cluster variable e(vce) vcetype specified in vce() e(vcetype) title used to label Std. Err. e(properties) b V e(estat_cmd) program used to implement estat e(predict) program used to implement predict e(marginsok) predictions allowed by margins e(asbalanced) factor variables fvset as asbalanced e(asobserved) factor variables fvset as asobserved

Matrices e(b) coefficient vector e(V) variance-covariance matrix of the estimators e(V_modelbased) model-based variance

Functions e(sample) marks estimation sample


Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.

Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index