Title
[R] regress -- Linear regression
Syntax
regress depvar [indepvars] [if] [in] [weight] [, options]
options               description
-------------------------------------------------------------------------
Model
  noconstant          suppress constant term
  hascons             has user-supplied constant
  tsscons             compute total sum of squares with constant; seldom
                        used

SE/Robust
  vce(vcetype)        vcetype may be ols, robust, cluster clustvar,
                        bootstrap, jackknife, hc2, or hc3

Reporting
  level(#)            set confidence level; default is level(95)
  beta                report standardized beta coefficients
  eform(string)       report exponentiated coefficients and label as
                        string
  depname(varname)    substitute dependent variable name; programmer's
                        option
  display_options     control spacing and display of omitted variables
                        and base and empty cells

+ noheader            suppress table header
+ notable             suppress coefficient table
+ plus                make table extendable
+ mse1                force MSE to be 1
+ coeflegend          display coefficients' legend instead of coefficient
                        table
-------------------------------------------------------------------------
+ noheader, notable, plus, mse1, and coeflegend do not appear in the
  dialog box.
indepvars may contain factor variables; see fvvarlist.
depvar and indepvars may contain time-series operators; see tsvarlist.
bootstrap, by, fracpoly, jackknife, mfp, mi estimate, nestreg, rolling,
statsby, stepwise, and svy are allowed; see prefix.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate
prefix.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
hascons, tsscons, vce(), beta, noheader, notable, plus, depname(), mse1,
and weights are not allowed with the svy prefix.
aweights, fweights, iweights, and pweights are allowed; see weight.
See [R] regress postestimation for features available after estimation.
Menu
Statistics > Linear models and related > Linear regression
Description
regress fits a model of depvar on indepvars using linear regression.
Here is a short list of other regression commands that may be of
interest. See [I] estimation commands for a complete list.
command       description
-------------------------------------------------------------------------
areg          an easier way to fit regressions with many dummy
                variables
arch          regression models with ARCH errors
arima         ARIMA models
boxcox        Box-Cox regression models
cnsreg        constrained linear regression
eivreg        errors-in-variables regression
frontier      stochastic frontier models
gmm           generalized method of moments estimation
heckman       Heckman selection model
intreg        interval regression
ivregress     single-equation instrumental-variables regression
ivtobit       tobit regression with endogenous variables
newey         regression with Newey-West standard errors
nl            nonlinear least-squares estimation
nlsur         estimation of nonlinear systems of equations
qreg          quantile (including median) regression
reg3          three-stage least-squares (3SLS) regression
rreg          a type of robust regression
sureg         seemingly unrelated regression
tobit         tobit regression
treatreg      treatment-effects model
truncreg      truncated regression
xtabond       Arellano-Bond linear dynamic panel-data estimation
xtdpd         linear dynamic panel-data estimation
xtfrontier    panel-data stochastic frontier model
xtgls         panel-data GLS models
xthtaylor     Hausman-Taylor estimator for error-components models
xtintreg      panel-data interval regression models
xtivreg       panel-data instrumental-variables (2SLS) regression
xtpcse        linear regression with panel-corrected standard errors
xtreg         fixed- and random-effects linear models
xtregar       fixed- and random-effects linear models with an AR(1)
                disturbance
xttobit       panel-data tobit models
-------------------------------------------------------------------------
Options
    +-------+
----+ Model +------------------------------------------------------------

noconstant; see [R] estimation options.

hascons indicates that a user-defined constant or its equivalent is
specified among the independent variables in indepvars. Specify this
option with some caution because the resulting estimates may be less
accurate than they otherwise would be. Use of this option requires
"sweeping" the constant last, so the moment matrix must be accumulated
in absolute rather than deviation form. The option may be safely
specified when the means of the dependent and independent variables are
all reasonable and there is not much collinearity between the
independent variables. The best procedure is to view hascons as a
reporting option -- estimate with and without hascons and verify that
the coefficients and standard errors of the variables not affected by
the identity of the constant are unchanged.

tsscons forces the total sum of squares to be computed as though the
model has a constant, that is, as deviations from the mean of the
dependent variable. This rarely used option has an effect only when
specified with noconstant. It affects only the total sum of squares and
all results derived from the total sum of squares.
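
The check recommended for hascons can be sketched as follows. This is
only an illustration under stated assumptions: it uses the auto dataset
from the examples in this help file, and the stored-estimate names
with_hc and with_c are arbitrary labels chosen here, not part of
regress. Both specifications span the same set of regressors, so the
coefficient and standard error reported for length should agree across
the two columns of the final table.

    Fit with a user-supplied constant (both levels of foreign), then
    with the usual constant, and compare
    . sysuse auto, clear
    . regress weight length bn.foreign, hascons
    . estimates store with_hc
    . regress weight length i.foreign
    . estimates store with_c
    . estimates table with_hc with_c, se

    See the effect of tsscons on R-squared in a model without a constant
    . regress weight length, noconstant
    . display e(r2)
    . regress weight length, noconstant tsscons
    . display e(r2)

The two displayed values of e(r2) will generally differ because only
the second is based on deviations from the mean of the dependent
variable.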
    +-----------+
----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which
includes types that are derived from asymptotic theory, that are
robust to some kinds of misspecification, that allow for intragroup
correlation, and that use bootstrap or jackknife methods; see [R]
vce_option.

vce(ols), the default, uses the standard variance estimator for
ordinary least-squares regression.

regress also allows the following:

    vce(hc2) and vce(hc3) specify an alternative bias correction for the
    robust variance calculation. vce(hc2) and vce(hc3) may not be
    specified with the svy prefix. In the unclustered case,
    vce(robust) uses (sigma-hat_j)^2 = {n/(n-k)}u_j^2 as an estimate
    of the variance of the jth observation, where u_j is the
    calculated residual, n is the number of observations, k is the
    number of estimated coefficients, and n/(n-k) is included to
    improve the overall estimate's small-sample properties.

    vce(hc2) instead uses u_j^2/(1-h_jj) as the observation's
    variance estimate, where h_jj is the jth diagonal element of the
    hat (projection) matrix. This estimate is unbiased if the model
    really is homoskedastic. vce(hc2) tends to produce slightly more
    conservative confidence intervals than vce(robust).

    vce(hc3) uses u_j^2/(1-h_jj)^2 as suggested by Davidson and
    MacKinnon (1993), who report that this method tends to produce
    better results when the model really is heteroskedastic.
    vce(hc3) produces confidence intervals that tend to be even more
    conservative.
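
The three observation-level variance terms described above can be
computed by hand as a rough sketch. The variable names u, h, v_robust,
v_hc2, and v_hc3 are hypothetical names chosen for this illustration,
and the auto dataset is assumed. The sketch reproduces only the
per-observation terms, not the full variance estimator that regress
assembles from them.

    Fit the model and obtain residuals and hat-matrix diagonals
    . sysuse auto, clear
    . regress mpg weight foreign
    . predict double u, residuals
    . predict double h, hat

    Per-observation variance terms; e(df_r) holds n-k
    . generate double v_robust = (e(N)/e(df_r))*u^2
    . generate double v_hc2 = u^2/(1 - h)
    . generate double v_hc3 = u^2/(1 - h)^2
    . summarize v_robust v_hc2 v_hc3

Whenever 0 < h_jj < 1, v_hc3 is at least as large as v_hc2 for every
observation, which is consistent with vce(hc3) giving the more
conservative intervals.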
    +-----------+
----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

beta asks that standardized beta coefficients be reported instead of
confidence intervals. The beta coefficients are the regression
coefficients obtained by first standardizing all variables to have a
mean of 0 and a standard deviation of 1. beta may not be specified
with vce(cluster clustvar) or the svy prefix.
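
As a hedged illustration of that definition, the sketch below
standardizes the variables by hand and refits the model. The names
zmpg, zweight, and zforeign are hypothetical, and the auto dataset is
assumed. The comparison holds when the variables have no missing
values, so that standardization uses the same observations as the
estimation sample.

    Report beta coefficients
    . sysuse auto, clear
    . regress mpg weight foreign, beta

    Standardize each variable and refit; the slope coefficients should
    match the beta column above
    . egen double zmpg = std(mpg)
    . egen double zweight = std(weight)
    . egen double zforeign = std(foreign)
    . regress zmpg zweight zforeign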

eform(string) is used only in programs and ado-files that use regress to
fit models other than linear regression. eform() specifies that the
coefficient table be displayed in exponentiated form as defined in
[R] maximize and that string be used to label the exponentiated
coefficients in the table.

depname(varname) is used only in programs and ado-files that use regress
to fit models other than linear regression. depname() may be
specified only at estimation time. varname is recorded as the
identity of the dependent variable, even though the estimates are
calculated using depvar. This method affects the labeling of the
output -- not the results calculated -- but could affect subsequent
calculations made by predict, where the residual would be calculated
as deviations from varname rather than depvar. depname() is most
typically used when depvar is a temporary variable (see [P] macro)
used as a proxy for varname. depname() is not allowed with the svy
prefix.

display_options: noomitted, vsquish, noemptycells, baselevels,
allbaselevels; see [R] estimation options.

The following options are available with regress but are not shown in the
dialog box:

noheader suppresses the display of the ANOVA table and summary statistics
at the top of the output; only the coefficient table is displayed.
This option is often used in programs and ado-files.

notable suppresses display of the coefficient table.

plus specifies that the output table be made extendable. This option is
often used in programs and ado-files.

mse1 is used only in programs and ado-files that use regress to fit
models other than linear regression and is not allowed with the svy
prefix. mse1 sets the mean squared error to 1, thus forcing the
variance-covariance matrix of the estimators to be (X'DX)^-1 and
affecting calculated standard errors. Degrees of freedom for t
statistics are calculated as n rather than n-k.

coeflegend; see [R] estimation options.
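
A brief sketch of two of these options follows, assuming the auto
dataset. The model itself is arbitrary and chosen only for
illustration.

    Show only the coefficient table
    . sysuse auto, clear
    . regress mpg weight foreign, noheader

    Replay with the coefficient legend, then use one of the listed names
    . regress, coeflegend
    . display _b[weight]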

Examples: linear regression

    Setup
    . sysuse auto
    . regress mpg weight c.weight#c.weight foreign

    Obtain beta coefficients without refitting model
    . regress, beta

    Suppress intercept term
    . regress weight length, noconstant

    Model already has constant
    . regress weight length bn.foreign, hascons

Examples: regression with robust standard errors

    -----------------------------------------------------------------------
    . sysuse auto, clear
    . generate gpmw = ((1/mpg)/weight)*100*1000
    . regress gpmw foreign
    . regress gpmw foreign, vce(robust)
    . regress gpmw foreign, vce(hc2)
    . regress gpmw foreign, vce(hc3)
    -----------------------------------------------------------------------
    . webuse regsmpl, clear
    . regress ln_wage age c.age#c.age tenure, vce(cluster id)
    -----------------------------------------------------------------------

Example: weighted regression

    . sysuse census
    . regress death medage i.region [aw=pop]

Examples: linear regression with survey data

    Setup
    . webuse highschool, clear

    Perform linear regression using survey data
    . svy: regress weight height

    Setup
    . generate male = sex == 1 if !missing(sex)

    Perform linear regression using survey data for a subpopulation
    . svy, subpop(male): regress weight height

Saved results
regress saves the following in e():
Scalars
  e(N)              number of observations
  e(mss)            model sum of squares
  e(df_m)           model degrees of freedom
  e(rss)            residual sum of squares
  e(df_r)           residual degrees of freedom
  e(r2)             R-squared
  e(r2_a)           adjusted R-squared
  e(F)              F statistic
  e(rmse)           root mean squared error
  e(ll)             log likelihood under additional assumption of
                      i.i.d. normal errors
  e(ll_0)           log likelihood, constant-only model
  e(N_clust)        number of clusters
  e(rank)           rank of e(V)

Macros
  e(cmd)            regress
  e(cmdline)        command as typed
  e(depvar)         name of dependent variable
  e(model)          ols or iv
  e(wtype)          weight type
  e(wexp)           weight expression
  e(title)          title in estimation output when vce() is not ols
  e(clustvar)       name of cluster variable
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(properties)     b V
  e(estat_cmd)      program used to implement estat
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(asbalanced)     factor variables fvset as asbalanced
  e(asobserved)     factor variables fvset as asobserved

Matrices
  e(b)              coefficient vector
  e(V)              variance-covariance matrix of the estimators
  e(V_modelbased)   model-based variance

Functions
  e(sample)         marks estimation sample
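
The saved results can be inspected and combined after estimation. The
following is a minimal sketch, assuming the auto dataset and an
arbitrary model with a constant. Each hand-computed quantity should
match the corresponding saved scalar displayed after it.

    Fit a model and list everything saved in e()
    . sysuse auto, clear
    . regress mpg weight foreign
    . ereturn list

    R-squared from the sums of squares, then the saved value
    . display e(mss)/(e(mss) + e(rss))
    . display e(r2)

    Root mean squared error from e(rss) and e(df_r), then the saved value
    . display sqrt(e(rss)/e(df_r))
    . display e(rmse)

    Coefficient vector and size of the estimation sample
    . matrix list e(b)
    . count if e(sample)
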
Reference
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in
Econometrics. New York: Oxford University Press.
Also see
Manual: [R] regress
Help: [R] regress postestimation, [R] regress postestimation time series;
[R] anova, [TS] dfactor, [TS] dvech