Stata 15 help for _predict

[P] _predict -- Obtain predictions, residuals, etc., after estimation programming command


After regress

_predict [type] newvar [if] [in] [, xb stdp stdf stdr hat cooksd residuals rstandard rstudent nolabel]

After single-equation (SE) estimators

_predict [type] newvar [if] [in] [, xb stdp nooffset nolabel]

After multiple-equation (ME) estimators

_predict [type] newvar [if] [in] [, xb stdp stddp nooffset nolabel equation(eqno[, eqno])]


_predict is for use by programmers as a subroutine for implementing the predict command for use after estimation; see [R] predict.
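
A minimal sketch of that role, assuming a hypothetical estimation command mycmd whose prediction program is called mycmd_p (both names are illustrative and not part of Stata). The program parses what the user typed after predict and hands it to _predict, using only options from the single-equation syntax above.

```stata
* Hypothetical prediction program for an illustrative command "mycmd".
* mycmd would register it with:  ereturn local predict "mycmd_p"
program define mycmd_p
    version 15
    // newvarname stores the requested type in typlist and the name in varlist
    syntax newvarname [if] [in] [, XB STDP noOFFset noLABel]
    // fall back to the linear prediction when no statistic is requested
    if "`xb'`stdp'" == "" {
        local xb xb
    }
    _predict `typlist' `varlist' `if' `in', `xb' `stdp' `offset' `label'
end
```

A real prediction program would usually handle command-specific statistics itself and pass only the standard ones through to _predict.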


xb calculates the linear prediction from the fitted model. That is, all models can be thought of as estimating a set of parameters b1, b2, ..., bk, and the linear prediction is y = b1*x1 + b2*x2 + ... + bk*xk, written compactly as y = xb. For linear regression, the values y are called the predicted values or, for out-of-sample predictions, the forecasts. For logit and probit, for example, y is called the logit or probit index.

It is important to understand that the x1, x2, ..., xk used in the calculation are obtained from the data currently in memory and do not have to correspond to the data on the independent variables used in fitting the model (obtaining the b1, b2, ..., bk).
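
As an illustration of this point, the sketch below fits a regression on a subsample of the auto dataset shipped with Stata and then computes the linear prediction for every observation in memory. The variable names mpg_hat and mpg_hat_in are chosen for the example only.

```stata
sysuse auto, clear
* Fit on domestic cars only
regress mpg weight if foreign == 0
* No if/in restriction: xb is computed for every observation in memory,
* including the foreign cars that were not used in fitting
_predict double mpg_hat, xb
* Restrict to the estimation sample explicitly when that is what is wanted
_predict double mpg_hat_in if e(sample), xb
```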

stdp calculates the standard error of the prediction after any estimation command. Here the prediction is understood to mean the same thing as the "index", namely, xb. The statistic produced by stdp can be thought of as the standard error of the predicted expected value, or mean index, for the observation's covariate pattern. This is commonly referred to as the standard error of the fitted value.

stdf calculates the standard error of the forecast, which is the standard error of the point prediction for one observation. It is commonly referred to as the standard error of the future or forecast value. By construction, the standard errors produced by stdf are always larger than those produced by stdp; see Methods and formulas in [R] predict.
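
The relationship between the two statistics can be checked numerically. The sketch below assumes an unweighted linear regression, for which the forecast variance is the variance of the fitted value plus the residual variance, so that stdf^2 = stdp^2 + s^2 with s the root mean squared error stored in e(rmse). The variable names are illustrative.

```stata
sysuse auto, clear
regress mpg weight
_predict double se_fit, stdp
_predict double se_fcst, stdf
* Rebuild the forecast standard error from stdp and the root MSE
generate double se_check = sqrt(se_fit^2 + e(rmse)^2)
* The two should agree up to rounding, and stdf should exceed stdp
assert reldif(se_fcst, se_check) < 1e-8
assert se_fcst > se_fit
```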

stdr calculates the standard error of the residuals.

hat (or leverage) calculates the diagonal elements of the projection (hat) matrix.

cooksd calculates the Cook's D influence statistic.

residuals calculates the residuals.

rstandard calculates the standardized residuals.

rstudent calculates the Studentized (jackknifed) residuals.
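
The regression diagnostics above are related to one another. As a sketch, assuming an unweighted regress fit, the standard error of a residual is s*sqrt(1 - h), where h is the leverage and s is the root mean squared error, and the standardized residual is the residual divided by that standard error. The variable names are illustrative.

```stata
sysuse auto, clear
regress mpg weight
_predict double res, residuals
_predict double lev, hat
_predict double se_res, stdr
_predict double res_std, rstandard
* Standard error of each residual rebuilt from the leverage and root MSE
assert reldif(se_res, e(rmse)*sqrt(1 - lev)) < 1e-8
* Standardized residual = residual / its standard error
assert reldif(res_std, res/se_res) < 1e-8
```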

nooffset may be combined with most statistics and specifies that the calculation be made ignoring any offset or exposure variable specified when the model was fit.

This option is available even if it is not documented for predict after a specific command. If neither the offset(varname) option nor the exposure(varname) option was specified when the model was fit, specifying nooffset does nothing.
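
A short sketch of the effect of nooffset, assuming the dollhill3 example dataset (downloaded by webuse, so internet access is needed) and a Poisson model fit with exposure(pyears). With an exposure variable, the linear prediction includes ln(pyears) unless nooffset is specified.

```stata
* dollhill3 is a Stata example dataset; webuse needs internet access
webuse dollhill3, clear
poisson deaths smokes i.agecat, exposure(pyears)
_predict double xb_full, xb
_predict double xb_base, xb nooffset
* With exposure(pyears), the two indexes should differ by ln(pyears)
generate double gap = xb_full - xb_base
assert reldif(gap, ln(pyears)) < 1e-8
```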

nolabel prevents _predict from labeling the newly created variable.

stddp is allowed only after you have previously fit a multiple-equation model. The standard error of the difference in linear predictions between equations 1 and 2 is calculated. Use the equation() option to get the standard error of the difference between other equations.

equation(eqno[, eqno]) is relevant only when you have previously fit a multiple-equation model. It specifies the equation to which you are referring.

equation() is typically filled in with one eqno -- it would be filled in that way with options xb and stdp, for instance. equation(#1) would mean that the calculation is to be made for the first equation, equation(#2) would mean the second, and so on. You could also refer to the equations by their names: equation(income) would refer to the equation named income and equation(hours) to the equation named hours.

If you do not specify equation(), the results are the same as if you specified equation(#1).

Other statistics refer to between-equation concepts; stddp is an example. In those cases, you might specify equation(#1,#2) or equation(income,hours). When two equations must be specified, equation() is required.
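
The following sketch shows the different forms of equation(). It assumes that _predict may be called directly after sureg, used here only as a convenient multiple-equation estimator; the model and the new variable names are illustrative.

```stata
sysuse auto, clear
* Two-equation model; the equations are named after their dependent variables
sureg (price foreign weight length) (mpg foreign weight)
* Linear prediction for the first equation, referenced by position
_predict double xb_price, xb equation(#1)
* Linear prediction for the second equation, referenced by name
_predict double xb_mpg, xb equation(mpg)
* Standard error of the difference between the two linear predictions
_predict double se_diff, stddp equation(price, mpg)
```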

© Copyright 1996–2018 StataCorp LLC