Note: This FAQ is for Stata 9 and older versions of Stata.
What are the differences between predict and adjust?
Title: Predict and adjust
Author: Brian P. Poi, StataCorp
Date: September 2002
Many people have written to the technical staff asking about the differences
between predict and adjust.
In this FAQ, I present a simple example using the auto dataset. This is by
no means a substitute for the Reference Manual entries for adjust and
predict. If you have not already read those, doing so would be a good idea.
To begin, let’s load the auto.dta dataset and regress mpg on weight, length,
and foreign:
. sysuse auto
(1978 Automobile Data)
. regress mpg weight length foreign
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   48.10
       Model |   1645.2889     3  548.429632           Prob > F      =  0.0000
    Residual |  798.170563    70  11.4024366           R-squared     =  0.6733
-------------+------------------------------           Adj R-squared =  0.6593
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.3767
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0043656   .0016014    -2.73   0.008    -.0075595   -.0011718
      length |  -.0827432   .0547942    -1.51   0.136    -.1920267    .0265403
     foreign |  -1.707904    1.06711    -1.60   0.114    -3.836188    .4203806
       _cons |   50.53701   6.245835     8.09   0.000     38.08009    62.99394
------------------------------------------------------------------------------
Next, compute the linear prediction of the dependent variable and summarize
it by rep78:
. predict yhat, xb
. tabstat yhat, statistics(mean) by(rep78)
Summary for variables: yhat
by categories of: rep78 (Repair Record 1978)
rep78 | mean
---------+----------
1 | 21.36511
2 | 19.39887
3 | 19.91184
4 | 21.86001
5 | 24.91809
---------+----------
Total | 21.20081
--------------------
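To make concrete what predict, xb computes, here is a minimal sketch that rebuilds the linear prediction from the stored coefficients. The variable names yhat_xb and yhat_manual are illustrative and not part of the original example; the sketch reloads the data and refits the model so that it can be run on its own.

```stata
* Sketch: rebuild the linear prediction by hand from the stored coefficients.
sysuse auto, clear
quietly regress mpg weight length foreign

* Linear prediction as computed by predict (stored in double precision)
predict double yhat_xb, xb

* The same quantity assembled from the _b[] coefficients
generate double yhat_manual = _b[_cons] + _b[weight]*weight ///
    + _b[length]*length + _b[foreign]*foreign

* The two variables should agree up to rounding error
assert reldif(yhat_xb, yhat_manual) < 1e-8

* Category means of the hand-built prediction
tabstat yhat_manual, statistics(mean) by(rep78)
```

If the assert passes silently, the two predictions agree, and the tabstat table should match the one above.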
Compare this with what we obtain if we use the adjust command:
. adjust, by(rep78)
----------------------------------------------------------------------------
Dependent variable: mpg Command: regress
Variables left as is: weight, length, foreign
----------------------------------------------------------------------------
----------------------
Repair |
Record |
1978 | xb
----------+-----------
1 | 21.3651
2 | 19.3989
3 | 19.9118
4 | 21.86
5 | 24.9181
----------------------
Key: xb = Linear Prediction
The results are the same! When you use the adjust command without specifying
any variables, it simply reports the mean of the linear predictions from the
regression within each category of the by() variable, here rep78. Suppose
that instead I typed
. adjust foreign, by(rep78)
----------------------------------------------------------------------------
Dependent variable: mpg Command: regress
Variables left as is: weight, length
Covariate set to mean: foreign = .30434781
----------------------------------------------------------------------------
----------------------
Repair |
Record |
1978 | xb
----------+-----------
1 | 20.8453
2 | 18.8791
3 | 19.5628
4 | 22.1942
5 | 25.7957
----------------------
Key: xb = Linear Prediction
The key to understanding what happened here lies in the two lines at the top
of the output:
Variables left as is: weight, length
Covariate set to mean: foreign = .30434781
For two of the independent variables in our regression, weight and length,
adjust did nothing; it left them as is. However, in computing the linear
prediction of mpg, adjust did not use the actual values of foreign that are
in the dataset. Instead, it computed the prediction as if foreign were equal
to its mean, 0.30434781, for every observation. Some people would argue that
evaluating the equation with foreign equal to 0.304 is nonsense because
foreign is a dummy variable that takes only the values 0 and 1; either the
car is foreign, or it is domestic. On the other hand, one could interpret the
results with foreign equal to 0.304 as pertaining to a car that contains 70%
domestic parts and 30% foreign parts. Whether to force a dummy variable to
remain 0 or 1 when forming predictions depends entirely on the context of the
model.
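The size of the change can be checked by hand. In auto.dta the cars with rep78 equal to 1 or 2 are all domestic (tabulate rep78 foreign shows this), so for those categories moving foreign from 0 to its mean shifts every prediction by the foreign coefficient times that mean. The following is a sketch of that arithmetic, assuming the same regression as above; it reloads the data so that it runs on its own.

```stata
* Sketch: the shift in xb when foreign is moved from 0 to its mean.
sysuse auto, clear
quietly regress mpg weight length foreign

* Mean of foreign over the observations with nonmissing rep78
summarize foreign if rep78 < ., meanonly

* Change in the linear prediction for a domestic car
display _b[foreign]*r(mean)
```

The displayed value should be about -0.52, which is the gap between 21.3651 and 20.8453 in the first rows of the two tables.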
The real power of adjust is in being able to create predictions assuming
certain values for some of the independent variables. Suppose I wanted to
know the average predicted fuel economy of cars by rep78 under the assumption
that all cars are domestic. With adjust, this is easy to do:
. adjust foreign=0, by(rep78)
----------------------------------------------------------------------------
Dependent variable: mpg Command: regress
Variables left as is: weight, length
Covariate set to value: foreign = 0
----------------------------------------------------------------------------
----------------------
Repair |
Record |
1978 | xb
----------+-----------
1 | 21.3651
2 | 19.3989
3 | 20.0826
4 | 22.714
5 | 26.3155
----------------------
Key: xb = Linear Prediction
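Because the model is linear, setting foreign to 0 changes each category's mean prediction by minus the foreign coefficient times the share of foreign cars in that category. That is why the rows for rep78 equal to 1 and 2, which contain no foreign cars, are unchanged from the first adjust table. Here is a sketch of the check for rep78 equal to 3, assuming the same regression as above:

```stata
* Sketch: implied change in the rep78 == 3 mean when all cars are domestic.
sysuse auto, clear
quietly regress mpg weight length foreign

* Share of foreign cars among those with rep78 == 3
summarize foreign if rep78 == 3, meanonly

* Minus the coefficient times that share
display -_b[foreign]*r(mean)
```

The result should be close to 0.1708, the difference between 20.0826 here and 19.9118 in the unadjusted table.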
Of course, you can specify more than one variable with adjust, and you can
have some variables set to values you specify and other variables set to
their means. For example, now I want to know the average fuel economy by
rep78 under the assumptions that all cars are domestic and all cars are of
the same (average) length. I have no idea what the average length of the cars
is, so I will let adjust figure it out:
. adjust foreign=0 length, by(rep78)
----------------------------------------------------------------------------
Dependent variable: mpg Command: regress
Variable left as is: weight
Covariate set to mean: length = 188.28986
Covariate set to value: foreign = 0
----------------------------------------------------------------------------
----------------------
Repair |
Record |
1978 | xb
----------+-----------
1 | 21.4239
2 | 20.3161
3 | 20.5551
4 | 22.428
5 | 24.8172
----------------------
Key: xb = Linear Prediction
As the top of the output shows, adjust set length equal to its mean value of
188.28986, and it set foreign equal to 0 as we requested. Because we asked
for the results to be tabulated by rep78, the mean of length was computed
using only the 69 observations for which rep78 is not missing. The 5
observations with a missing rep78 are completely ignored by adjust, even
though they were used in the original regression.
In fact, adjust is really just a front end for predict, and it is helpful to
work through the mechanics of an example to illustrate this. The previous
table of results could have been obtained in the following manner:
. preserve
. summarize length if rep78<., meanonly
. replace length=r(mean)
length was int now float
(74 real changes made)
. replace foreign=0
(22 real changes made)
. predict yhat2, xb
. tabstat yhat2, statistics(mean) by(rep78)
Summary for variables: yhat2
by categories of: rep78 (Repair Record 1978)
rep78 | mean
---------+----------
1 | 21.42387
2 | 20.31609
3 | 20.55511
4 | 22.42796
5 | 24.81715
---------+----------
Total | 21.7206
--------------------
. restore
The advantage of adjust is that we do not have to preserve our data, replace
covariates with means or fixed values, call predict and tabstat ourselves,
and then restore the data.
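The earlier point about the 69 observations can be checked directly. This sketch compares the mean of length over all cars with the mean over the cars that have a nonmissing rep78; only the second should match the 188.28986 reported by adjust.

```stata
* Sketch: which observations enter the mean that adjust reports.
sysuse auto, clear

* Mean of length over all 74 cars
summarize length, meanonly
display r(mean)

* Mean of length over the cars with nonmissing rep78
summarize length if rep78 < ., meanonly
display r(mean)

* Number of cars with missing rep78
count if rep78 >= .
```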