Note: This FAQ is for Stata 9 and older versions of Stata.

What are the differences between predict and adjust?

Title: Predict and adjust
Author: Brian P. Poi, StataCorp

Many people have written to the technical staff asking about the differences between predict and adjust. In this FAQ, I present a simple example using the auto dataset. This is by no means a substitute for the Reference Manual entries for adjust and predict. If you have not already read those entries, doing so would be a good idea.

To begin, let’s load the auto.dta dataset and regress mpg on weight, length, and foreign:

. sysuse auto
(1978 Automobile Data)
    
. regress mpg weight length foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   48.10
       Model |   1645.2889     3  548.429632           Prob > F      =  0.0000
    Residual |  798.170563    70  11.4024366           R-squared     =  0.6733
-------------+------------------------------           Adj R-squared =  0.6593
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.3767

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0043656   .0016014    -2.73   0.008    -.0075595   -.0011718
      length |  -.0827432   .0547942    -1.51   0.136    -.1920267    .0265403
     foreign |  -1.707904    1.06711    -1.60   0.114    -3.836188    .4203806
       _cons |   50.53701   6.245835     8.09   0.000     38.08009    62.99394
------------------------------------------------------------------------------

Next, compute the linear prediction of the dependent variable and summarize it by rep78:

. predict yhat, xb
        
. tabstat yhat, statistics(mean) by(rep78)
        
Summary for variables: yhat
     by categories of: rep78 (Repair Record 1978)

   rep78 |      mean
---------+----------
       1 |  21.36511
       2 |  19.39887
       3 |  19.91184
       4 |  21.86001
       5 |  24.91809
---------+----------
   Total |  21.20081
--------------------

Compare this with what we obtain if we use the adjust command:

. adjust, by(rep78)

----------------------------------------------------------------------------
     Dependent variable: mpg     Command: regress
   Variables left as is: weight, length, foreign
----------------------------------------------------------------------------
    
----------------------
Repair    |
Record    |
1978      |         xb
----------+-----------
        1 |    21.3651
        2 |    19.3989
        3 |    19.9118
        4 |      21.86
        5 |    24.9181
----------------------
     Key:  xb  =  Linear Prediction

The results are the same! When you use the adjust command without specifying any variables, it simply summarizes the linear predictions of the regression by rep78. Suppose that instead I typed

. adjust foreign, by(rep78)

----------------------------------------------------------------------------
     Dependent variable: mpg     Command: regress
   Variables left as is: weight, length
  Covariate set to mean: foreign = .30434781
----------------------------------------------------------------------------

----------------------
Repair    |
Record    |
1978      |         xb
----------+-----------
        1 |    20.8453
        2 |    18.8791
        3 |    19.5628
        4 |    22.1942
        5 |    25.7957
----------------------
     Key:  xb  =  Linear Prediction

The key to understanding what happened here is the pair of lines at the top of the output:

       Variables left as is: weight, length
      Covariate set to mean: foreign = .30434781

For two of the independent variables in our regression, weight and length, adjust did nothing; it left them as is. However, in computing the linear prediction of mpg, adjust did not use the actual values of foreign that are in the dataset. Instead, it computed the prediction, pretending that the value of foreign was 0.30434781 for every observation in the dataset. Some people would argue that evaluating the equation with foreign equal to 0.304 is nonsense because foreign is a dummy variable that takes only the values 0 or 1; either the car is foreign, or it is domestic. On the other hand, one could interpret the results with foreign equal to 0.304 as pertaining to a car that contains 70% domestic parts and 30% foreign parts. Whether to force a dummy variable to remain 0 or 1 when forming predictions depends entirely on the context of the model.
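
To make this concrete, the same numbers can be rebuilt by hand from the stored coefficients. The following is only a sketch of the arithmetic, not what adjust runs internally; the names fbar and yhat_f are my own, and I am assuming that the mean of foreign is taken over the observations with a nonmissing rep78, which is consistent with the value 0.30434781 reported above.

```stata
sysuse auto, clear
quietly regress mpg weight length foreign

* Mean of foreign over the observations that have a repair record
summarize foreign if rep78 < ., meanonly
scalar fbar = r(mean)

* Linear prediction with foreign replaced by its mean;
* weight and length keep their actual values (left as is)
generate double yhat_f = _b[_cons] + _b[weight]*weight ///
    + _b[length]*length + _b[foreign]*scalar(fbar)

* Average the hand-built prediction within each repair category
tabstat yhat_f, statistics(mean) by(rep78)
```

For rep78 equal to 1, the earlier tables already show the arithmetic: the unadjusted mean of 21.3651 plus the foreign coefficient of -1.707904 times 0.30434781 gives about 20.8453, the value adjust reported.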

The real power of adjust is in being able to create predictions assuming certain values for some of the independent variables. Suppose I wanted to know the average predicted fuel economy of cars by rep78 under the assumption that all cars are domestic. With adjust, this is easy to do:

. adjust foreign=0, by(rep78)
    
----------------------------------------------------------------------------
     Dependent variable: mpg     Command: regress
   Variables left as is: weight, length
 Covariate set to value: foreign = 0
----------------------------------------------------------------------------

----------------------
Repair    |
Record    |
1978      |         xb
----------+-----------
        1 |    21.3651
        2 |    19.3989
        3 |    20.0826
        4 |     22.714
        5 |    26.3155
----------------------
     Key:  xb  =  Linear Prediction
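
Because the model is linear, setting foreign to 0 simply removes the foreign coefficient from the prediction of every foreign car and leaves the domestic cars alone. Here is a minimal sketch of that idea, offered as a check on the table above rather than as a description of how adjust is implemented; the variable name yhat_dom is hypothetical.

```stata
sysuse auto, clear
quietly regress mpg weight length foreign
predict double yhat, xb

* Subtract the foreign effect from each car's prediction;
* for domestic cars foreign is 0, so nothing changes
generate double yhat_dom = yhat - _b[foreign]*foreign

* Compare the actual and the all-domestic predictions by repair record
tabstat yhat yhat_dom, statistics(mean) by(rep78)
```

In the two adjust tables, the rows for rep78 equal to 1 and 2 are identical, which is what you would expect if those categories contain no foreign cars.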

Of course, you can specify more than one variable with adjust, and you can have some variables set to values you specify and other variables set to their means. For example, now I want to know the average fuel economy by rep78 under the assumptions that all cars are domestic and all cars are of the same (average) length. I have no idea what the average length of the cars is, so I will let adjust figure it out:

. adjust foreign=0 length, by(rep78)

----------------------------------------------------------------------------
     Dependent variable: mpg     Command: regress
    Variable left as is: weight
  Covariate set to mean: length = 188.28986
 Covariate set to value: foreign = 0
----------------------------------------------------------------------------

----------------------
Repair    |
Record    |
1978      |         xb
----------+-----------
        1 |    21.4239
        2 |    20.3161
        3 |    20.5551
        4 |     22.428
        5 |    24.8172
----------------------
     Key:  xb  =  Linear Prediction

As the top of the output shows, adjust set length equal to its mean value of 188.28986, and it set foreign equal to 0 as we requested. Because we asked for the results to be tabulated based on rep78, the mean of length was computed using only the 69 observations for which rep78 is not missing. The 5 observations with a missing rep78 are completely ignored by adjust, even though they were used in the original regression.
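
If you want to see the effect of the restricted sample for yourself, you can compare the two means of length directly. This is just a quick check using summarize; it assumes nothing beyond the auto dataset used throughout this FAQ.

```stata
sysuse auto, clear

* Mean of length over all cars used in the regression
summarize length, meanonly
display "all observations:   N = " r(N) "  mean = " r(mean)

* Mean of length over the cars with a nonmissing rep78,
* which is the sample adjust works with when by(rep78) is specified
summarize length if rep78 < ., meanonly
display "rep78 not missing:  N = " r(N) "  mean = " r(mean)
```

The second mean is the one that appears in the adjust header.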

In fact, adjust is really just a front end for predict, and it is helpful to work through the mechanics of an example to illustrate this. The previous table of results could have been obtained in the following manner:

. preserve
    
. summarize length if rep78<., meanonly
    
. replace length=r(mean)
length was int now float
(74 real changes made)
    
. replace foreign=0
(22 real changes made)
    
. predict yhat2, xb

. tabstat yhat2, statistics(mean) by(rep78)
    
Summary for variables: yhat2
     by categories of: rep78 (Repair Record 1978)

   rep78 |      mean
---------+----------
       1 |  21.42387
       2 |  20.31609
       3 |  20.55511
       4 |  22.42796
       5 |  24.81715
---------+----------
   Total |   21.7206
--------------------

. restore

The advantage of adjust is that we do not have to preserve our data, summarize and replace it, and then call tabstat ourselves.