# RE: st: RE: Interpretation of quadratic terms

From: Rodolphe Desbordes
To: "statalist@hsphsun2.harvard.edu"
Subject: RE: st: RE: Interpretation of quadratic terms
Date: Thu, 11 Mar 2010 13:59:21 +0000

```
Dear Michael,

If the age variable corresponds to an individual's age in 1988, the year of birth should be defined as year = 1988 - age.

I do not have Stata 11 installed on this computer and I am not very familiar with the -margins- command. However, using -margins-, you calculated not the marginal effect but the predicted value of wage at a given value of age.

Using Stata 10:

.  sysuse nlsw88, clear
(NLSW, 1988 extract)

.
.  generate year = 1988-age

. generate year2=year^2

. generate age2=age^2

.
. reg wage year year2

      Source |       SS       df       MS              Number of obs =    2246
-------------+------------------------------           F(  2,  2243) =    1.72
       Model |  114.042127     2  57.0210637           Prob > F      =  0.1789
    Residual |  74253.9253  2243  33.1047371           R-squared     =  0.0015
       Total |  74367.9674  2245  33.1260434           Root MSE      =  5.7537

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   38.15985   53.51605     0.71   0.476     -66.7863     143.106
       year2 |  -.0097745   .0137323    -0.71   0.477    -.0367039     .017155
       _cons |  -37236.44      52139    -0.71   0.475    -139482.2     65009.3
------------------------------------------------------------------------------

. lincom year+2*year2*1948

( 1)  year + 3896 year2 = 0

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0785881   .0423687     1.85   0.064     -.004498    .1616741
------------------------------------------------------------------------------

.

-----------------------------------------------------------------------------------------------------------------------
Dependent variable: wage     Command: regress
Covariates set to value: year = 1948, year2 = 3794704
-----------------------------------------------------------------------------------------------------------------------

----------------------------------------------
      All |         xb          lb          ub
----------+-----------------------------------
          |    7.79889    [7.44975    8.14803]
----------------------------------------------
     Key:  xb         =  Linear Prediction
           [lb , ub]  =  [95% Confidence Interval]

.
.
. reg wage age age2

      Source |       SS       df       MS              Number of obs =    2246
-------------+------------------------------           F(  2,  2243) =    1.72
       Model |  114.042127     2  57.0210637           Prob > F      =  0.1789
    Residual |  74253.9253  2243  33.1047371           R-squared     =  0.0015
       Total |  74367.9674  2245  33.1260434           Root MSE      =  5.7537

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7033681   1.084471     0.65   0.517    -1.423304     2.83004
        age2 |  -.0097745   .0137323    -0.71   0.477    -.0367039     .017155
       _cons |  -4.696709   21.30931    -0.22   0.826    -46.48474    37.09133
------------------------------------------------------------------------------

.
. lincom age+2*age2*40

( 1)  age + 80 age2 = 0

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.0785881   .0423687    -1.85   0.064    -.1616741     .004498
------------------------------------------------------------------------------

.

-----------------------------------------------------------------------------------------------------------------------
Dependent variable: wage     Command: regress
Covariates set to value: age = 40, age2 = 1600
-----------------------------------------------------------------------------------------------------------------------

----------------------------------------------
      All |         xb          lb          ub
----------+-----------------------------------
          |    7.79889    [7.44976    8.14802]
----------------------------------------------
     Key:  xb         =  Linear Prediction
           [lb , ub]  =  [95% Confidence Interval]

.
end of do-file

Rodolphe

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael Mitchell [Michael.Norman.Mitchell@gmail.com]
Sent: 11 March 2010 07:19
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Interpretation of quadratic terms

Dear Nick, Rodolphe, Rosie, and everyone else...

I happened to come across an example today that illustrates Nick's
point, that sometimes centering can be needed and sometimes not.

I have reproduced this using the -nlsw88.dta- data file.

First, suppose we had a variable -year-, the year the person was
born. I create that as 1968 plus the age of the person. Then I want to
predict -wage- from -year- and -year- squared.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. generate year = age + 1968

.
. regress wage c.year##c.year

      Source |       SS       df       MS              Number of obs =    2246
-------------+------------------------------           F(  2,  2243) =    1.72
       Model |  114.042127     2  57.0210637           Prob > F      =  0.1789
    Residual |  74253.9253  2243  33.1047371           R-squared     =  0.0015
       Total |  74367.9674  2245  33.1260434           Root MSE      =  5.7537

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   39.17561   55.13424     0.71   0.477    -68.94386    147.2951
             |
      c.year#|
      c.year |  -.0097745   .0137323    -0.71   0.477    -.0367039     .017155
             |
       _cons |  -39245.61    55339.8    -0.71   0.478    -147768.2    69276.95
------------------------------------------------------------------------------

The -vif- command shows **very** large VIF values.

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
        year |  1.93e+06    0.000001
      c.year#|
      c.year |  1.93e+06    0.000001
-------------+----------------------
    Mean VIF |  1.93e+06

But, even worse, the margins command will not estimate the mean wages
for a year of 1970.

. margins , at(year=1970)

Adjusted predictions                              Number of obs   =       2246
Model VCE    : OLS

Expression   : Linear prediction, predict()
at           : year            =        1970

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  (not estimable)
------------------------------------------------------------------------------

But, if I estimate this model using -age- instead of year, things work better.
.
. regress wage c.age##c.age

      Source |       SS       df       MS              Number of obs =    2246
-------------+------------------------------           F(  2,  2243) =    1.72
       Model |  114.042127     2  57.0210637           Prob > F      =  0.1789
    Residual |  74253.9253  2243  33.1047371           R-squared     =  0.0015
       Total |  74367.9674  2245  33.1260434           Root MSE      =  5.7537

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7033681   1.084471     0.65   0.517    -1.423304     2.83004
             |
 c.age#c.age |  -.0097745   .0137323    -0.71   0.477    -.0367039     .017155
             |
       _cons |  -4.696709   21.30931    -0.22   0.826    -46.48474    37.09133
------------------------------------------------------------------------------

The -vif- values are still large, but not as enormous as before.

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
         age |    746.80    0.001339
 c.age#c.age |    746.80    0.001339
-------------+----------------------
    Mean VIF |    746.80

And the -margins- command can estimate the wages for someone who is 40
years old.

. margins , at(age=40)

Adjusted predictions                              Number of obs   =       2246
Model VCE    : OLS

Expression   : Linear prediction, predict()
at           : age             =          40

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   7.798891   .1780335    43.81   0.000     7.449951     8.14783
------------------------------------------------------------------------------

So, sometimes collinearity can be high but we can still compute
marginal effects; in other cases, the collinearity can be so high that,
even if the regression model can be estimated, it may not be
possible to estimate marginal effects. It seems to depend on the
degree of collinearity present.

Best regards,

Michael

On Tue, Mar 9, 2010 at 1:02 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I think you're both right. In olden days, pre-emptive centring, as we
> say in English, was a good idea in order to avoid numerical problems
> with mediocre programs that did not handle near multicollinearity well.
> Nowadays, decent programs including Stata take care that you get bitten
> as little as possible by such problems. Of course, if you really do have
> multicollinearity, nothing much can help, except that Stata drops
> predictors and flags the issue.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Rodolphe Desbordes
>
> My point is that centering does not reduce multicollinearity. As you can
> see in my example, the standard errors of the estimated marginal effects
> at the mean of `mpg' are the same using uncentered or centered values of
> `mpg'.
>
> Rosie Chen
>
> Thanks, Rodolphe, for this helpful demonstration. Agree that the major
> purpose of centering seems to be that we make the interpretation of X
> meaningful. I guess reducing multicollinearity is a by-product of the
> benefit.
>
> Rodolphe Desbordes <rodolphe.desbordes@strath.ac.uk>
>
> Centering will not affect the fitted model, the estimated marginal effects,
> or their uncertainty. However, centering allows you to directly obtain the
> estimated effect of X on Y for a meaningful value of X, i.e. the mean of X.
>
> . sysuse auto.dta,clear
> (1978 Automobile Data)
>
> . gen double mpg2=mpg^2
>
> . reg price mpg mpg2
>
>       Source |       SS       df       MS              Number of obs =      74
> -------------+------------------------------           F(  2,    71) =   18.28
>        Model |   215835615     2   107917807           Prob > F      =  0.0000
>     Residual |   419229781    71  5904644.81           R-squared     =  0.3399
> -------------+------------------------------           Adj R-squared =  0.3213
>        Total |   635065396    73  8699525.97           Root MSE      =  2429.9
>
> ------------------------------------------------------------------------------
>        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |  -1265.194   289.5443    -4.37   0.000    -1842.529   -687.8593
>         mpg2 |   21.36069   5.938885     3.60   0.001     9.518891    33.20249
>        _cons |   22716.48   3366.577     6.75   0.000     16003.71    29429.24
> ------------------------------------------------------------------------------
>
> . sum mpg
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          mpg |        74     21.2973    5.785503         12         41
>
> . local m=r(mean)
>
> . lincom _b[mpg]+2*_b[mpg2]*`m'
>
> ( 1)  mpg + 42.59459 mpg2 = 0
>
> ------------------------------------------------------------------------------
>        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          (1) |  -355.3442   58.86205    -6.04   0.000    -472.7118   -237.9766
> ------------------------------------------------------------------------------
>
> . gen double mpgm=mpg-`m'
>
> . gen double mpgm2=mpgm^2
>
> . reg price mpgm mpgm2
>
>       Source |       SS       df       MS              Number of obs =      74
> -------------+------------------------------           F(  2,    71) =   18.28
>        Model |   215835615     2   107917807           Prob > F      =  0.0000
>     Residual |   419229781    71  5904644.81           R-squared     =  0.3399
> -------------+------------------------------           Adj R-squared =  0.3213
>        Total |   635065396    73  8699525.97           Root MSE      =  2429.9
>
> ------------------------------------------------------------------------------
>        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         mpgm |  -355.3442   58.86205    -6.04   0.000    -472.7118   -237.9766
>        mpgm2 |   21.36069   5.938885     3.60   0.001     9.518891    33.20249
>        _cons |   5459.933   343.8718    15.88   0.000     4774.272    6145.594
> ------------------------------------------------------------------------------
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
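The algebra discussed in this thread can be checked outside Stata. What follows is a minimal Python sketch under stated assumptions. The data are hypothetical and synthetic (ages 34 to 46, ten people each, with a made-up wage equation), not `nlsw88` or `auto`. The helper names `ols_quadratic`, `lincom` and `vif_quadratic` are illustrative, not Stata commands. The sketch fits wage on (1, x, x^2) by ordinary least squares in exact rational arithmetic (`fractions.Fraction`), so round-off plays no part. It then forms `lincom`-style linear combinations: the marginal effect b1 + 2*b2*x0 and the prediction b0 + b1*x0 + b2*x0^2.

```python
# Minimal sketch. Assumptions: HYPOTHETICAL synthetic data; the helper
# names are illustrative and are not Stata commands.
from fractions import Fraction


def invert(a):
    """Gauss-Jordan inverse of a small nonsingular matrix of Fractions."""
    n = len(a)
    # Augment [a | I]
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Exact arithmetic: any nonzero pivot will do
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_quadratic(x, y):
    """OLS of y on (1, x, x^2); returns coefficients and s^2 (X'X)^-1."""
    rows = [(Fraction(1), Fraction(v), Fraction(v) ** 2) for v in x]
    n, k = len(rows), 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * c for b, c in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    s2 = rss / (n - k)                      # residual variance, df = n - 3
    vcov = [[s2 * inv[i][j] for j in range(k)] for i in range(k)]
    return beta, vcov


def lincom(beta, vcov, g):
    """Estimate and variance of g'beta, in the spirit of Stata's -lincom-."""
    est = sum(gi * bi for gi, bi in zip(g, beta))
    var = sum(g[i] * vcov[i][j] * g[j]
              for i in range(len(g)) for j in range(len(g)))
    return est, var


def vif_quadratic(x):
    """VIF of x (and of x^2) when they are the only regressors: 1/(1 - r^2)."""
    n = len(x)
    u = [Fraction(v) for v in x]
    w = [v * v for v in u]
    mu, mw = sum(u) / n, sum(w) / n
    suu = sum((a - mu) ** 2 for a in u)
    sww = sum((b - mw) ** 2 for b in w)
    suw = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    return 1 / (1 - suw ** 2 / (suu * sww))


# HYPOTHETICAL synthetic data (not nlsw88): ages 34..46, ten people each,
# so mean age is exactly 40. Wage is a made-up quadratic in age plus a
# deterministic "noise" term (not a function of age) so the fit is imperfect.
age = [34 + i % 13 for i in range(130)]
wage = [Fraction(5) + Fraction(7, 10) * a - Fraction(1, 100) * a * a
        + Fraction((7 * i) % 11 - 5, 4) for i, a in enumerate(age)]
year = [1988 - a for a in age]   # year of birth, as Rodolphe defines it
agec = [a - 40 for a in age]     # age centred at its mean

b_age, V_age = ols_quadratic(age, wage)
b_year, V_year = ols_quadratic(year, wage)
b_c, V_c = ols_quadratic(agec, wage)

# Marginal effect d(wage)/dx = b1 + 2*b2*x0 (cf. -lincom age+2*age2*40-)
me_age = lincom(b_age, V_age, (0, 1, 2 * 40))
me_year = lincom(b_year, V_year, (0, 1, 2 * 1948))  # age 40 <-> year 1948
me_c = lincom(b_c, V_c, (0, 1, 0))                  # centred: just b1

# Predicted wage at age 40 (the quantity -margins, at()- reports)
fit_age = lincom(b_age, V_age, (1, 40, 40 ** 2))
fit_year = lincom(b_year, V_year, (1, 1948, 1948 ** 2))

for name, (est, var) in [("ME, age", me_age), ("ME, year", me_year),
                         ("ME, centred", me_c), ("fit, age", fit_age),
                         ("fit, year", fit_year)]:
    print(f"{name:12s} {float(est):10.5f}  se {float(var) ** 0.5:.5f}")
for name, x in [("year", year), ("age", age), ("centred", agec)]:
    print(f"VIF {name:8s} {float(vif_quadratic(x)):.6g}")
```

Run as a script, it prints each estimate with its standard error, followed by the VIFs. The marginal effects at age 40 and at year of birth 1948 differ only in sign and share the same standard error, as in Rodolphe's two -lincom- results. The linear coefficient of the centred model equals the marginal effect at the mean, with the same variance, as in the auto example. The predictions at age 40 also agree, even though the VIF for the year-scale variables is far larger. The design matrix has full rank, so every linear combination is estimable in exact arithmetic. The "(not estimable)" message from -margins- in the year-scale model therefore most likely comes from numerical tolerance checks on nearly collinear columns. That is an inference, not something the thread demonstrates.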