
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Non-linear regression: interpretation

From   Daniel Feenberg <>
Subject   Re: st: Non-linear regression: interpretation
Date   Tue, 8 Feb 2011 18:56:37 -0500 (EST)

On Tue, 8 Feb 2011, David Greenberg wrote:

> It is true that the quadratic term taken by itself can be hard to interpret. If the linear term is also in the equation, the coefficient for the quadratic term would seem to be an answer to a question that cannot have a meaningful answer, namely, how much the dependent variable changes in response to a marginal change in the quadratic term while holding the linear term constant. But it is impossible to hold x constant and allow x-squared to vary. However, the estimated coefficients of the linear and quadratic terms together can be used to compute the estimated point at which the quadratic equation has a minimum or maximum, and that is something many researchers might want to know. One can also compute the value of the dependent variable at the minimum or maximum.
>
> David Greenberg, Sociology Department, New York University
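
The turning point mentioned above can be sketched in Stata as follows. For y = b0 + b1*x + b2*x^2 the minimum or maximum lies at x* = -b1/(2*b2). The auto dataset and the choice of weight as the regressor are assumptions made only for illustration; they are not part of the original message.

```stata
// illustrative data: any outcome and regressor would do
sysuse auto, clear

// quadratic in weight, using factor variable notation
regress price c.weight##c.weight

// turning point x* = -b1/(2*b2), with a delta-method standard error
nlcom (turning_point: -_b[weight] / (2*_b[c.weight#c.weight]))

// predicted value of the dependent variable at the turning point
local xstar = -_b[weight] / (2*_b[c.weight#c.weight])
margins, at(weight = `xstar')
```

Whether the turning point is a minimum or a maximum depends on the sign of the coefficient on the squared term, and it is only of practical interest if it falls inside the observed range of x.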

If one takes the squared term about the mean of the variable, it contributes nothing at the mean, leaving the linear term alone to describe the effect of changes in the variable about the mean. That can make quick interpretations of the coefficients possible. For example, if the mean of x is 7, then define

   xx = (x-7)**2

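instead of using x**2. This won't change any predictions, nor the coefficient or t-stat on the squared term. The coefficient and t-stat on the linear term do change, because that term now measures the slope dy/dx at x=7 rather than at x=0. So the slope at the mean is just the coefficient on the linear term for x - no need to fuss with calculating the contribution of the squared term.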
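
A minimal sketch of this centering trick in Stata, assuming the auto dataset and weight as the regressor (both chosen here only for illustration). The names w2, wc2, yhat_raw, yhat_cen and diff are hypothetical.

```stata
sysuse auto, clear

// store the mean before any estimation command runs
summarize weight, meanonly
local m = r(mean)
local k = 2*`m'

gen double w2  = weight^2            // raw squared term
gen double wc2 = (weight - `m')^2    // squared term taken about the mean

// raw parameterization: the slope at the mean is b1 + 2*b2*mean
regress price weight w2
predict double yhat_raw
lincom weight + `k'*w2

// centered parameterization: the slope at the mean is just b1
regress price weight wc2
predict double yhat_cen
display _b[weight]

// the two sets of predictions should agree up to rounding error
gen double diff = yhat_raw - yhat_cen
summarize diff
```

The lincom estimate from the first regression and the coefficient on weight from the second should coincide, since both are the slope dy/dx evaluated at the mean.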

Daniel Feenberg

----- Original Message -----
From: Maarten buis <>
Date: Tuesday, February 8, 2011 4:55 am
Subject: Re: st: Non-linear regression

--- On Tue, 8/2/11, Hamizah Hassan wrote:
> I would like to run non-linear regression by including the
> linear and quadratic functions of the variable.

Typically this is still referred to as a linear model, as the
model is still linear in the parameters.

> I just realized that if the variable is in percentages, the
> quadratic figure is higher than the linear figure. However,
> if it is in decimals, it is the other way around, and
> this will definitely affect the meaning of the results.

The models are mathematically equivalent. You can see that
by looking at the predictions.
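
The reason the models are equivalent is that rescaling only rescales the coefficients: if perc = 100*prop, then the linear coefficient is divided by 100 and the quadratic coefficient by 100^2 = 10000, so the fitted curve is unchanged. A small sketch that checks this, assuming the auto dataset and a proportion-like variable built from mpg (an illustrative choice, as are the scalar names b1_perc and b2_perc):

```stata
sysuse auto, clear

// proportion-like variable on [0,1] and its percentage version
summarize mpg, meanonly
gen double prop = (mpg - r(min)) / (r(max) - r(min))
gen double perc = 100*prop

// quadratic model in percentages; keep the coefficients
regress price c.perc##c.perc
scalar b1_perc = _b[perc]
scalar b2_perc = _b[c.perc#c.perc]

// quadratic model in proportions
regress price c.prop##c.prop

// each line should show two (nearly) identical numbers
display _b[prop]/100              "  " scalar(b1_perc)
display _b[c.prop#c.prop]/10000   "  " scalar(b2_perc)
```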

Generally, it is hard to give a substantive interpretation to
a quadratic term, regardless of how you scaled the original
variable. If you care about interpreting the coefficients but
still want to allow for non-linear effects, then your best
bet is probably to use linear splines (which, confusingly, are
actually a non-linear function...)

Consider the example below. The first part shows that the
two quadratic models result in the same predicted values. The
final part displays linear splines as an alternative. The final
graph shows that they result in fairly similar predictions, but
the spline terms can actually be interpreted: the parameter for
fuel_cons1 tells you that for cars with a fuel-consumption of
less than 12 liters/100km an additional liter/100km leads to a
non-significant price increase of 62$ (=.062*1000$). The
parameter for fuel_cons2 tells you that for cars with a fuel
consumption of more than 12 liters/100km an additional liter
per 100 kilometers will lead to a significant price increase of
1011$ (=1.011*1000$).

*----------------- begin example -----------------
//================================== first part
sysuse auto, clear

// since I am European and the question is about
// interpretation I first convert mpg from miles
// per gallon to liter / 100 km and price in
// 1000 $

gen fuel_cons = 1/mpg * 3.78541178 / 1.609344 *100
label var fuel_cons "fuel consumption (l/100km)"

replace price = price / 1000
label var price "price (1000$)"

// create a "proportion-like" variable
sum fuel_cons , meanonly
gen prop = ( fuel_cons - r(min) ) / ( r(max) - r(min) )

// take a look at that new variable
spikeplot prop, ylab(0 1 2)

// turn it into percentages
gen perc = prop*100
spikeplot perc, ylab(0 1 2)

// add square terms using the new
// factor variable notation
reg price c.prop##c.prop
predict yhat_prop

reg price c.perc##c.perc
predict yhat_perc

// compare predicted values
twoway function identity = x,        ///
       range( 13 31 ) lcolor(gs8) || ///
       scatter yhat_prop yhat_perc,  ///
           aspect(1) msymbol(Oh)

//================================== final part
// alternative with interpretable parameters

// create splines
mkspline fuel_cons1 12 fuel_cons2 = fuel_cons

reg price fuel_cons1 fuel_cons2
predict yhat_spline

twoway scatter price fuel_cons  ||           ///
       line yhat_prop yhat_spline fuel_cons, ///
       sort ytitle("price (1000 {c S|})")    ///
       legend(order( 1 "observations"        ///
                     2 "prediction,"         ///
                       "quadratic"           ///
                     3 "prediction,"         ///
                       "spline" ))
*---------------- end example --------------
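
A natural follow-up to the spline model is to ask whether the slope really changes at the knot. The sketch below shows two ways one might check this; it repeats the data preparation so that it runs on its own. The variable names fc_m1 and fc_m2 are hypothetical, and the knot at 12 l/100km is taken from the example above.

```stata
sysuse auto, clear
gen fuel_cons = 1/mpg * 3.78541178 / 1.609344 * 100
replace price = price / 1000

// splines as in the example: each coefficient is the slope
// within its own segment
mkspline fuel_cons1 12 fuel_cons2 = fuel_cons
regress price fuel_cons1 fuel_cons2

// difference between the slope above and below the knot
lincom fuel_cons2 - fuel_cons1

// alternative: with the marginal option the second coefficient
// is the change in slope at the knot, so its t-test answers
// the same question directly
mkspline fc_m1 12 fc_m2 = fuel_cons, marginal
regress price fc_m1 fc_m2
```

Both approaches should give the same estimate and standard error for the change in slope; which one to use is a matter of which parameterization is easier to report.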
(For more on examples I sent to the Statalist see: )

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC