
# Reply: Re: st: Non-linear regression: interpretation

From: Justina Fischer
To: statalist@hsphsun2.harvard.edu
Subject: Reply: Re: st: Non-linear regression: interpretation
Date: Wed, 9 Feb 2011 10:37:00 +0100

Hi Daniel and Dave

I suggest looking at things in a more abstract way.

In a linear model (y = ax), we estimate a.
The coefficient a equals the marginal effect, that is, dy/dx = a.
So a indicates how much y changes when we increase x by a small increment.

Now we estimate the non-linear model (with OLS): y = ax + bx^2.
x^2 is the squared term (x*x). We estimate the coefficients a and b.

What is the (total/compound) marginal effect of x? dy/dx = a + 2bx.
As one easily sees, the marginal effect dy/dx varies with x; dy/dx is no longer a constant.

So what happens if we neglect the coefficient on the squared term (b) in the interpretation? This would be like assuming that the non-linear function is linear, which is wrong. It misstates the true (total) marginal effect (a in place of a + 2bx) everywhere except at x = 0; whether it under- or overstates the effect depends on the sign of bx.

On the other hand, interpreting only b neglects the 'linear' part of the function and its contribution to the derivative dy/dx. And b by itself is not even the contribution of x^2 to the (total) marginal effect, which is 2bx.

In a sense, the coefficients a and b have to be interpreted jointly; separately they make little sense.
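
To make this concrete, here is a minimal sketch in Python rather than Stata. The coefficient values a = 2 and b = -0.1 are made up for illustration and are not estimates from any data, and the function names are hypothetical. It evaluates dy/dx = a + 2bx at a few values of x and checks the formula against a finite-difference slope.

```python
# Hypothetical coefficients for y = a*x + b*x^2 (illustrative, not estimated)
a = 2.0
b = -0.1


def y(x):
    """The quadratic model y = a*x + b*x^2."""
    return a * x + b * x ** 2


def marginal_effect(x):
    """Analytic derivative dy/dx = a + 2*b*x."""
    return a + 2 * b * x


def numeric_slope(x, h=1e-6):
    """Central finite difference, used only to check the analytic formula."""
    return (y(x + h) - y(x - h)) / (2 * h)


for x in (0, 5, 10, 15):
    print(f"x = {x:>2}: analytic dy/dx = {marginal_effect(x):+.3f}, "
          f"numeric = {numeric_slope(x):+.3f}")
```

With these assumed values the marginal effect equals a only at x = 0, shrinks as x grows, and turns negative beyond x = 10, so reading off a or b alone would be misleading over most of the range.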

Hope this helps
Justina

-----owner-statalist@hsphsun2.harvard.edu wrote: -----

To: statalist@hsphsun2.harvard.edu
From: Daniel Feenberg <feenberg@nber.org>
Sent by: owner-statalist@hsphsun2.harvard.edu
Date: 09.02.2011 12:56AM
Subject: Re: st: Non-linear regression: interpretation

On Tue, 8 Feb 2011, David Greenberg wrote:

> It is true that the quadratic term taken by itself can be hard to
> interpret. If the linear term is also in the equation, the coefficient
> for the quadratic term would seem to be an answer to a question that
> cannot have a meaningful answer, namely, how much the dependent variable
> changes in response to a marginal change in the quadratic term, while
> holding the linear term constant. But it is impossible to hold x
> constant and allow x-squared to vary. However, the estimated
> coefficients of linear and quadratic terms together can be used to
> compute the estimated point at which the quadratic equation has a
> minimum or maximum, and that is something many researchers might want to
> know. One can also compute the value of the dependent variable at the
> minimum or maximum.
>
> David Greenberg, Sociology Department, New York University
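
The turning point mentioned in the quoted message follows from setting dy/dx = a + 2bx to zero, which gives x* = -a/(2b). A minimal sketch in Python, where the helper `turning_point` and the coefficients (a = 2, b = -0.1, intercept c = 1) are assumptions made up for illustration:

```python
def turning_point(a, b, c=0.0):
    """Return (x_star, y_star, kind) for y = c + a*x + b*x^2.

    x_star solves dy/dx = a + 2*b*x = 0; kind says whether it is a
    minimum (b > 0) or a maximum (b < 0).
    """
    if b == 0:
        raise ValueError("b = 0: the model is linear and has no turning point")
    x_star = -a / (2 * b)
    y_star = c + a * x_star + b * x_star ** 2
    kind = "maximum" if b < 0 else "minimum"
    return x_star, y_star, kind


# Hypothetical coefficients, not estimates from any data set
x_star, y_star, kind = turning_point(a=2.0, b=-0.1, c=1.0)
print(f"{kind} at x = {x_star:.2f}, where y = {y_star:.2f}")
```

Whether the turning point is of substantive interest depends on whether x* lies inside the observed range of x.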

If one takes the squared term about the mean of the variable, it
contributes nothing at the mean, leaving the linear term alone to describe
the effect of changes in the variable about the mean. That can make quick
interpretations of the coefficients possible. For example, if the mean of x
is 7, then define

xx = (x-7)**2

instead of using x**2. This won't change any predictions, nor the
coefficient and t-stat on the squared term, but the slope dy/dx at x=7 will
just be the coefficient on the linear term for x - no need to fuss with
calculating the contribution of the squared term.
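
The centering trick can be checked on a toy data set. The sketch below is in Python rather than Stata and uses made-up data (nine x values with mean 7 and an arbitrary deterministic disturbance). The `ols` helper is a hypothetical, bare-bones least-squares solver written only for this illustration. It fits the model once with x**2 and once with (x-7)**2 and compares the results.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] obtained by
    solving the normal equations (X'X) beta = X'y with Gauss-Jordan
    elimination. Fine for a tiny illustration, not for production use."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for p in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [A[r][j] - f * A[p][j] for j in range(k + 1)]
    return [A[r][k] / A[r][r] for r in range(k)]


# Made-up data: x has mean 7, y is roughly quadratic in x
x = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3]
y = [1 + 2 * xi - 0.1 * xi ** 2 + e for xi, e in zip(x, noise)]
mean_x = sum(x) / len(x)

# Fit with the raw square and with the square taken about the mean
c_raw, a_raw, b_raw = ols([x, [xi ** 2 for xi in x]], y)
c_ctr, a_ctr, b_ctr = ols([x, [(xi - mean_x) ** 2 for xi in x]], y)

# Slope at the mean implied by the raw fit: a + 2*b*mean
slope_at_mean = a_raw + 2 * b_raw * mean_x

pred_raw = [c_raw + a_raw * xi + b_raw * xi ** 2 for xi in x]
pred_ctr = [c_ctr + a_ctr * xi + b_ctr * (xi - mean_x) ** 2 for xi in x]

print(f"linear coefficient, raw square:      {a_raw:.4f}")
print(f"linear coefficient, centered square: {a_ctr:.4f}")
print(f"slope at the mean from the raw fit:  {slope_at_mean:.4f}")
print(f"largest gap between predictions:     "
      f"{max(abs(p - q) for p, q in zip(pred_raw, pred_ctr)):.2e}")
```

In this sketch the linear coefficient from the centered fit matches a + 2b*7 from the raw fit, the coefficient on the squared term is the same in both fits, and the predictions agree up to rounding error.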

Daniel Feenberg

>
> ----- Original Message -----
> From: Maarten buis <maartenbuis@yahoo.co.uk>
> Date: Tuesday, February 8, 2011 4:55 am
> Subject: Re: st: Non-linear regression
> To: statalist@hsphsun2.harvard.edu
>
>
>> --- On Tue, 8/2/11, Hamizah Hassan wrote:
>>> I would like to run non-linear regression by including the
>>> linear and quadratic functions of the variable.
>>
>> Typically this is still referred to as a linear model, as the
>> model is still linear in the parameters.
>>
>>> I just realized that if the variable is in percentages, the
>>> quadratic figure is higher than the linear figure. However,
>>> if it is in decimals, it is the other way around, and this
>>> will definitely affect the meaning of the results.
>>
>> The models are mathematically equivalent. You can see that
>> by looking at the predictions.
>>
>> Generally, it is hard to give a substantive interpretation to
>> a quadratic term, regardless of how you scaled the original
>> variable. If you care about interpreting the coefficients but
>> still want to allow for non-linear effects, then your best
>> bet is probably to use linear splines (which, confusingly, are
>> actually a non-linear function...)
>>
>> Consider the example below. The first part shows that the
>> two quadratic models result in the same predicted values. The
>> final part displays linear splines as an alternative. The final
>> graph shows that they result in fairly similar predictions, but
>> the spline terms can actually be interpreted: the parameter for
>> fuel_cons1 tells you that for cars with a fuel consumption of
>> less than 12 liters/100km an additional liter per 100 kilometers
>> will lead to a non-significant price increase of 62$ (=.062*1000$).
>> The parameter for fuel_cons2 tells you that for cars with a fuel
>> consumption of more than 12 liters/100km an additional liter
>> per 100 kilometers will lead to a significant price increase of
>> 1011$ (=1.011*1000$).
>>
>> *----------------- begin example -----------------
>> //================================== first part
>> sysuse auto, clear
>>
>> // since I am European and the question is about
>> // interpretation I first convert mpg from miles
>> // per gallon to liter / 100 km and price in
>> // 1000 \$
>>
>> gen fuel_cons = 1/mpg * 3.78541178 / 1.609344 *100
>> label var fuel_cons "fuel consumption (l/100km)"
>>
>> replace price = price / 1000
>> label var price "price (1000\$)"
>>
>> // create a "proportion-like" variable
>> sum fuel_cons , meanonly
>> gen prop = ( fuel_cons - r(min) ) / ( r(max) - r(min) )
>>
>> // take a look at that new variable
>> spikeplot prop, ylab(0 1 2)
>>
>> // turn it into percentages
>> gen perc = prop*100
>> spikeplot perc, ylab(0 1 2)
>>
>> // add square terms using the new
>> // factor variable notation
>> reg price c.prop##c.prop
>> predict yhat_prop
>>
>> reg price c.perc##c.perc
>> predict yhat_perc
>>
>> // compare predicted values
>> twoway function identity = x,        ///
>>        range( 13 31 ) lcolor(gs8) || ///
>>        scatter yhat_prop yhat_perc,  ///
>>            aspect(1) msymbol(Oh)
>>
>> //================================== final part
>> // alternative with interpretable parameters
>>
>> // create splines
>> mkspline fuel_cons1 12 fuel_cons2 = fuel_cons
>>
>> reg price fuel_cons1 fuel_cons2
>> predict yhat_spline
>>
>> twoway scatter price fuel_cons  ||           ///
>>        line yhat_prop yhat_spline fuel_cons, ///
>>        sort ytitle("price (1000 {c S|})")    ///
>>        legend(order( 1 "observations"        ///
>>                      2 "prediction,"         ///
>>                        "quadratic"           ///
>>                      3 "prediction,"         ///
>>                        "spline" ))
>> *---------------- end example --------------
>> (For more on examples I sent to the Statalist see:
>> http://www.maartenbuis.nl/example_faq )
>>
>> Hope this helps,
>> Maarten
>>
>> --------------------------
>> Maarten L. Buis
>> Institut fuer Soziologie
>> Universitaet Tuebingen
>> Wilhelmstrasse 36
>> 72074 Tuebingen
>> Germany
>>
>> http://www.maartenbuis.nl
>> --------------------------
>>
>>
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/