
Re: Re: st: about residuals and coefficients


From   David Hoaglin <[email protected]>
To   [email protected]
Subject   Re: Re: st: about residuals and coefficients
Date   Thu, 5 Sep 2013 23:48:41 -0400

Yuval,

Part of your comment illustrates the practice that I am criticizing.
In general, regression analysis, desirable or actual, estimates the
effect of each predictor after adjusting for (not "controlling for")
the contributions of the other predictors.  One does not have equal
conditions or ceteris paribus unless the collection of the data was
designed to produce such structure.
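
To make "adjusting for" concrete, here is a small sketch using the
auto dataset shipped with Stata (price, weight, and length are
variables in that dataset; the residual variable wres is a name I
made up):

    sysuse auto, clear
    * two-predictor fit: the weight coefficient is adjusted for length
    regress price weight length
    * residualize weight on length
    regress weight length
    predict double wres, residuals
    * the slope on wres reproduces the weight coefficient above
    regress price wres

That is the Frisch-Waugh-Lovell result: the multiple-regression
coefficient on weight equals the simple-regression slope of price on
the part of weight not linearly explained by length.  Nothing is held
constant; the linear contribution of the other predictor is removed.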

For sophisticated users of regression analysis, the distinction
between "adjusting for" and "controlling for" may be largely semantic.
For less-sophisticated users or consumers of the results, language
such as "controlling for" gives the misleading impression that
something is being held constant.  For observational data, that is
usually an overstatement.

Many patterns of correlation among predictors are not substantial
enough to qualify as "collinearity."
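
A quick way to gauge whether the correlation in a given fit rises to
that level is to look at variance inflation factors, e.g. (again with
the shipped auto data):

    sysuse auto, clear
    regress price weight length
    * variance inflation factors for the predictors
    estat vif

VIFs near 1 indicate little inflation from correlation among the
predictors; values above about 10 are a conventional, though rough,
warning sign.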

I am not familiar with the example of repair expenditures on a Toyota
car, but the negative coefficient on one of the predictors is
implausible only if one tries to interpret it in the same way as the
coefficient in the corresponding simple regression.  In the model that
uses both mileage and age as predictors, the coefficient of age
summarizes the change in repair expenditures per unit increase in age
after adjusting for simultaneous linear change in mileage.  For a
more-detailed understanding, one would have to look at the structure
of the data (e.g., cross-sectional or longitudinal, the particular
cars involved).  If the two-predictor model is not an appreciably
better fit than the one-predictor models, it would be appropriate to
remove one of the predictors.
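
Though I do not have the Toyota data, the pattern is easy to
reproduce with simulated data.  In the sketch below all variable
names and parameter values are invented for illustration; mileage is
in thousands, and repairs truly depend on mileage only:

    clear
    set seed 2013
    set obs 500
    generate age = runiform()*10
    generate mileage = 12*age + rnormal(0, 8)
    generate repairs = 200 + 25*mileage + rnormal(0, 150)
    * each simple regression has a positive slope
    regress repairs age
    regress repairs mileage
    * jointly, the age coefficient is near zero and may be negative
    regress repairs age mileage
    test age

Both simple slopes come out positive because age is a proxy for
mileage, yet after adjusting for mileage the coefficient on age is
near zero and may well be negative in any one sample.  Read as the
change in repairs per year of age after adjusting for the
simultaneous linear change in mileage, that is not implausible.  The
t-test on age (equivalently -test age-) bears on the point above:
whether the two-predictor model fits appreciably better than the
model with mileage alone.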

David Hoaglin

On Thu, Sep 5, 2013 at 5:00 PM, Yuval Arbel <[email protected]> wrote:
> David,
>
> I believe there are two levels in the regression analysis: 1) what is
> desirable; 2) what is possible to achieve.
>
> In terms of desirability, the objective of regression analysis is
> to isolate the effect of each covariate after controlling for other
> factors (what we call "under equal conditions" or "ceteris paribus").
>
> In terms of what is actually possible, the degree of success depends
> (among other things) on the degree of collinearity.
>
> High and low collinearity are dealt with in every econometrics
> textbook that I am familiar with.
>
> Moreover, the example of repair expenditures on a Toyota car as a
> linear function of mileage and age of the car is very well known: it
> yields a negative coefficient on one of the explanatory variables
> (implying the implausible outcome that as the age of the car goes
> up, the repair expenditures go down).  This problem is resolved when
> one of these variables is omitted.
>
> In terms of correct practice, if you get an implausible outcome,
> the first thing you should eliminate is high collinearity.
>
> At least the textbooks I know reflect this insight.
>
> P.S. There is a methodology to remedy collinearity, called ORR, and
> I believe it also exists in Stata.  Economists don't like this
> methodology very much, because you introduce bias into the model in
> order to decrease collinearity.

