Re: st: A layman question on model building

From   Maarten Buis <>
Subject   Re: st: A layman question on model building
Date   Thu, 7 Mar 2013 10:22:34 +0100

--- On 07.03.2013 08:07, James Bernard wrote:
>> We often add control variables that turn out to be insignificant. Does
>> that mean that I can remove that variable from my model without being
>> concerned with omitted variable bias?

--- On Thu, Mar 7, 2013 at 8:22 AM, John Antonakis wrote:
> If you have a sufficiently large sample size and the regressors of interest
> are significant predictors, then it is best to leave in the controls; they
> do not harm but help consistency (even if only a tad). <snip>  I would
> (mostly) always err on the side of caution and include the controls.

I agree with John if the predictor belongs in the model. However,
many predictors don't belong in the model to begin with. I often see
researchers including variables only because they think these
influence the explained/dependent/response/left-hand-side/y variable.
This is a necessary but _not_ a sufficient condition. The researcher
also needs to think about the relationship between the control
variable and the predictor of interest. Ignoring that relationship
leads to two common mistakes:

Say you have a response which is influenced by someone's social status
and you approximate social status with someone's occupation. Should you
then control for someone's education? If you think that someone's
education is also an approximation of someone's social status, and that
that is the mechanism through which education influences your response,
then the answer should be no. If you do, then you measure the impact of
one measure of social status while adjusting for another measure of
that same social status. What you could do is either choose one of the
two measures of social status or combine the two measures into a
single status measure, for instance using -sheafcoef- and -propcnsreg-
(both from SSC) or -sem-. Maybe you need to rethink your theory: it
could be that it is really just the possibilities and constraints
associated with the occupation and the knowledge and socialization
received from education that influence your response, in which case
both might (see below) belong in your model.
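
The first mistake can be illustrated with a small simulation. This is
only a sketch under stated assumptions, written in Python rather than
Stata so that it runs without a dataset: the data-generating process,
the unit variances, the value beta = 1, and all function names are
invented for illustration. The simple average of the two proxies is a
crude stand-in for a combined status measure; it is not what
-sheafcoef-, -propcnsreg-, or -sem- estimate.

```python
import random

def cov(u, v):
    """Sample covariance (divides by n; the divisor cancels in the slopes)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def slope(x, y):
    """OLS slope of y on a single regressor x."""
    return cov(x, y) / cov(x, x)

def slopes_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly, via the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate_proxies(n=20000, beta=1.0, seed=1):
    """Hypothetical data: occupation and education are both noisy
    measures of one unobserved status, which alone drives the response."""
    rng = random.Random(seed)
    occ, edu, y = [], [], []
    for _ in range(n):
        status = rng.gauss(0, 1)                   # unobserved social status
        occ.append(status + rng.gauss(0, 1))       # proxy 1
        edu.append(status + rng.gauss(0, 1))       # proxy 2
        y.append(beta * status + rng.gauss(0, 1))  # response
    return occ, edu, y

occ, edu, y = simulate_proxies()
index = [(o + e) / 2 for o, e in zip(occ, edu)]    # crude combined measure

print("occupation alone:           ", round(slope(occ, y), 3))
print("occupation, given education:", round(slopes_two(occ, edu, y)[0], 3))
print("averaged status index:      ", round(slope(index, y), 3))
```

Under these assumed parameters the population slopes are 1/2 for
occupation alone, 1/3 for occupation once education is controlled for,
and 2/3 for the averaged index. Adjusting one measure of status for
another measure of the same status moves the estimate further from the
effect of status itself (1 here), while combining the measures moves it
closer.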

Say you want to know the impact of someone's education on a response.
Should you control for that person's occupation? In most cases I would
say no, as that way you would filter out one of the important
mechanisms through which education influences the response. In most
cases it is reasonable to assume that someone's education influences
someone's occupation and not the other way around. So if occupation
influences a response, then that represents a causal pathway through
which education influences that response: education influences
someone's occupation, which in turn influences the response. This is
not a spurious effect that you want to filter out. If anything, the
causal claim for this part of the effect of education is stronger than
the one for the residual effect you would obtain if you controlled for
occupation, since we know why that effect is there and how it works.
It could be meaningful to use -sem- in combination with -estat
teffects- to decompose the total effect of education into effects that
can be explained by intervening variables (occupation) and a
residual/unexplained effect of education.
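
The second mistake, and the decomposition into an explained and a
residual effect, can be sketched the same way. Again this is only an
illustration under assumptions: the path coefficients a, b, and c, the
linear data-generating process, and the function names are all
invented, and the code does not reproduce what -sem- and -estat
teffects- compute; it only shows the same arithmetic with plain OLS.

```python
import random

def cov(u, v):
    """Sample covariance (divides by n; the divisor cancels in the slopes)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)

def slope(x, y):
    """OLS slope of y on a single regressor x."""
    return cov(x, y) / cov(x, x)

def slopes_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly, via the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate_mediation(n=20000, a=0.8, b=0.5, c=0.3, seed=1):
    """Hypothetical data: education -> occupation -> response (a * b),
    plus a direct path education -> response (c)."""
    rng = random.Random(seed)
    edu, occ, y = [], [], []
    for _ in range(n):
        e = rng.gauss(0, 1)
        o = a * e + rng.gauss(0, 1)          # education shapes occupation
        r = c * e + b * o + rng.gauss(0, 1)  # both shape the response
        edu.append(e)
        occ.append(o)
        y.append(r)
    return edu, occ, y

edu, occ, y = simulate_mediation()
total = slope(edu, y)                        # occupation left out
direct, occ_effect = slopes_two(edu, occ, y) # occupation controlled for
indirect = slope(edu, occ) * occ_effect      # the path through occupation

print("total effect of education:   ", round(total, 3))
print("residual (direct) effect:    ", round(direct, 3))
print("effect via occupation:       ", round(indirect, 3))
```

With the assumed values the total effect in the population is
c + a*b = 0.7, of which 0.4 runs through occupation. Controlling for
occupation leaves only the residual 0.3, so it would understate the
impact of education rather than remove a spurious component. In any
sample the OLS estimates satisfy total = direct + indirect exactly.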

So, once I have decided that a variable belongs in my model, I would
agree with John and almost always leave it in, but many variables do
not belong in the model to begin with.

Hope this helps,

Maarten L. Buis
Reichpietschufer 50
10785 Berlin