Re: st: Interaction terms interpretation when one variable is omitted
David Hoaglin <email@example.com>
Sun, 21 Apr 2013 09:27:47 -0400
Apologies for the delay in replying. Again, thanks for the further
explanation. I'll focus on your new questions.
The option -r- is not defined for predict after xtreg. I'm not sure
what you would like to predict. Perhaps one of the available options
will meet your need. Look at the documentation for -xtreg- and then
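For a fixed-effects fit, -predict- accepts -e- (the idiosyncratic residual), -u- (the estimated fixed effect), and -ue- (their sum) rather than -r-. A minimal sketch follows, using Stata's shipped nlswork data as a stand-in for the HRS panel; idcode, ln_wage, age, and tenure are not variables from this thread, and which residual is wanted depends on the purpose.

```stata
* Stand-in panel data (assumption: not the poster's HRS extract)
webuse nlswork, clear
xtset idcode

* Fixed-effects fit with cluster-robust standard errors
xtreg ln_wage age tenure, fe vce(cluster idcode)

* Residual-type predictions that -predict- supports after -xtreg, fe-
* idiosyncratic residual e_it
predict double res_e, e
* estimated fixed effect u_i
predict double u_hat, u
* combined residual u_i + e_it
predict double res_ue, ue

summarize res_e u_hat res_ue
```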
Since treat_status as a whole does not make a significant contribution
(after adjusting for the other variables), my preference would be not
to interpret the coefficients as different from zero. I'm not
entirely opposed to discussing their size and direction, as a
descriptive summary, but I would be careful not to overinterpret. If
your sample were small, you might wonder whether some categories of
treat_status would be significant in a large sample, but I think you
already have a sizable sample. The lack of significance may be
disappointing, but that seems to be the finding.
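The joint test of a categorical predictor can be sketched with -testparm-. The example below again uses nlswork as a stand-in, with i.ind_code playing the role of i.treat_status; those names are assumptions for illustration and are not from the thread.

```stata
* Stand-in panel data (assumption: not the poster's HRS extract)
webuse nlswork, clear
xtset idcode

* Categorical predictor entered as a set of indicators
xtreg ln_wage age tenure i.ind_code, fe vce(cluster idcode)

* Joint test that all category coefficients are zero
testparm i.ind_code
```

A large p-value from the joint test is the situation described above: the coefficients can be reported descriptively, but not as evidence of differences among categories.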
The fixed effects account for all potential confounding variables
(both observed and unobserved) that remain stable over the time period
of the data and whose effects are assumed to be constant over time.
In a random-effects model, the random intercept is assumed to be
uncorrelated with the covariates that are included in the model. In
your data the fixed effects seem to be accounting for more variation.
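One way to look at the two specifications side by side is to store both fits and tabulate them. The sketch below uses nlswork as a stand-in (variable names are assumptions, not from the thread). It also runs -hausman-, a common check of the random-effects assumption that was not part of the original exchange; -hausman- requires fits without vce(cluster), so the models here use the default standard errors.

```stata
* Stand-in panel data (assumption: not the poster's HRS extract)
webuse nlswork, clear
xtset idcode

* Fixed-effects fit
xtreg ln_wage age tenure, fe
estimates store fe

* Random-effects fit with the same covariates
xtreg ln_wage age tenure, re
estimates store re

* Coefficients and standard errors side by side
estimates table fe re, b(%9.4f) se(%9.4f)

* Hausman comparison (consistent estimator first, efficient second)
hausman fe re
```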
It is not a problem for -xtreg- if the number of observations varies
among individuals. If relatively few individuals have more than one
observation, however, the panel nature of the data will be weak.
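To see how many individuals contribute only one observation, the participation pattern can be summarized before fitting. A minimal sketch, assuming nlswork as a stand-in panel (idcode and year are its identifiers, not the poster's; n_obs and one_per_id are hypothetical helper names):

```stata
* Stand-in panel data (assumption: not the poster's HRS extract)
webuse nlswork, clear
xtset idcode year

* Pattern of observations per individual over time
xtdescribe

* Number of observations contributed by each individual
bysort idcode: generate n_obs = _N
egen byte one_per_id = tag(idcode)
tabulate n_obs if one_per_id
```

The header of the -xtreg- output also reports the minimum, average, and maximum number of observations per group.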
The difference between logs base 10 and natural logs is only a matter
of scaling: all logarithms differ only by a multiplicative constant.
Logs base 10 are easier to interpret: log(100) = 2, log(1,000,000) = 6.
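The scaling can be checked directly: since log10(x) = ln(x)/ln(10), a coefficient on a base-10 log equals ln(10) (about 2.3026) times the coefficient on the natural log, and the t statistics and fit are unchanged. A small sketch, assuming nlswork as a stand-in (hours and ln_wage are its variables, not the poster's; log10_wage is a hypothetical name):

```stata
* Stand-in data (assumption: not the poster's HRS extract)
webuse nlswork, clear

* Convert the natural log to a base-10 log
generate double log10_wage = ln_wage / ln(10)

* Same model on the two scales
regress hours ln_wage
scalar b_ln = _b[ln_wage]
regress hours log10_wage
scalar b_log10 = _b[log10_wage]

* The ratio of the two coefficients should match ln(10)
display b_log10 / b_ln
display ln(10)
```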
On Tue, Apr 16, 2013 at 3:05 PM, Mirnezami, Oliver
> Dear David
> Thank you ever so much for the detailed reply.
> Yes I completely agree that the 'disabled category' has too few observations and so I have combined them with the 'not in labour force' category.
> Apologies for any confusion regarding the treat_status variable. I can confirm that the categories are indeed mutually exclusive and exhaustive. The data has been restricted so that everyone in the regressions are either in the control group (employed) or has lost their job between the previous and current wave and so falls into one of the several treatment categories e.g. treat_emp, treat_unemp, treat_ret etc. [Just to clarify that treat_emp means that individual lost their job at some point between the previous and current wave but has found employment in the current wave whereas those in treat_unemp have not found employment in the current wave following job loss. ]
> I do include a female dummy variable along with a selection of other variables including ethnicity, education, wealth, smoking, bmi and a series of industry and firm size dummies. I just hadn't shown these for simplicity here. Married is a dummy variable indicating whether an individual is married or not in the current period and has got large enough frequencies. I will investigate interaction terms as well as you suggested. Regarding age, since the dataset is the US Health and Retirement Study, the average age of individuals in my regression is about 55. I'm going to look into what you suggested here regarding the functional form - thank you in particular for this tip! When I tried initially running my panel regression (left out some variables here again for simplicity), I get an error:
> xtreg health treat_all ln_income female y1996 y1998 y2000 y2002 , fe vce(cluster id)
> predict res, r
> option r not allowed
> It seems to work if I use reg rather than xtreg and remove the fe option but I need to use xtreg fe.
> Regarding the comparison of coefficients on treat_status, after combining the disabled category with 'not in labour force' and re-running the regression with i.treat_status, all of the treat_emp, treat_unemp, treat_ret and treat_nlbrf coefficients are insignificant. I did an F test for joint significance and got a p-value of 0.85. I also tried just treat_all (all categories combined) on its own and this had an individual p-value of 0.37. I see what you mean when you say that the health of the people who are not employed does not differ from the health of the people who are employed after adjusting for the contributions of the various explanatory variables due to the p-values. However, can I make any comments on the relative size of the coefficients, even if they are not significant or significantly different from each other e.g. using the output below, could I say that the effect of job loss on health appears stronger for those that are unemployed compared to those that gained re-employment (-0.095 vs -0.023) although the effect is statistically insignificant? Or that due to the negative sign for all treatment categories, it appears that treatment has a negative effect on health compared to the control group although the effect is statistically insignificant? It's just that this makes intuitive sense. Or is it simply not worth commenting on? You also suggested assessing the contribution of treat_status as a whole by running the regression without it and comparing the two models - I did this and the coefficients were pretty much exactly the same so treat_status doesn't seem to alter the model much at all.
>              |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
>    treat_emp |  -.0229748   .0380209    -0.60   0.546    -.0975028    .0515533
>  treat_unemp |   -.094682   .1218764    -0.78   0.437    -.3335827    .1442186
>    treat_ret |  -.0540969   .1030903    -0.52   0.600    -.2561732    .1479795
>  treat_nlbrf |  -.0968564   .1647246    -0.59   0.557    -.4197475    .2260347
> Another query I have is when I run the model with fixed effects as I showed you previously, many of the explanatory variables are insignificant although with random effects more are significant and sometimes the signs change as well. Is it just the case that the fixed effects are capturing most of the variation which is why the explanatory variables appear insignificant?
> Another concern is to do with the fact that the regression samples typically contain multiple observations over time for each individual although the actual number varies so some individuals may only appear once whereas others several times. Is this an issue and do you have any advice on how it can be resolved? I wondered if I should include some kind of weighting to account for this but am not sure of the theory behind it or how to do so in Stata?
> Could you please elaborate on why logs base 10 are a more useful choice than natural logs? I logged income in order to get a distribution that appeared more like a normal distribution as the data was initially skewed when looking at a histogram.