The Stata listserver

Re: st: regression


From   austin nichols <[email protected]>
To   [email protected]
Subject   Re: st: regression
Date   Wed, 28 Dec 2005 08:57:18 -0500

The mathematical explanation is quite easy, but it's not clear to me
what is confusing you, so I'm not sure I can explain it in a way that
makes it clearer.  By including a dummy variable (AKA indicator
variable) like hvptb, you are allowing the constant term (AKA
intercept) to differ across the two groups defined by hvptb=0 and
hvptb=1.  By including interactions with hvptb, you are allowing the
slope wrt time and other relevant terms to differ across those two
groups as well.  If you include the interaction of hvptb with every
other variable, it is almost the same as estimating two separate
models (e.g. reg y $xvars if hvptb==0 and reg y $xvars if hvptb==1):
the coefficient estimates are identical, and only the standard errors
differ, because the pooled model assumes a common error variance for
both groups.
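
A minimal sketch of that equivalence, using simulated data and only a
linear time term.  The data-generating values, the seed, and the
variable name txhvptb are assumptions made for illustration; they are
not the poster's data or model.

```stata
* Simulated data for illustration only (not the poster's data)
clear
set seed 2005
set obs 200
generate time  = 10*runiform()
generate hvptb = runiform() < 0.5
generate y     = 1 + 0.5*time + 0.4*hvptb + 0.2*time*hvptb + rnormal()

* Interaction of the indicator with time
generate txhvptb = time*hvptb

* Pooled model: intercept and slope are both allowed to differ by group
regress y time hvptb txhvptb

* Intercept for the hvptb==1 group
lincom _cons + hvptb

* Slope wrt time for the hvptb==1 group
lincom time + txhvptb

* Separate models, one per group
regress y time if hvptb==0
regress y time if hvptb==1
```

The constant and the coefficient on time in the pooled model should
match the hvptb==0 regression, and the two -lincom- point estimates
should match the hvptb==1 regression; the standard errors need not
agree.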

If you think the relationship between y [or log(crh) as you call it]
and time is nonlinear, then I guess you should be including at least a
linear and a quadratic term (a la the Taylor series expansion of the
presumably unknown nonlinear function) in both of those models.  That
means that for the model using both groups (hvptb=0 and hvptb=1), you
have to include all the interactions.  The significance of any one
coefficient in such a model is nearly irrelevant, since what you care
about is whether sets of coefficients are jointly significant
(the obvious test is whether b3=0 and b4=0 and b5=0, given by -test
hvptb txhvptb t2xhvptb- or somesuch) in a model of the form:

E(y) = b0 + b1*time + b2*time^2 + b3*hvptb + b4*(time*hvptb) + b5*(time^2*hvptb)

estimated by, e.g.,

. reg y t t2 hvptb txhvptb t2xhvptb
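
The interaction variables in that command have to be built by hand
first.  A sketch of the full sequence, again on simulated data (the
data-generating values and the seed are assumptions for illustration;
t, t2, hvptb, txhvptb and t2xhvptb are the names used above):

```stata
* Simulated data for illustration only (not the poster's data)
clear
set seed 2005
set obs 200
generate t     = 10*runiform()
generate hvptb = runiform() < 0.5
generate y     = 1 + 0.5*t - 0.03*t^2 + 0.4*hvptb + rnormal()

* Quadratic term and the interactions with the indicator
generate t2       = t^2
generate txhvptb  = t*hvptb
generate t2xhvptb = t2*hvptb

* Fully interacted model
regress y t t2 hvptb txhvptb t2xhvptb

* Joint test of b3=0, b4=0 and b5=0: do the two groups differ at all?
test hvptb txhvptb t2xhvptb
```

-test- reports a single F statistic for the three restrictions taken
together, which is the relevant quantity here rather than the three
individual t statistics.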

In general, you should try thinking through the marginal effect of
each relevant variable for each relevant subgroup to interpret the
coefficients.  In your "Model-1" below, the marginal effect of time
for the hvptb=0 subgroup is b1+2*b2*time and the marginal effect of
time for the hvptb=1 subgroup is b1+b4+2*b2*time, which shows you that
b2 is capturing an effect of time constrained to be the same across
the two groups.  Obviously, including a quadratic term for (almost)
any variable will result in a different estimate of the coefficient on
the linear term, and including a quadratic term that is the same for
both groups will result in different estimates of the coefficients on
the linear terms for each group than would including quadratic terms
that differ across groups.
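
Written out, the marginal effects come from differentiating each model
with respect to time.  The block below is only a worked derivation
using the coefficient labels of Model-1 and of the fully interacted
model given earlier; nothing in it is estimated.

```latex
% Model-1: the quadratic term is common to both groups
\frac{\partial E(y)}{\partial\,\mathrm{time}}
  = b_1 + 2 b_2\,\mathrm{time} + b_4\,\mathrm{hvptb}

% Fully interacted model: the quadratic term may differ by group
\frac{\partial E(y)}{\partial\,\mathrm{time}}
  = b_1 + 2 b_2\,\mathrm{time} + \left(b_4 + 2 b_5\,\mathrm{time}\right)\mathrm{hvptb}
```

Setting hvptb to 0 or 1 gives the effect for each subgroup.  In
Model-1 the two groups differ only by the constant shift b4; in the
fully interacted model the gap between them, b4 + 2*b5*time, can itself
change with time.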


On 12/26/05, [email protected] <[email protected]> wrote:
> Model-1: Log(crh)= b0 + b1*time + b2*time^2 + b3*hvptb + b4*(time*hvptb).
> Model-2: Log(crh)= b0 + b1*time + b2*time^2 + b3*hvinf + b4*(time*hvinf).
>
> On estimating the models I find that the values of b0 and b1 are not the same for the two models. Hence the prediction equation for the normal people are different in the two models. The quadratic time term is significant in the model.
>
> Austin Nichols wrote back to me that the coefficients will be the same only if I include all the relevant interaction terms "time^2*hvptb" and "time^2*hvinf" also in the model. I verified this but I couldn't find a mathematical explanation to why this is so.
>
> On including the interaction terms for time^2, I lose significance on those terms as well as  the other terms. So if I leave out the interaction terms I need to explain why the normal people have different equations in the two models.
> I'd appreciate any help on this issue.



