
st: linear vs log-linear regression: specification test

From   [email protected]
To   [email protected]
Subject   st: linear vs log-linear regression: specification test
Date   Tue, 28 Oct 2003 21:28:26 -0500

I hadn't heard of the Bera-McAleer test, which examines whether Y or
log Y should be used as the dependent variable in a regression analysis,
or of the PE test of functional form, and I don't know whether Stata
implements them.

However, this seems to be a general issue -- log transform or not log
transform the Y variable -- and I would be interested in hearing any
StataList views on this.  I have heard some (not Stata Listers) say that
if you get a better R2 with the transformed data or a better picture of
a regression plot (whatever that means), you should do it, but I am not
sure that I agree.     It seems that logging the Y values will always
bring high Y values in closer to the regression curve and automatically
result in a better looking picture with fewer "outliers".   Might this
tend to obscure problems and perhaps imply a functional relationship
that isn't there?

I am aware of a "Lack of Fit" test which is available in JMP and (maybe)
also in Stata (findit -lack of fit-), but I haven't tested it or
compared the results. I understand that it requires that replicate
measurements be available. In this test, SSE is decomposed into SS due
to Pure Error and SS due to Lack of Fit. The test is very well described
in "Applied Linear Statistical Models" by Neter, Kutner, Nachtsheim, and
Wasserman (1996) on pages 115-124. If a log transformation "fits" and
the untransformed data don't, would this suggest that a log
transformation should be performed?

The MacKinnon-White-Davidson test is another one that I am aware of
(see, e.g., p. 265, Basic Econometrics by Damodar N. Gujarati (1995)).
This test seems (at least as presented by Gujarati and interpreted by
me) to directly address this issue of whether to log transform the Y
variable or not.    Are these tests something that would be useful in
determining whether a log transformation of the Y variable should be
done (maybe supplemented by plots of residuals)?  Are there others or
any general guidelines or rules?
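
To make the lack-of-fit decomposition mentioned above concrete, here is a minimal sketch in plain Python (not Stata, and not the JMP implementation). It assumes a single predictor, replicate Y values at some X levels, and a straight-line fit; the function name lack_of_fit and the returned keys are made up for illustration.

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Split the SSE of a straight-line fit into pure error and lack of fit.

    Hypothetical helper for illustration: one predictor, with replicate
    observations at (at least some of) the x levels.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n

    # Ordinary least squares for y = b0 + b1 * x
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    # Error sum of squares around the fitted line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of replicates around their own level mean
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    sspe = 0.0
    for ys in groups.values():
        level_mean = sum(ys) / len(ys)
        sspe += sum((yi - level_mean) ** 2 for yi in ys)

    # Lack of fit is whatever part of SSE pure error does not explain
    sslf = sse - sspe

    c = len(groups)      # number of distinct x levels
    df_lf = c - 2        # two parameters estimated for the line
    df_pe = n - c
    f_stat = (sslf / df_lf) / (sspe / df_pe)

    return {"sse": sse, "sspe": sspe, "sslf": sslf,
            "df_lf": df_lf, "df_pe": df_pe, "F": f_stat}
```

The F statistic would be compared with an F distribution on (df_lf, df_pe) degrees of freedom. Running the same function once on y and once on the logged values, e.g. [math.log(v) for v in y], gives one lack-of-fit result per specification, which is the comparison asked about above.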

Finally, what role should mechanistic theory, plausibility, or a known
functional relationship (if available) play in deciding which regression
relationship should be used? My feeling is a major one. Drawing in part
from an example in the Stata manual: if I were trying to establish a
relationship between, say, auto fuel consumption in gallons per mile and
weight, I would probably use a direct linear relationship, based on the
physics and my underlying notions of how energy requirements relate to
the mass moved, regardless of how good a plot of logged gallons per mile
vs. weight looked or how much better its R2 value might be. To me,
taking the log of gallons per mile and regressing it on the weight of
the automobile would imply a vastly different mathematical relationship
between the dependent and independent variables. If there is no specific
or rigorous theory underlying the relationship and the purpose is only
prediction, is it then OK to simply fit the data empirically and take
logs if this seems to result in a better fit?
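
To spell out what "a vastly different mathematical relationship" means here, the two candidate specifications can be written side by side. The symbols are my own notation, not taken from the Stata manual or from Gujarati: y is gallons per mile, w is weight, and the Greek letters are unknown coefficients and error terms.

```latex
% Linear specification: each extra unit of weight ADDS \beta_1 to y
y = \beta_0 + \beta_1 w + \varepsilon

% Log-linear specification, and what it implies on the original scale
\ln y = \alpha_0 + \alpha_1 w + u
\quad\Longrightarrow\quad
y = e^{\alpha_0}\, e^{\alpha_1 w}\, e^{u}
```

Under the second form, each extra unit of weight multiplies y by a factor of e^{\alpha_1} rather than adding a fixed amount, and the error enters multiplicatively instead of additively.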

These questions may not be appropriate for the Stata List as they don't
pertain directly to Stata. But since they relate to a question posed and
seem to be of general interest, I thought I would send them.

Any thoughts or responses would be appreciated.



P.S.  Some Statalisters are able to "connect" their responses to a
specific question when they send them in, with the result that they
appear contiguously as part of a consistent theme (questions,
responses, more responses, additional questions, etc.).  Does anybody
know how to do this?  All I know how to do is to address it to the
statalist address ([email protected]) and copy the subject
line from the original question.  Unfortunately, it gets placed
chronologically and not thematically.  I looked for information on how
to do this on the Stata web site, but was not able to find any.

David Miller
703-305-5352 (voice)
703 605-1289 (fax)
OPP/Health Effects Division
