From: [email protected]
To: [email protected]
Subject: st: linear vs log-linear regression: specification test
Date: Tue, 28 Oct 2003 21:28:26 -0500

I hadn't heard of the Bera-McAleer test for deciding whether Y or log Y should be used as the dependent variable in a regression analysis, or of the PE test of functional form, and I don't know whether Stata implements either. However, this seems to be a general issue (log transform the Y variable or not), and I would be interested in hearing any Statalist views on it.

I have heard some people (not Statalisters) say that if you get a better R2 with the transformed data, or a better-looking regression plot (whatever that means), you should transform, but I am not sure that I agree. It seems that logging the Y values will always bring high Y values in closer to the regression curve and automatically result in a better-looking picture with fewer "outliers". Might this tend to obscure problems and perhaps imply a functional relationship that isn't there?

I am aware of a "Lack of Fit" test which is available in JMP and (maybe) also in Stata (findit -lack of fit-), but I haven't tested it or compared the results. I understand that it requires replicate measurements to be available. In this test, SSE is decomposed into SS due to Pure Error and SS due to Lack of Fit. The test is very well described in "Applied Linear Statistical Models" by Neter, Kutner, Nachtsheim, and Wasserman (1996), pages 115-124. If a log transformation "fits" and the untransformed data don't, would this suggest that a log transformation should be performed?

The MacKinnon-White-Davidson test is another one that I am aware of (see, e.g., p. 265 of Basic Econometrics by Damodar N. Gujarati (1995)). This test seems (at least as presented by Gujarati and interpreted by me) to address directly the issue of whether or not to log transform the Y variable. Would these tests be useful in determining whether a log transformation of the Y variable should be done (maybe supplemented by plots of residuals)? Are there others, or any general guidelines or rules?
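To make the lack-of-fit decomposition mentioned above concrete, here is a minimal sketch in plain Python rather than Stata or JMP. It assumes a straight-line fit of Y on a single X with replicate Y values at each X level; the function name lack_of_fit and the data are made up for illustration and do not come from any package or from the textbook.

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Split the SSE of a straight-line fit into pure error and lack of fit.

    Assumes replicate y values exist at (at least some) x levels and that
    there are more than two distinct x levels.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n

    # Ordinary least squares for y = b0 + b1 * x
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Group the replicates by x level
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)  # number of distinct x levels

    ss_pe = 0.0  # pure error: replicates around their own level mean
    ss_lf = 0.0  # lack of fit: level means around the fitted line
    for xg, ys in groups.items():
        level_mean = sum(ys) / len(ys)
        ss_pe += sum((yi - level_mean) ** 2 for yi in ys)
        ss_lf += len(ys) * (level_mean - (b0 + b1 * xg)) ** 2

    df_lf = c - 2  # levels minus the two fitted parameters
    df_pe = n - c  # observations minus levels
    f_stat = (ss_lf / df_lf) / (ss_pe / df_pe)
    return {
        "sse": sse,
        "ss_pure_error": ss_pe,
        "ss_lack_of_fit": ss_lf,
        "df_lack_of_fit": df_lf,
        "df_pure_error": df_pe,
        "f_stat": f_stat,
    }


# Hypothetical data: two replicates at each of four x levels, with curvature
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.0, 1.2, 3.9, 4.1, 9.1, 8.9, 15.8, 16.2]
result = lack_of_fit(x, y)
```

The F statistic would be compared with an F distribution on (c - 2, n - c) degrees of freedom, where c is the number of distinct X levels. Running the same function on log Y and on Y shows which specification the test rejects, though that comparison alone does not settle whether the transformation is appropriate.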
Finally, what role should mechanistic theory/plausibility/functional relationship (if available) play in deciding which mathematical/regression relationship should be used (my feeling is a major one)? Drawing in part from an example in the Stata manual: if I were trying to establish a relationship between, say, auto fuel consumption in gallons per mile and weight, I would probably decide to use a direct linear relationship, based on the physics and my underlying notions of what the relationship between energy requirements and mass moved should be, regardless of how good a plot of logged gallons per mile vs. weight looked or how much better its R2 value might be. To me, taking the log of gallons per mile and regressing it on the weight of the automobile would imply a vastly different mathematical relationship between the dependent and independent variables. If there is no specific or rigorous theory underlying the relationship and the purpose is only prediction, is it then OK simply to fit the data empirically and take logs if this seems to result in a better fit?

These questions may not be appropriate for Statalist as they don't pertain directly to Stata. But since they relate to a question posed and seem to be of general interest, I thought I would send them. Any thoughts or responses would be appreciated.

thanks.

david.

P.S. Some Statalisters are able to "connect" their responses to a specific question when they send them in, with the result that they appear contiguously as part of a consistent theme (questions, responses, more responses, additional questions, etc.). Does anybody know how to do this? All I know how to do is to address it to the Statalist address ([email protected]) and copy the subject line from the original question. Unfortunately, it gets placed chronologically and not thematically. I looked for information on how to do this on the Stata web site, but was not able to find any.
David Miller
703-305-5352 (voice)
703 605-1289 (fax)
OPP/Health Effects Division

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:

- **st: RE: linear vs log-linear regression: specification test** *From:* "Nick Cox" <[email protected]>
- **st: Re: linear vs log-linear regression: specification test** *From:* "Scott Merryman" <[email protected]>
- **Re: st: linear vs log-linear regression: specification test** *From:* Richard Williams <[email protected]>
- **Re: st: linear vs log-linear regression: specification test** *From:* Richard Williams <[email protected]>
