
From: "Maarten Buis" <[email protected]>
To: <[email protected]>
Subject: st: RE: using glm for the retransformation problem - basic questions
Date: Wed, 29 Aug 2007 09:52:39 +0200

--- Krista Jacobs wrote:
> I am estimating a model where y is vitamin A consumption, and among
> the x's is participation in a nutritional intervention. The vitamin A
> consumption variable was highly skewed, so I ran
> svyreg lny x. (The distribution of lny is sufficiently close to normal.)

It is a common misconception that the distribution of the dependent
variable needs to be normal. This is not the case: the assumption is that
the distribution of the dependent variable conditional on the explanatory
variables is normal; in other words, the residuals need to be normal. The
unconditional distribution of the dependent variable can look highly
non-normal even if the residuals are normal.

> Unfortunately, the results were a little too high for me to really
> believe, so I also ran svyreg y x which yielded something a bit more
> reasonable. It was suggested that I might be seeing the
> retransformation problem at work.

These two models estimate slightly different things: if you back-transform
(exponentiate) the coefficients of a linear regression with a
log-transformed dependent variable, then for a dummy variable you get the
ratio of geometric means (Newson 2003), while the coefficient of a dummy
variable in a regression with an untransformed dependent variable is the
difference in arithmetic means.

> The homoskedasticity of the error terms from svyreg lny x is
> rejected.

Tests of assumptions are pretty useless when it comes to model building:
they may tell you there is a problem, but they do not tell you what the
problem is or how to solve it. They also mess with your inference: the
p-values are now all conditional on the prior tests, which is probably not
what you want. What you want to do is look at various graphs involving the
residuals. They will give you a lot more information about where the
heteroscedasticity comes from and what to do about it. For a clear overview
of this topic see Fox (1991).

> I started to work with glm and a log link, but I have a few basic
> (sorry) questions.
>
> First, in the glm estimation should y be the dependent variable or
> lny? That is, do I want to write " glm y x, link(log)" or "glm lny x,
> link(log)." I think it's the first, but I'm not positive.

-glm y x, link(log)-

> Second, I've been using the default Gaussian for the family. Is there
> a reason to use a different distribution like gamma or Poisson?

Poisson is a discrete distribution, so you may not want to use it (or, if
you do, use the -robust- option).

> Third, for simplicity, say x is a dummy variable. After I run "glm y
> x, link(log)" I ask Stata to exponentiate with eform. Are the results
> it gives after eform
>
> Exp(xB) where x=1
> -----------------------------
> Exp(xB) where x=0
>
> evaluated at the mean? If not, what are they?

-eform- gives you exp(b) (as it says at the top of the coefficient table in
the output).

Hope this helps,
Maarten

Fox, John (1991), "Regression Diagnostics", Thousand Oaks: Sage.

Newson, R. (2003), "Stata tip 1: The eform() option of regress". The Stata
Journal, 3(4): 445.

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
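The points in this reply can be tried out with a short Stata session. This is only a sketch under stated assumptions: it uses the auto dataset shipped with Stata as a stand-in (price plays the role of y, foreign the role of the dummy x, and mpg is an arbitrary extra covariate), and it ignores the survey design that the original svyreg models accounted for.

```stata
* Stand-in data (an assumption): price stands in for y, foreign for the dummy x
sysuse auto, clear
generate lnprice = ln(price)

* Log-linear regression: the exponentiated coefficient of the dummy
* is a ratio of geometric means
regress lnprice foreign, eform("GM ratio")

* The same ratio computed directly from the group means of ln(price)
quietly summarize lnprice if foreign == 1
scalar gm1 = exp(r(mean))
quietly summarize lnprice if foreign == 0
scalar gm0 = exp(r(mean))
display "ratio of geometric means = " gm1/gm0

* Untransformed regression: the coefficient of the dummy is the
* difference in arithmetic means
regress price foreign

* Look at the residuals instead of relying on a test
quietly regress lnprice foreign mpg
rvfplot, yline(0)
predict double res, residuals
qnorm res

* GLM with a log link on the untransformed y; eform reports exp(b).
* With a single dummy as the only covariate, exp(b) is the ratio of
* the arithmetic means of price in the two groups.
glm price foreign, family(gaussian) link(log) eform
```

Comparing the "GM ratio" column with the exp(b) reported by -glm- shows how the log-linear regression and the log-link GLM answer slightly different questions on the same data.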

