Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: getting realistic fitted values from a regression
Date: Fri, 23 Jul 2010 12:28:31 -0400

Steve

On Jul 23, 2010, at 11:53 AM, Austin Nichols wrote:

Nick, Kit, et al.--

The other fixes can work really badly in the presence of non-lognormal errors and/or heteroskedasticity, but -glm- or -poisson- still works well, as pointed out in: http://www.stata.com/meeting/boston10/boston10_nichols.pdf

In fact, I think the claim in the -levpredict- package is too strong: "These predictions avoid the retransformation bias that arises when predictions of the log dependent variable are exponentiated. See Cameron and Trivedi, MUS, 2009, 3.6.3." Note that even MUS claims only "a weaker assumption is to assume that u_i is i.i.d., in which case we can consistently estimate E[exp(u)] by the sample average of exp(\hat{u}); see Duan (1983)", which is quite distinct from avoiding retransformation bias in a non-iid setting, and furthermore makes no claim about minimizing root mean square prediction error, or RMSE of marginal effects, which presumably is the goal of Woolton Lee. Consistent estimation of the exponentiated error gets your mean prediction closer to the mean of the outcome in levels, but still not as close as -poisson- or -glm-, and does not guarantee that predictions in levels for individual cases are particularly good.

On Fri, Jul 23, 2010 at 11:06 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
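Austin's point about -poisson- and -glm, family(poisson) link(log)- can be seen in a small simulation. The sketch below is NumPy, not Stata, and the IRLS loop is a hand-rolled stand-in for what those commands estimate; the data-generating process (a gamma multiplicative error whose variance depends on x, so it is heteroskedastic and not lognormal) is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(0, 1, n)

# Multiplicative error with mean 1 whose variance rises with x:
# heteroskedastic and non-lognormal, the setting where log-OLS fixes
# can misbehave but Poisson quasi-MLE remains consistent for E[y|x].
shape = 1.0 + 4.0 * (1.0 - x)
err = rng.gamma(shape, 1.0 / shape)
y = np.exp(0.5 + 2.0 * x) * err        # E[y | x] = exp(0.5 + 2x) exactly

# Poisson quasi-MLE with log link via IRLS (hand-rolled illustration
# of the estimator behind -poisson- / -glm, family(poisson) link(log)-)
X = np.column_stack([np.ones(n), x])
b = np.array([np.log(y.mean()), 0.0])  # stable starting values
for _ in range(100):
    mu = np.exp(X @ b)
    z = X @ b + (y - mu) / mu          # working response
    XtW = X.T * mu                     # IRLS weights = mu for the log link
    b_new = np.linalg.solve(XtW @ X, XtW @ z)
    if np.max(np.abs(b_new - b)) < 1e-10:
        b = b_new
        break
    b = b_new

pred = np.exp(X @ b)                   # level-scale predictions, no retransformation
print(b, y.mean(), pred.mean())
```

Because the score equation includes an intercept, the fitted means balance exactly: the average of the level-scale predictions equals the sample mean of y, with no retransformation factor needed.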

Thanks for the commendation. It is easy enough to try the -glm- approach _and_ other fixes and to compare results. I have found that they give very similar answers in practice. What all can agree on is that some kind of fix is needed when your real interest is predicting on the original scale and a log scale -- or indeed any other nonlinear transform or link -- was used for the response in modelling.

Nick
n.j.cox@durham.ac.uk

David Jacobs

Maarten states the received wisdom on this issue, but in the econometrics text authored by Jeffrey Wooldridge (Introductory Econometrics, Thomson South-Western, 2003), on pp. 208-9, Wooldridge suggests a way to obtain unlogged predictions from a regression in which the regressand is in log form (there have been subsequent editions of this book, but the page numbers I give will be close in those newer editions). If one of the statistical experts on this list is familiar with this approach or is willing to look it up, I'd be interested in their reaction.

That said, I wholeheartedly agree with Maarten's recommendation. I found the article he suggests by Cox et al. to be extremely useful and I'm grateful to him for suggesting it on another occasion.

David Jacobs

At 03:08 AM 7/22/2010, you wrote:

--- On Wed, 21/7/10, Woolton Lee wrote:

I have estimated a regression (OLS) using log of patient travel distance to a hospital predicted by patient, hospital and area characteristics. I am going to report the results as marginal effects that I've computed by obtaining predictions from my estimated regression, computed by fixing some variables and keeping others at their original values. However, after I compute the predictions, I am getting unrealistically large numbers. When I examined the regression residuals, it looks as though the obs with unrealistic fitted values have larger residuals.
Is there a way to adjust the regression to better account for this problem?

If you want to predict the travel distance you should use -glm- with the -link(log)- option rather than use -regress- on a log-transformed dependent variable. The difference is that with the former you are modeling log(E(y)), while with the latter you are modeling E(log(y)). If you want to backtransform your predictions using the antilog transformation you will get exp(log(E(y))) = E(y) for the -glm- command, while after -regress- you get exp(E(log(y))) != E(y). A nice discussion of this issue can be found in:

Nicholas J. Cox, Jeff Warburton, Alona Armstrong, Victoria J. Holliday (2007) "Fitting concentration and load rating curves with generalized linear models" Earth Surface Processes and Landforms 33(1): 25--39. <http://www3.interscience.wiley.com/journal/114281617/abstract>

There exist approximations you can use after -regress- to fix this problem, but why try to fix a problem if you can easily prevent it?
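Both the bias Maarten describes and the correction David asks about can be seen in a short simulation. As I recall the Wooldridge section cited above (predicting y when log(y) is the dependent variable), the idea is to estimate the factor a0 = E[exp(u)], either by averaging exp(residuals) -- Duan's (1983) smearing estimate -- or by regressing y on exp(fitted) through the origin; readers should check the text itself. The sketch below is NumPy, not Stata, with made-up homoskedastic lognormal data (the benign case; see Austin's caveats above for when these fixes break down).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0, 1, n)
u = rng.normal(0, 0.8, n)              # iid log-scale error
y = np.exp(0.5 + 2.0 * x + u)          # true model: log(y) = 0.5 + 2x + u

# OLS of log(y) on x, as with -regress- on the log scale
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
m = np.exp(X @ b)                      # naive exp(fitted): biased low for E[y|x]
resid = np.log(y) - X @ b

a_smear = np.mean(np.exp(resid))       # Duan (1983) smearing estimate of E[exp(u)]
a_origin = (m @ y) / (m @ m)           # regression of y on m through the origin
a_normal = np.exp(0.8 ** 2 / 2)        # exact factor if u ~ N(0, 0.8^2)

print(y.mean(), m.mean(), (a_smear * m).mean())
```

Here the naive retransformation undershoots the level mean by the factor 1/E[exp(u)] = exp(-0.32), roughly 27 percent, and either estimate of a0 recovers it.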

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

**References**:

- **st: getting realistic fitted values from a regression** *From:* Woolton Lee <finished07@gmail.com>
- **Re: st: getting realistic fitted values from a regression** *From:* Maarten buis <maartenbuis@yahoo.co.uk>
- **Re: st: getting realistic fitted values from a regression** *From:* David Jacobs <jacobs.184@sociology.osu.edu>
- **RE: st: getting realistic fitted values from a regression** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
- **Re: st: getting realistic fitted values from a regression** *From:* Austin Nichols <austinnichols@gmail.com>
