Statalist The Stata Listserver



Re: st: Linear regression of 'log' predictors


From   "Christer Thrane" <[email protected]>
To   <[email protected]>
Subject   Re: st: Linear regression of 'log' predictors
Date   Wed, 27 Sep 2006 20:50:05 +0200

Hi,

One might add to Richard's comment that non-normal residuals are a problem only in small samples, and that the use of -regress, robust- addresses this issue in small samples as well.
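
A minimal sketch of the two fits being compared, assuming the auto dataset shipped with Stata as a stand-in (its price variable is right-skewed, much like los; the predictors mpg and weight are illustrative only, not from the original question):

```stata
* Stand-in data: price plays the role of the right-skewed outcome (los)
sysuse auto, clear

* Ordinary least squares with conventional standard errors
regress price mpg weight

* Same coefficients, Huber/White (robust) standard errors
regress price mpg weight, robust
```

The coefficients are identical in both fits; only the standard errors, p-values and confidence intervals change. In newer Stata versions the same option can also be written vce(robust).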

Christer

************************************************
Professor Christer Thrane
Department of Social Science
Lillehammer University College
2626 Lillehammer, Norway
+ 47 61 28 81 70 (fax)
+ 47 61 28 82 47 (phone, work)
+ 47 61 25 53 04 (phone, home)
E-mail, work: [email protected]
E-mail, home: [email protected]
************************************************

----- Original Message -----
From: "Richard Goldstein" <[email protected]>
To: <[email protected]>
Sent: Wednesday, September 27, 2006 3:43 PM
Subject: Re: st: Linear regression of 'log' predictors



There is no assumption regarding the distribution of the
actual data in linear regression; however, for the p-values
and confidence intervals to be meaningful, the residuals
of the regression must be normally distributed

if you have estimated a model assuming linearity
of continuous terms and no interactions (i.e.,
assuming additivity), then the distribution of the
residuals often mimics the distribution of the
left-hand-side variable (here, los) -- however, if
you include non-linear terms (e.g., polynomials or
splines) or if you include any interactions, then
the distribution of the residuals can be quite
different from the distribution of the left-hand-side
variable
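
One way to look at this directly is to compare the shape of the outcome with the shape of the residuals after fitting. A sketch, again assuming the shipped auto dataset as a stand-in (price for los; mpg and weight are illustrative predictors):

```stata
* Stand-in data: price plays the role of the right-skewed outcome (los)
sysuse auto, clear

regress price mpg weight

* Store the residuals from the model just fitted
predict double resid, residuals

* Compare the shape of the outcome with the shape of the residuals
histogram price, normal name(h_outcome, replace)
histogram resid, normal name(h_resid, replace)

* Normal quantile plot and skewness/kurtosis test for the residuals
qnorm resid, name(q_resid, replace)
sktest resid
```

The residuals are what matter for inference here, so the histogram of price is shown only for comparison.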

if you decide to transform the left-hand-side with logs,
there are user-written procedures to help with the
interpretation -- use -search-
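
As a rough illustration of the interpretation using only built-in commands (a sketch on the shipped auto dataset, with price standing in for los; the variable names are assumptions, not from the original question): after a regression on the logged outcome, exp(b) is a ratio, not a difference.

```stata
* Stand-in data: price plays the role of the right-skewed outcome (los)
sysuse auto, clear

* Log-transform the outcome and fit the linear model on the log scale
generate double lnprice = ln(price)
regress lnprice mpg weight

* exp(b): factor by which the outcome (strictly, its geometric mean)
* is multiplied for a one-unit increase in mpg
display exp(_b[mpg])

* The same quantity expressed as a percent change
display 100*(exp(_b[mpg]) - 1)

* Refit, reporting exponentiated coefficients and confidence limits
regress lnprice mpg weight, eform("exp(b)")
```

Because exp(b) is a ratio, "no effect" corresponds to 1 rather than 0, so an exponentiated confidence interval should be checked against 1. For the same reason, exp(b) is not expected to resemble the coefficient from a regression on the untransformed outcome, which is a difference in the outcome's own units.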

another alternative is to not transform, and use -glm-
and use a log link
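
A sketch of what that might look like, assuming the shipped auto dataset as a stand-in (price for los). The post does not name a family; the gamma family in the second fit is an assumption, shown because it is a common choice for right-skewed, positive outcomes:

```stata
* Stand-in data: price plays the role of the right-skewed outcome (los)
sysuse auto, clear

* Log link with the default Gaussian family:
* models ln(E[price | x]) as linear in the predictors
glm price mpg weight, family(gaussian) link(log)

* Assumption, not from the post: gamma family with the same log link,
* reporting exponentiated coefficients
glm price mpg weight, family(gamma) link(log) eform
```

With a log link the outcome stays on its original scale, and the exponentiated coefficients are ratios of expected values, so predictions need no back-transformation.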

hope this helps,

Rich

Ashwin Ananthakrishnan wrote:
Hi,

I have a model where the outcome is length of stay
(los). This variable has some right skew and is not
perfectly 'normal'. Is it valid for me to run a linear
regression of length of stay on the other predictors
if los is not normally distributed?

If it is not valid, log(los) is a normally
distributed variable. But how do I interpret the
coefficients from the log(los) model? I find that
exponentiating the log(los) coefficients doesn't seem to be
appropriate, as it doesn't yield valid results. For
example, p>0.05, but the 95% CI doesn't include 'zero',
as I would expect it to in linear regression.
Also, exponentiating the log(los) coefficients doesn't give
estimates similar to the coefficients I get if I run the
regression on los directly.

I apologize in advance if my question is either too
basic or difficult to understand.
Thank you.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



