Statalist The Stata Listserver


st: RE: Linear regression of 'log' predictors

From   "Maarten Buis" <[email protected]>
To   <[email protected]>
Subject   st: RE: Linear regression of 'log' predictors
Date   Wed, 27 Sep 2006 15:56:58 +0200

First, given the name of your dependent variable "length of stay" I presume that it measures some duration (duration till leaving). In that case I would strongly recommend using Stata's survival time models, either -stcox- or -streg-. If you don't have any censoring (people who haven't yet left when you stopped collecting data) then using log(los) as the dependent variable is equivalent to using -streg- with the distribution(lnormal) option. However, it is very unlikely that you have no censoring, in which case -streg- is by far preferable. I have written a short introduction to survival analysis, which you can get from . It also contains some links to other sites with information on survival analysis.
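
A minimal sketch of the two approaches mentioned above, assuming hypothetical variable names that are not in the original post: los (length of stay), discharged (1 = the stay ended during observation, 0 = censored) and female.

```stata
* Sketch only: los, discharged, and female are hypothetical variable names.
* discharged = 1 if the stay ended during observation, 0 if censored.

* Declare the data to be survival-time data.
stset los, failure(discharged)

* Log-normal survival model; coefficients are on the log(time) scale.
streg female, distribution(lnormal)

* With no censoring at all, the point estimates from -streg- above match
* those from a linear regression of log(los):
generate double loglos = ln(los)
regress loglos female
```

In the no-censoring case the coefficients should agree, while the standard errors may differ slightly because -streg- uses maximum likelihood and -regress- applies a degrees-of-freedom correction.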

As for interpretation, say you have one explanatory variable called female (0 = male, 1 = female) and you find a regression coefficient of -.04, then the average duration is approximately 4% less for females than for males.
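
The percentage reading of a coefficient on the log scale is an approximation that holds when the coefficient is small in absolute value. A short worked version, assuming a model with log(los) as the dependent variable and female as the only predictor, and taking "typical" to mean the geometric mean of los:

```latex
\begin{align*}
\ln(\text{los}) &= \beta_0 + \beta_1\,\text{female} + \varepsilon \\
\frac{\text{typical los, female}}{\text{typical los, male}} &= e^{\beta_1} \\
\beta_1 = -0.04 &\;\Rightarrow\; e^{-0.04} \approx 0.961 \quad (\text{about } 3.9\% \text{ shorter}) \\
\beta_1 = -4 &\;\Rightarrow\; e^{-4} \approx 0.018 \quad (\text{about } 98\% \text{ shorter})
\end{align*}
```

Because exp(0) = 1, an exponentiated coefficient and its confidence interval are compared with 1 rather than with 0.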


Maarten L. Buis
Department of Social Research Methodology 
Vrije Universiteit Amsterdam 
Boelelaan 1081 
1081 HV Amsterdam 
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434 

+31 20 5986715

-----Original Message-----
From: [email protected] [mailto:[email protected]]On Behalf Of Ashwin Ananthakrishnan
Sent: Wednesday 27 September 2006 15:32
To: [email protected]
Subject: st: Linear regression of 'log' predictors

I have a model where the outcome is length of stay
(los). This variable has some right skew and is not
perfectly 'normal'.

Is it valid for me to run linear regression of other
predictors on length of stay if the los is not
normally distributed?

If it is not valid, then log (los) is a normally
distributed variable. But how do I interpret the
coefficients of the log(los). I find that
exponentiating log(los) coefficient doesn't seem to be
appropriate as it doesn't yield valid results. For
example p>0.05, but the 95% CI don't overlap 'zero'
which is what I would expect in linear regression.
Also exp(log(los)) doesn't give a similar estimate as
the coefficients if I run the regression on los

I apologize in advance if my question is either too
basic or difficult to understand.
Thank you.
