# R: st: Interpretation of regressionmodel of ln-transformed variable

From: "Carlo Lazzaro"
Subject: R: st: Interpretation of regressionmodel of ln-transformed variable
Date: Wed, 5 Nov 2008 11:28:57 +0100

```
Dear Roland,
in addition to Maarten's wise insight, I was wondering whether, in order to
deal with the skewed sampling distribution of LOS, bootstrapping your
raw data without log-transforming them may be a good way to go (please see,
for instance, Glick HA, Doshi JA, Sonnad SS, Polsky D. Economic Evaluation in
Clinical Trials. Oxford: Oxford University Press, 2007: 89-113).

I tried to replicate an abridged version of your problem (no drugs and/or
interaction among variables allowed), first performing a multiple linear
regression on the raw data and then repeating OLS on bootstrapped data
(10,000 random samples for each variable). Neither attempt reached
statistical significance.

. regress LOS age surgery

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  2,     7) =    2.39
       Model |  2.79848368     2  1.39924184           Prob > F      =  0.1619
    Residual |  4.10151632     7  .585930904           R-squared     =  0.4056
-------------+------------------------------           Adj R-squared =  0.2357
       Total |         6.9     9  .766666667           Root MSE      =  .76546

------------------------------------------------------------------------------
         LOS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0126865   .0177748    -0.71   0.498    -.0547173    .0293443
     surgery |   1.035422   .4866573     2.13   0.071    -.1153401    2.186183
       _cons |   4.697851   .5397554     8.70   0.000     3.421532    5.974169
------------------------------------------------------------------------------

. reg boot_los boot_surgery boot_age

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  2,  9997) =    0.51
       Model |  .070269293     2  .035134647           Prob > F      =  0.5984
    Residual |  683.931319  9997  .068413656           R-squared     =  0.0001
-------------+------------------------------           Adj R-squared = -0.0001
       Total |  684.001589  9999     .068407           Root MSE      =  .26156

------------------------------------------------------------------------------
    boot_los |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
boot_surgery |  -.0053317   .0165804    -0.32   0.748    -.0378326    .0271692
    boot_age |  -.0005713    .000594    -0.96   0.336    -.0017356    .0005931
       _cons |   4.914887   .0171246   287.01   0.000     4.881319    4.948455
------------------------------------------------------------------------------

HTH and Kind Regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Maarten Buis
Sent: Wednesday, 5 November 2008 10:49
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Interpretation of regressionmodel of ln-transformed
variable

> It is also difficult to imagine that there should be censoring
> for conditions that normally need a hospital stay of 1 to 7 days.

Ok, sounds reasonable.

> Following your example I have made this model
>
> xi: regress lnLOS lapscopic i.appdgn age agesq cons, eform("exp(b)") nocons
>
> and get this result
>
> lnLOS          exp(b)      [95% Conf. Interval]
> lapscopic      1.018056    1.004532    1.031762
> _Iappdgn2_1    1.850726    1.824841    1.876978
> _Iappdgn2_3    1.174283    1.147247    1.201956
> age            .9852508    .9841405    .9863623
> agesq          1.000275    1.000261    1.000289
> cons           2.208685    2.168225    2.2499
>
> I now understand that exp(b) is a multiplier, i.e. that open
> appendectomy has a geometric mean LOS of 2.21 days whereas
> laparoscopic patients have 1.02*2.21=2.25 days, or 0.04 days longer
> geometric mean LOS. Is it correct to recalculate the CI of this
> difference as 1.0045*2.21-2.21=0.01 and 1.032*2.21-2.21=0.07?

In that case I would use -adjust- and -nlcom- like in the example
below:

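*--------------- begin example --------------------------
* load the example data and log-transform the outcome
sysuse cancer, clear
gen ln_t = ln(studytime)
* explicit constant, so that eform() also displays the exponentiated baseline
gen cons = 1
xi: reg ln_t i.drug age cons, nocons eform("exp(b)")

* geometric mean survival time for drug 1 versus drug 2, with age held at its mean
adjust _Idrug_3=0 age, by(_Idrug_2) exp ci
* mean age in the estimation sample, stored in r(mean) for -nlcom-
sum age if e(sample)
* difference in geometric means (drug 2 minus drug 1) at mean age,
* with a delta-method standard error and confidence interval
nlcom exp((_b[cons] + _b[age]*`r(mean)')+ _b[_Idrug_2]) -  ///
exp((_b[cons] + _b[age]*`r(mean)'))
*---------------- end example ---------------------------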

Notice that the difference in LOS now depends on the values of the
other explanatory variables. These other variables define the baseline
LOS (in your case the LOS for someone who received an open
appendectomy). So if you haven't mean-centered age, then the difference
in geometric mean LOS you reported applies to newborn babies. You
can report the difference in geometric mean LOS for someone of average
age either by first mean-centering age (subtracting the mean age from the
variable age, as I did in the example in my previous post), or by taking
mean age into account as in the example above.

Hope this helps,
Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
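The arithmetic in Roland's question can be checked with a short sketch. It is written in Python rather than Stata purely for illustration, and it uses only the exponentiated coefficients quoted in the thread; the helper name `gm_difference` is hypothetical. The "naive" interval below simply rescales the confidence limits of the multiplier and treats the baseline geometric mean as a fixed number, so it ignores the sampling uncertainty in `cons`, `age` and `agesq`. That is presumably why Maarten suggests -nlcom- instead, which accounts for the uncertainty in every coefficient involved.

```python
# Exponentiated coefficients quoted in the thread (lnLOS model).
baseline_gm = 2.208685   # exp(b) for cons: geometric mean LOS, open appendectomy, age = 0
ratio = 1.018056         # exp(b) for lapscopic (multiplicative effect)
ratio_lo = 1.004532      # lower 95% confidence limit of that multiplier
ratio_hi = 1.031762      # upper 95% confidence limit of that multiplier


def gm_difference(baseline, multiplier):
    """Difference in geometric means implied by a multiplicative effect.

    Hypothetical helper for illustration: baseline * multiplier - baseline.
    """
    return baseline * multiplier - baseline


# Point estimate of the difference in geometric mean LOS (days).
diff = gm_difference(baseline_gm, ratio)

# Naive interval: rescales the multiplier's limits, baseline treated as known.
naive_lo = gm_difference(baseline_gm, ratio_lo)
naive_hi = gm_difference(baseline_gm, ratio_hi)

print(f"difference in geometric mean LOS: {diff:.2f} days")
print(f"naive interval: {naive_lo:.2f} to {naive_hi:.2f} days")
```

Because the difference is proportional to the baseline, the same multiplier yields a different difference in days at any other age, which is the point Maarten makes about mean-centering.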