# st: Re: log of variables in xtreg

 From Trond.Petersen@hsphsun2.harvard.edu To JA@hsphsun2.harvard.edu Subject st: Re: log of variables in xtreg Date Tue October 15, 2002

```I received your email from a colleague in Stockholm, Peter Hedstrom.
I cc the rest of the Stata list so that people can see that the query
has been

I also send you, not the rest of the list, a paper I wrote on these
issues, as a PDF file, based on notes made while teaching regression
analysis.  There is also a note or an addendum to this paper written by
Leo Goodman.  It gives the exact conditions for when a coefficient
changes sign as one goes from unlogged to logged values, in the case of
comparing two

What you describe happens with some regularity.  I have run across it in
teaching graduate classes in regression.  The reason is this.

Consider a simple example with a continuous dependent variable y_i and a
dichotomous independent variable D_i, for individual i.  The point made
carries over to multivariate regression situations and to continuous
independent variables.

1.  In unlogged form you estimate

y_i = b_0 + b_1*D_i + error

Here, b_1 gives the impact on the conditional mean of y_i, giving the
mean difference in y between those with D_i=0 and D_i=1.

2.  In logged form you estimate

ln(y_i) = a_0 + a_1*D_i + error

Here, a_1 gives the impact on the conditional mean of ln(y_i), giving
the mean difference in the logarithm of y between those with D_i=0 and
D_i=1.  There are now two correct interpretations of a_1.

The first correct interpretation is the one given above, that it gives
the impact on the conditional mean of ln(y_i), giving the mean
difference in the logarithms of y between those with D_i=0 and D_i=1.

The second correct interpretation is that it gives the relative impact
on the conditional GEOMETRIC mean of the unlogged values, that is, the
relative difference in geometric means between those with D_i=0 and
D_i=1.  To get this, one often computes exp[a_1] - 1.

When researchers interpret a_1 as giving the relative impact on the
conditional ARITHMETIC mean of the unlogged y, then that is a
misinterpretation.

3.  Reason 1 for sign change

The interpretational difference identified in 2 is probably the source
of the sign change you observe.  A variable may have a positive impact
on the conditional arithmetic mean of a variable but a negative impact
on the conditional geometric mean of the same variable.

4.  Reason 2 for sign change

A second reason for a sign change is that in going from unlogged to
logged form, or vice versa, you would need to include interaction terms
to make the two formulations consistent.  Take the impact of age on
wages controlling for sex, with a positive coefficient for age.  In the
unlogged form, the age lines for the sexes are parallel.  In the logged
form, the difference between the sexes, when transformed back to
unlogged values, increases with age.  Some kind of interaction term
would then be needed in order to make the lines parallel for the
retransformed variable in the logged form.

5.  Solutions

There are two solutions when the sign change occurred for the first
reason.

Solution 1:  Estimate an exponential regression

y_i = exp(a_0 + a_1*D_i) + error_i

Solution 2:  Estimate a GLM

y_i = exp(a_0 + a_1*D_i)*error_i

where error_i in GLM is gamma distributed.

In Stata you can do both.

In the exponential regression you would need to include dummy variables
for years and countries to get fixed effects.  You could develop a
random effects estimator, but it would take programming.  You need to
program the exponential regression.

In the GLM you would also have to include dummy variables for years and
countries to get fixed effects.  But Stata has already implemented a
canned random effects estimator, as I remember it.

Trond Petersen
Professor
Department of Sociology
UC Berkeley

----- Original Message -----
From: "Javier Aparicio" <fjaparicio@altavista.net>
To: <statalist@hsphsun2.harvard.edu>
Sent: Tuesday, October 15, 2002 8:06 AM
Subject: st: log of variables in xtreg

> Dear list,
>
> I am running xtreg with time and country effects in a panel with 50
> years and 45 countries.  My depvar and two covariates are in constant
>
> I am testing some policy indicator variables, and some coefficients
> switch signs when I take the log of my dollar-valued variables.  What
> worries me is that the estimates are significant either in the log
> transform, or without it--but with opposite signs.
>
> Any suggestions of what should I do?
>
> Thanks,
>
> -JA

```
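The two reasons for a sign change given in the reply can be checked numerically. The sketch below is only an illustration, not anything taken from the thread: the data values, the coefficients `c_0`, `c_age`, `c_male`, and all function names are made up for the example. It relies on the fact that with a single dummy regressor the OLS slope equals the difference in group means, so b_1 and a_1 are computed directly from group means rather than by fitting a regression.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # n-th root of the product; only defined here for strictly positive values
    return math.prod(values) ** (1.0 / len(values))


def mean_log(values):
    return sum(math.log(v) for v in values) / len(values)


# --- Reason 1: arithmetic vs. geometric mean (hypothetical data) ---
y_d0 = [10.0, 10.0, 10.0]   # outcomes for D_i = 0
y_d1 = [1.0, 1.0, 40.0]     # outcomes for D_i = 1, one large value

# Slope of y on D: difference in arithmetic means (14 - 10 = 4).
b_1 = arithmetic_mean(y_d1) - arithmetic_mean(y_d0)

# Slope of ln(y) on D: difference in the means of the logs.
a_1 = mean_log(y_d1) - mean_log(y_d0)

# exp(a_1) - 1 is the relative difference in GEOMETRIC means.
implied_relative_diff = math.exp(a_1) - 1.0
relative_gm_diff = geometric_mean(y_d1) / geometric_mean(y_d0) - 1.0

print("b_1 (unlogged):", b_1)
print("a_1 (logged):  ", a_1)
print("exp(a_1) - 1:  ", implied_relative_diff)
print("GM ratio - 1:  ", relative_gm_diff)


# --- Reason 2: parallel lines in logs are not parallel in levels ---
def wage_from_log_model(age, male, c_0=1.0, c_age=0.02, c_male=0.2):
    # Hypothetical coefficients for ln(wage) = c_0 + c_age*age + c_male*male
    return math.exp(c_0 + c_age * age + c_male * male)


def sex_gap_in_levels(age):
    return wage_from_log_model(age, 1) - wage_from_log_model(age, 0)


gap_30 = sex_gap_in_levels(30)
gap_50 = sex_gap_in_levels(50)
print("gap at 30:", gap_30, "gap at 50:", gap_50)
```

In the first part, the single large value in the D_i=1 group raises the arithmetic mean above that of the D_i=0 group while the geometric mean stays below it, so b_1 and a_1 have opposite signs. In the second part, the gap between the sexes is constant in logs but grows with age once transformed back to levels, which is why an interaction term would be needed to reproduce parallel lines in the unlogged form.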