
# RE: st: Problem with proportions as explanatory variables in panel data regression

From: DE SOUZA Eric
To: "'statalist@hsphsun2.harvard.edu'"
Subject: RE: st: Problem with proportions as explanatory variables in panel data regression
Date: Tue, 14 Dec 2010 13:42:20 +0100

```Replying to Maarten's reply because I inadvertently deleted the original.
Consider:
y =   b_0 + b_1x_1 + b_2x_2 + b_3x_3
The x's are constrained by:
1 =   x_1 + x_2 + x_3
or in changes:
0 =   dx_1 + dx_2 + dx_3
Substituting -dx_2 - dx_3 for dx_1:

dy =   b_1dx_1 + b_2dx_2 + b_3dx_3
   =   b_1( - dx_2 - dx_3) + b_2dx_2 + b_3dx_3
   =   (b_2 - b_1)dx_2 + (b_3 - b_1)dx_3
This means that if you regress y on x_2 and x_3, dropping x_1 to avoid perfect collinearity, the coefficient on x_2 estimates b_2 - b_1: the effect on y of an increase in x_2 that is offset by an equal decrease in x_1, so that the constraint still holds, while x_3 is kept constant. The coefficient on x_3 is interpreted in the same way.

Eric de Souza
College of Europe
Dyver 11
Belgium

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Maarten buis
Sent: 14 December 2010 11:05
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problem with proportions as explanatory variables in panel data regression

--- On Tue, 14/12/10, F. Javier Sese wrote:
> I am modeling the dependent variable (Y) as a function of three main
> explanatory variables (X1-X3) and a vector of control variables (Z).
>
> X1-X3 are proportions: they range between zero and one and add up to
> one for each observation (X1 + X2 + X3 = 1). Given the nature of
> X1-X3, there is a high negative correlation between them (an increase
> in one variable leads to a decrease in the other two), which gives
> rise to a potential collinearity problem that may be causing some
> unexpected results in the signs and statistical significance of the
> coefficients. In my dataset, X1 and X2 have a correlation coefficient
> of -0.81; X1 and X3 of -0.42; X2 and X3 of -0.19.
>
> Given that the main focus of my research is on understanding the
> impact of these three variables on Y, I would really appreciate it if
> someone can provide me with some guidance on how to obtain reliable
> parameter estimates for the coefficients b1-b3.

Multicollinearity is in itself never a problem: it reduces the power of our tests, but that is just an accurate reflection of the amount of information available in the data.

The real problem with your data is conceptual. We usually interpret a coefficient as the change in y for a unit change in x while keeping all else constant. How can you change one proportion while keeping the others constant? You can't. You can find a discussion of this problem and possible solutions in chapter 12 of J. Aitchison (2003) "The Statistical Analysis of Compositional Data". Caldwell, NJ: The Blackburn Press.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
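The substitution argument in Eric de Souza's reply can be checked numerically. The following is a minimal sketch in plain Python, not Stata code and not taken from the thread. The coefficient values, the sample size, and the helper names `solve` and `ols` are assumptions made purely for illustration. It simulates proportions that sum to one, builds y without noise, and regresses y on a constant, x_2 and x_3 with x_1 dropped.

```python
import random


def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical "true" coefficients, chosen only for this illustration.
b0, b1, b2, b3 = 1.0, 2.0, 5.0, -1.0

rng = random.Random(0)
rows, y = [], []
for _ in range(200):
    # Draw three positive weights and normalise so that x1 + x2 + x3 = 1.
    w = [rng.uniform(0.1, 1.0) for _ in range(3)]
    x1, x2, x3 = (v / sum(w) for v in w)
    rows.append([1.0, x2, x3])  # x1 is dropped, the constant is kept
    y.append(b0 + b1 * x1 + b2 * x2 + b3 * x3)  # no noise, so the fit is exact

const, c2, c3 = ols(rows, y)
print("constant:", round(const, 6), "expected b0 + b1 =", b0 + b1)
print("x2 coef: ", round(c2, 6), "expected b2 - b1 =", b2 - b1)
print("x3 coef: ", round(c3, 6), "expected b3 - b1 =", b3 - b1)
```

Under these assumptions the fitted coefficients on x_2 and x_3 match b_2 - b_1 and b_3 - b_1 up to floating-point error, and the constant absorbs b_1 as b_0 + b_1. Only the contrasts with the omitted proportion are identified, so b_1, b_2 and b_3 cannot be recovered separately from such a regression.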