Re: st: Proportional Independent Variables

From   Nick Cox <>
Subject   Re: st: Proportional Independent Variables
Date   Thu, 28 Feb 2013 09:19:07 +0000

Joerg's suggestions are naturally all good.

I might add some general points.

1. With such predictors I'd see no obligation to keep them on the
original measured scale. In many problems components with small mean
proportions can be diagnostic of something important.

2. For different reasons log and logit transformations might be
considered. There is a very inward-looking literature on compositional
data analysis centred on more exotic transformations tailored to the
problem. The reference I gave earlier is one entry into that.

3. The two previous points are often complicated by measured zeros.
There is then a long slow agony about whether they are structural or
sampling zeros and what to do about them. The more components are
measured, the worse this usually gets, whether it is fractions of a
budget spent on different things, or proportions of a material by
elements or compounds or particle size classes, or whatever.
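
A minimal sketch of points 1 to 3, in Python rather than Stata and using made-up shares. The additive log-ratio is one transformation from the compositional literature, picked here only for illustration; the post does not name a specific one. The function names and the example data are hypothetical. Note that the log-ratio version yields one fewer variable than there are components, which fits the "use 19 at most" advice, and that it refuses measured zeros rather than guessing what to do with them.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined for proportions of 0 or 1")
    return math.log(p / (1.0 - p))


def additive_log_ratio(shares, ref=-1):
    """Map D shares summing to 1 to D - 1 log-ratios against a reference share."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    if min(shares) <= 0.0:
        # Structural or sampling zero? That decision has to be made first.
        raise ValueError("measured zeros need a decision before taking logs")
    ref_index = ref % len(shares)
    base = shares[ref_index]
    return [math.log(s / base) for i, s in enumerate(shares) if i != ref_index]


# Hypothetical observation with four components (percentages divided by 100).
shares = [0.70, 0.20, 0.08, 0.02]

# The last share is fully determined by the others, so it adds no information.
implied_last = 1.0 - sum(shares[:-1])

# Logit stretches small proportions; each component is transformed on its own.
print([round(logit(s), 3) for s in shares])

# Log-ratios against the last component: three values for four components.
print([round(v, 3) for v in additive_log_ratio(shares)])
```

Either set of transformed values could then be used as predictors in place of the raw percentages, at the cost in interpretability mentioned earlier in the thread.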


On Thu, Feb 28, 2013 at 8:22 AM, Nick Cox <> wrote:
> I should have said use 19 at most.
> Nick
> On Wed, Feb 27, 2013 at 10:12 PM, Nick Cox <> wrote:
>> In principle, yes. In practice, the effect might be slight. You could
>> look at e.g.
>> for ideas on transformations that tackle this issue. My guess is that
>> you will lose more on interpretability than you will gain. But use 19
>> not 20.
>> Nick
>> On Wed, Feb 27, 2013 at 8:40 PM, nick bungy
>> <> wrote:
>>> Dear Statalist,
>>> I have a dependent variable that is continuous and a set of 20
>>> independent variables that are percentage based, with the condition
>>> that the sum of these variables must be 100% across each observation.
>>> The data are cross-sectional only.
>>> I am aware that interpreting the coefficients from a general OLS fit
>>> will be inaccurate. The increase of one of the 20 variables will have
>>> to be facilitated by a decrease in one or more of the other 19
>>> variables.
>>> Is there an approach to get consistent coefficient estimates of these
>>> parameters that considers the influence of a proportionate decrease in
>>> one or more of the other 19 variables?