
# Re: st: Proportional Independent Variables

From: Nick Cox | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: Proportional Independent Variables | Date: Thu, 28 Feb 2013 11:33:53 +0000

```Even if a proportion x on [0,1] has a strong effect on y, that is
unlikely in principle to be linear over the whole range of x. This is,
as it were, the logic of logit regression for continuous proportions
turned around with response and predictor exchanged.

This may not matter much if the range of x is small and/or the
"effect" of x is small too, but in that case, although the
interpretation is more secure, the effect is likely to be unimportant.

What this boils down to, I think, is that it is risky to try to
interpret coefficients too literally when the predictors are
proportions. A sensible model won't and can't be a hyperplane.

The size of the coefficients just depends on your units of measurement.

I am sensing that you are not using variables other than the
proportions as predictors. If that's wrong, then the problem is more
complicated, but what's been said presumably remains pertinent.

Nick

On Thu, Feb 28, 2013 at 11:11 AM, nick bungy
<nickbungystata@hotmail.co.uk> wrote:
> Hi Joerg,
> Thanks for the reply. Could I ask for a little more of your time for clarification?
> Suppose y is large enough that my coefficients are 5000, 6000, 7000 and 8000 respectively. If I adapt your example slightly and run it, I find that the absolute bias for this new set of coefficients is identical to that for your set. In that case, is the deviation of the coefficients from their true values not just a result of the error you added when generating y?
> With regard to the coefficients from a simple OLS fit, how would I interpret them given that there has to be a proportionate response? Is the coefficient still only one half of the story (i.e. it captures the effect of var1 but not the proportionate response in any or all of the other variables, 2-20)?
> The restriction that the omitted variable has an effect of zero seems strongly prohibitive. If I eventually want to be able to say 'if xi increases and xj is the proportionate response, the change in y is ...' for all i not equal to j, then this assumption will eventually be violated. Is there a way around this?

>> From: joerg.luedicke@gmail.com

>> I should have added that this is assuming that the omitted variable
>> has an effect of zero. If the effect of the omitted variable is
>> non-zero, then the estimates for the other variables are biased by an
>> amount equal to the effect size of the omitted predictor. For example,
>> if the effect for cnsx1 was 0.1 and the fifth variable (cnsx5) had an
>> effect of 0.1 as well, then the estimate for cnsx1 would be zero when
>> fitting the model without cnsx5 (in expectation).

>> On Thu, Feb 28, 2013 at 12:39 AM, Joerg Luedicke

>> > When unsure about things like these, it is always a good idea to run a
>> > bunch of simulations with fabricated data. Below is some code for
>> > checking consistency of OLS estimates, based on the described set up.
>> > First, we generate 5 variables containing uniform random variates on
>> > the range [0,1), and constrain the variables such that they sum up to
>> > one for each observation. Then, we set up a program to feed to Stata's
>> > -simulate-, and finally inspect the results. You can change sample
>> > size, number of variables, and parameter values to more closely
>> > resemble your problem at hand.
>> >
>> > The amount of bias indeed looks negligible to me, confirming Nick Cox's
>> > impressions. Efficiency might be a different story though...
>> >
>> > Joerg
>> >
>> > *--------------------------------------------
>> > // Generate data
>> > clear
>> > set obs 500
>> > set seed 1234
>> >
>> > forval i=1/5 {
>> > gen u`i' = runiform()
>> > }
>> >
>> > egen su = rowtotal(u*)
>> > gen wu = 1/su
>> >
>> > forval i=1/5 {
>> > gen cnsx`i' = u`i'*wu
>> > }
>> >
>> > keep cnsx*
>> >
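>> > // Set up program for -simulate-
>> > // (drop any earlier definition so the block can be re-run)
>> > capture program drop mysim
>> > program define mysim, rclass
>> >
>> > // Draw fresh errors and a new outcome on each replication
>> > cap drop e y
>> > gen e = rnormal()
>> > gen y = 0.1*cnsx1 + 0.2*cnsx2 + ///
>> > 0.3*cnsx3 + 0.4*cnsx4 + e
>> > reg y cnsx1 cnsx2 cnsx3 cnsx4
>> >
>> > // Return the four slope estimates in r()
>> > forval i = 1/4 {
>> > local b`i' = _b[cnsx`i']
>> > return scalar b`i' = `b`i''
>> > }
>> >
>> > end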
>> >
>> > // Run simulations
>> > simulate b1=r(b1) b2=r(b2) b3=r(b3) b4=r(b4), ///
>> > reps(10000) seed(4321) : mysim
>> >
>> > // Results
>> > sum
>> > *--------------------------------------------
>> >
>> >
>> > On Wed, Feb 27, 2013 at 3:40 PM, nick bungy
>> > <nickbungystata@hotmail.co.uk> wrote:
>> >> Dear Statalist,
>> >>
>> >> I have a dependent variable that is continuous and a set of 20
>> >> independent variables that are percentage based, with the condition
>> >> that these variables must sum to 100% for each observation. The data
>> >> are cross-sectional only.
>> >>
>> >> I am aware that interpreting the coefficients from a general OLS fit
>> >> will be inaccurate. An increase in one of the 20 variables has to be
>> >> offset by a decrease in one or more of the other 19 variables.
>> >>
>> >> Is there an approach to get consistent coefficient estimates of these
>> >> parameters that takes into account the proportionate decrease in one
>> >> or more of the other 19 variables?

```
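
Joerg's remark about the omitted variable, and the question of how to read a coefficient when another share has to give way, can be made explicit with a little algebra. What follows is only a sketch under assumptions not stated in the thread: the true model is linear in all five shares, with coefficients written here as \beta_1, ..., \beta_5 (symbols introduced for illustration), the shares sum to one for every observation, and the fitted regression includes a constant, as Stata's -regress- does by default.

```latex
\begin{align*}
y &= \sum_{k=1}^{5} \beta_k x_k + e, \qquad \sum_{k=1}^{5} x_k = 1 \\
  &= \beta_5 \Bigl(1 - \sum_{k=1}^{4} x_k\Bigr) + \sum_{k=1}^{4} \beta_k x_k + e \\
  &= \beta_5 + \sum_{k=1}^{4} (\beta_k - \beta_5)\, x_k + e
\end{align*}
```

Under these assumptions, the coefficient on cnsx1 in a regression that omits cnsx5 estimates \beta_1 - \beta_5, that is, the change in y when cnsx1 rises by one unit and cnsx5 falls by the same amount. With \beta_1 = \beta_5 = 0.1 this difference is zero, which agrees with Joerg's example. The contrast between any two included shares does not depend on which share was omitted, since (\beta_i - \beta_5) - (\beta_j - \beta_5) = \beta_i - \beta_j. In Stata, such a difference could be obtained after -regress- with, for example, `lincom cnsx1 - cnsx2`. None of this removes Nick Cox's caveat that a linear model in proportions is unlikely to hold over the whole range.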