Re: st: Proportional Independent Variables

From   Joerg Luedicke
Subject   Re: st: Proportional Independent Variables
Date   Thu, 28 Feb 2013 00:39:48 -0500

When unsure about things like these, it is always a good idea to run a
bunch of simulations with fabricated data. Below is some code for
checking the consistency of OLS estimates, based on the setup you
describe. First, we generate 5 variables containing uniform random
variates on the range [0,1) and rescale them so that they sum to one
for each observation. Then, we set up a program to feed to Stata's
-simulate-, and finally inspect the results. You can change the sample
size, number of variables, and parameter values so that the simulation
more closely resembles your problem at hand.

The bias indeed looks negligible to me, confirming Nick Cox's
impression. Efficiency might be a different story though...


// Generate data (starts from an empty dataset)
clear
set obs 500
set seed 1234

forval i=1/5 {
	gen u`i' = runiform()
}

egen su = rowtotal(u*)
gen wu = 1/su

forval i=1/5 {
	gen cnsx`i' = u`i'*wu
}

keep cnsx*

// Set up program for -simulate-
program define mysim, rclass

	cap drop e y
	gen e = rnormal()
	gen y = 0.1*cnsx1 + 0.2*cnsx2 + ///
			0.3*cnsx3 + 0.4*cnsx4 + e
	reg y cnsx1 cnsx2 cnsx3 cnsx4
	forval i = 1/4 {
		local b`i' = _b[cnsx`i']
		return scalar b`i' = `b`i''
	}
end


// Run simulations
simulate b1=r(b1) b2=r(b2) b3=r(b3) b4=r(b4), ///
reps(10000) seed(4321) : mysim

// Results: the means of b1-b4 should be close to 0.1, 0.2, 0.3, and 0.4
summarize b1 b2 b3 b4
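
On the interpretation question: because the shares sum to one, a model
with a constant and four of the five shares is just a
reparameterization of a model with all five shares and no constant.
Writing g1, ..., g5 for the no-constant coefficients and substituting
cnsx5 = 1 - cnsx1 - ... - cnsx4 gives a constant of g5 and slopes of
g_i - g5, so each slope describes a shift from the omitted share to
share i. The sketch below checks this numerically. It is only an
illustration: the coefficient of 0.5 on cnsx5 and the scalar names
g1-g5, a, and d1-d4 are assumptions of mine and not part of the
simulation above, and the code clears the data in memory.

```stata
// Same data-generating setup as above; clears the data in memory
clear
set obs 500
set seed 1234

// Five uniform draws rescaled so the shares sum to one per observation
forval i = 1/5 {
	gen u`i' = runiform()
}
egen su = rowtotal(u1-u5)
forval i = 1/5 {
	gen cnsx`i' = u`i'/su
}

// Hypothetical outcome in which every share, including the fifth, matters
gen y = 0.1*cnsx1 + 0.2*cnsx2 + 0.3*cnsx3 + 0.4*cnsx4 + 0.5*cnsx5 + rnormal()

// (a) All five shares, no constant: one coefficient per share
regress y cnsx1-cnsx5, noconstant
forval i = 1/5 {
	scalar g`i' = _b[cnsx`i']
}

// (b) Constant plus four shares: cnsx5 is the omitted reference share
regress y cnsx1-cnsx4
scalar a = _b[_cons]
forval i = 1/4 {
	scalar d`i' = _b[cnsx`i']
}

// The two fits coincide: the constant in (b) matches g5, and each slope
// in (b) matches g_i - g5, up to numerical precision
display "constant in (b): " scalar(a) "   g5 in (a): " scalar(g5)
forval i = 1/4 {
	display "slope `i' in (b): " scalar(d`i') ///
		"   g`i' - g5 in (a): " scalar(g`i') - scalar(g5)
}
```

Which share to omit is a matter of choice; a different reference share
changes the reported slopes but not the fitted values.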

On Wed, Feb 27, 2013 at 3:40 PM, nick bungy wrote:
> Dear Statalist,
> I have a dependent variable that is continuous and a set of 20
> independent variables that are percentage based, with the condition
> that these variables must sum to 100% for each observation. The data
> are cross-sectional only.
> I am aware that interpreting the coefficients from a general OLS fit
> will be inaccurate: an increase in one of the 20 variables has to be
> accompanied by a decrease in one or more of the other 19 variables.
> Is there an approach to get consistent coefficient estimates of these
> parameters that accounts for the proportionate decrease in one or
> more of the other variables?
> Best,
> Nick