

From: Joerg Luedicke <joerg.luedicke@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Proportional Independent Variables
Date: Thu, 28 Feb 2013 10:03:09 -0500

See below:

On Thu, Feb 28, 2013 at 6:11 AM, nick bungy
<nickbungystata@hotmail.co.uk> wrote:
> Hi Joerg,
>
> Thanks for the reply. Could I ask for a little more of your time for
> clarification?
>
> If I consider that y is suitably large that my coefficients are 5000,
> 6000, 7000 & 8000 respectively. If I slightly adapt and run your
> example, I find the absolute level of bias between this new set of
> coefficients and your set are identical. In which case, is this
> deviation in the coefficients from the true value not just a result of
> the error you factored in when generating y?

You need to interpret effect sizes (and for that matter, the deviation
of the expected values of the coefficients from true values) in
_relative_ terms. In my example, I generated the Gaussian error with a
standard deviation of 1, and so the effect sizes that I chose
arbitrarily were all rather small, given that a unit increase in x
covers its entire range (e.g. an effect of 0.4 would be interpreted as a
"change" in the outcome of 0.4 standard deviations when x goes from 0 to
1 etc.; this may not be a useful interpretation with real data as a real
x would often not even cover this entire range, also see Nick's comments
about linearity). When you say you used coefficients of 5000,...,8000,
did you change the error variance? For example, if you would like to
plug in values from a previous real data fit, you should use the RMSE of
that fit as the standard deviation of the Gaussian error, at least if
you were also interested in the variation of the estimates.

> In regards to the coefficients generated from a simple OLS, how would
> I interpret them in the context of there having to be a proportionate
> response? Is the coefficient still only going to be one half of the
> story (i.e. it captures the effect of var1 but doesn't capture the
> proportionate response in any/or of the other vars 2-20)?

If you have a set of variables that sum up to some constant and use only
some of these variables but not all of them, and the omitted variables
have a non-zero effect, then you are facing some apparent omitted
variable bias because all variables in the set are necessarily
correlated due to their nature of summing up to a constant. You could
play around with simulations using a variety of assumptions and data
generating models in order to see how that plays out and how relevant it
might be with respect to your problem.

> The restriction that the omitted variable has an effect of zero seems
> quite strongly prohibitive. If I want to eventually build a view
> whereby I can ascertain 'if xi increases and xj is the proportionate
> response, the change in y is ... ' for all i not equal to j, then
> eventually this assumption will be violated. Is there a way around
> this?

I don't know. I don't really have any experience with compositional data
and it is probably best to consult the relevant literature. Also, Nick's
suggestions seem to be very useful in this regard. My point here merely
was that, if unsure about whether certain modeling assumptions make
sense, fabricated data simulations can be very illuminating.

Joerg

> Many thanks,
> Nick
>
>> Date: Thu, 28 Feb 2013 01:43:00 -0500
>> Subject: Re: st: Proportional Independent Variables
>> From: joerg.luedicke@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> I should have added that this is assuming that the omitted variable
>> has an effect of zero. If the effect of the omitted variable is
>> non-zero, then the estimates for the other variables are biased by an
>> amount equal to the effect size of the omitted predictor. For example,
>> if the effect for cnsx1 was 0.1 and the fifth variable (cnsx5) had an
>> effect of 0.1 as well, then the estimate for cnsx1 would be zero when
>> fitting the model without cnsx5 (in expectation).
>>
>> Joerg
>>
>>
>> On Thu, Feb 28, 2013 at 12:39 AM, Joerg Luedicke
>> <joerg.luedicke@gmail.com> wrote:
>> > When unsure about things like these, it is always a good idea to run a
>> > bunch of simulations with fabricated data. Below is some code for
>> > checking consistency of OLS estimates, based on the described set up.
>> > First, we generate 5 variables containing uniform random variates on
>> > the range [0,1), and constrain the variables such that they sum up to
>> > one for each observation. Then, we set up a program to feed to Stata's
>> > -simulate-, and finally inspect the results. You can change sample
>> > size, number of variables, and parameter values in order to more
>> > closely resemble your problem at hand.
>> >
>> > The amount of bias looks indeed negligible to me, confirming Nick Cox'
>> > impressions. Efficiency might be a different story though...
>> >
>> > Joerg
>> >
>> > *--------------------------------------------
>> > // Generate data
>> > clear
>> > set obs 500
>> > set seed 1234
>> >
>> > forval i=1/5 {
>> >     gen u`i' = runiform()
>> > }
>> >
>> > egen su = rowtotal(u*)
>> > gen wu = 1/su
>> >
>> > forval i=1/5 {
>> >     gen cnsx`i' = u`i'*wu
>> > }
>> >
>> > keep cnsx*
>> >
>> > // Set up program for -simulate-
>> > program define mysim, rclass
>> >
>> >     cap drop e y
>> >     gen e = rnormal()
>> >     gen y = 0.1*cnsx1 + 0.2*cnsx2 + ///
>> >         0.3*cnsx3 + 0.4*cnsx4 + e
>> >     reg y cnsx1 cnsx2 cnsx3 cnsx4
>> >
>> >     forval i = 1/4 {
>> >         local b`i' = _b[cnsx`i']
>> >         return scalar b`i' = `b`i''
>> >     }
>> >
>> > end
>> >
>> > // Run simulations
>> > simulate b1=r(b1) b2=r(b2) b3=r(b3) b4=r(b4), ///
>> >     reps(10000) seed(4321) : mysim
>> >
>> > // Results
>> > sum
>> > *--------------------------------------------
>> >
>> >
>> > On Wed, Feb 27, 2013 at 3:40 PM, nick bungy
>> > <nickbungystata@hotmail.co.uk> wrote:
>> >> Dear Statalist,
>> >>
>> >> I have a dependent variable that is continuous and a set of 20
>> >> independent variables that are percentage based, with the condition
>> >> that the sum of these variables must be 100% across each
>> >> observation. The data is cross-section only.
>> >>
>> >> I am aware that interpreting the coefficients from a general OLS
>> >> fit will be inaccurate. The increase of one of the 20 variables
>> >> will have to be facilitated by a decrease in one or more of the
>> >> other 19 variables.
>> >>
>> >> Is there an approach to get consistent coefficient estimates of
>> >> these parameters that consider the influence of a proportionate
>> >> decrease in one or more of the other 20 variables?
>> >>
>> >> Best,
>> >>
>> >> Nick

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
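
A short derivation may help with the quoted remark that omitting a component with a non-zero effect shifts the remaining estimates by that effect. It is only a sketch and not part of the original exchange: the symbols $\beta_j$, $x_j$, and $\varepsilon$ are introduced here for illustration, and it assumes five shares that sum to one for every observation and a regression that includes a constant, as in the simulation code in the thread.

```latex
\begin{align*}
y &= \sum_{j=1}^{5} \beta_j x_j + \varepsilon,
   \qquad \sum_{j=1}^{5} x_j = 1 \\
  &= \sum_{j=1}^{4} \beta_j x_j
     + \beta_5 \Bigl( 1 - \sum_{j=1}^{4} x_j \Bigr) + \varepsilon \\
  &= \beta_5 + \sum_{j=1}^{4} \left( \beta_j - \beta_5 \right) x_j + \varepsilon
\end{align*}
```

Under these assumptions, regressing $y$ on $x_1, \dots, x_4$ with a constant targets $\beta_j - \beta_5$ for each included share, with $\beta_5$ absorbed into the intercept. With $\beta_1 = \beta_5 = 0.1$ the expected coefficient on cnsx1 is 0, which matches the example given in the thread. Read this way, each coefficient is the expected change in $y$ when $x_j$ rises by one unit at the expense of the omitted share, holding the other included shares fixed.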

**References**:

- **st: Proportional Independent Variables**, *From:* nick bungy <nickbungystata@hotmail.co.uk>
- **Re: st: Proportional Independent Variables**, *From:* Joerg Luedicke <joerg.luedicke@gmail.com>
- **Re: st: Proportional Independent Variables**, *From:* Joerg Luedicke <joerg.luedicke@gmail.com>
- **RE: st: Proportional Independent Variables**, *From:* nick bungy <nickbungystata@hotmail.co.uk>
