

From: Joerg Luedicke <joerg.luedicke@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Looping over variables in more than one group
Date: Wed, 7 Mar 2012 09:02:15 -0800

This is data mining, and if you are interested in hypothesis testing
your p-values will be of no use. To give just a simple example, based
on your question: imagine you have 5 variables and each is supposed to
measure the same thing, say x. Now you run 5 regressions, one for each
of those variables, and find that only one of them is "significant".
Would you then conclude that x has a "significant" effect on y?
Remember that the goal of statistical modeling is to provide useful
information that cannot be obtained otherwise (or only at much higher
cost). The goal is not to find p-values below some arbitrary
threshold.

J.

On Wed, Mar 7, 2012 at 8:31 AM, jaweria seth <jaweriaseth@gmail.com> wrote:
> Thanks J,
> You are correct. In theory, I expect a 'production/size' variable to
> significantly affect my dependent variable, however, I wanted to let
> the regression spit out which of the variables in that category are
> most significant (since they are somewhat similar). In that case, I am
> looking to the t-statistics of the independent variables in the model.
> Is that not correct?
>
> On Wed, Mar 7, 2012 at 10:00 AM, Joerg Luedicke
> <joerg.luedicke@gmail.com> wrote:
>> You should probably rather think about what covariates make the most
>> sense to include with respect to your theory and research question.
>> Digging up variables to cook up good looking p-values and then
>> interpreting these p-values in the usual way is a questionable
>> endeavor, to say the least. However, if you are rather interested in
>> something like a prediction model, and not in hypothesis testing, you
>> could just use straight data mining techniques right away, for example
>> boosted regression (-findit boost-).
>>
>> J.
>>
>> On Wed, Mar 7, 2012 at 7:12 AM, jaweria seth <jaweriaseth@gmail.com> wrote:
>>> Thanks Nick,
>>> I understand this would result in a large number of models..
>>> however, I wouldn't be combining variables of the same category/group,
>>> as this would bring up the issue of multicollinearity.
>>> for example, I know for sure I need to add one variable each from
>>> groups 1 and 2. group 1 contains variables that measure the
>>> size/production of a business, and I am wondering which of those
>>> variables would be most significant in a multi-variate model. I am
>>> looking at t-stats in the regression output: if even one of the
>>> variables included is not significant at the 10%, that model gets
>>> dropped..( and as I'm running the regressions manually, I find that the
>>> majority of the combos are not significant).
>>>
>>> Does this make sense? If so, how can I implement it?
>>> The way I am doing it right now: Holding one variable from group2
>>> constant and looping through group 1/size variables to find
>>> significance. however, this gets tricky when I try to include a third
>>> variable.
>>>
>>> Thanks,
>>>
>>> On Wed, Mar 7, 2012 at 2:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Before you even think of how to implement this, do the combinatorics
>>>> of how many models this implies.
>>>>
>>>> So, for example,
>>>>
>>>> . di 30^4
>>>> 810000
>>>>
>>>> . di 5^4
>>>> 625
>>>>
>>>> Then bump up those numbers adding in the null choices, i.e. no
>>>> variable from each group, as well.
>>>>
>>>> So you would need not only to do the looping but to ponder what it
>>>> implies in terms of gathering results from thousands of models,
>>>> finding the "best", whatever that means, including the implications
>>>> for how you think about the resulting P-values, etc.
>>>>
>>>> Nick
>>>>
>>>> On Tue, Mar 6, 2012 at 10:01 PM, jaweria seth <jaweriaseth@gmail.com> wrote:
>>>>
>>>>> I would like to run regressions with up to 4 different variables. My
>>>>> variables are separated into 4 groups with 5-30 variables in each
>>>>> group. I would like to run regression combos of different variables to
>>>>> find the best model:
>>>>> How do I regress my y variable on 1 variable from group 1 and 1 from
>>>>> group 2 and loop through different combos of each?
>>>>> for ex:
>>>>> regress Yvariable Group1 Group2
>>>>>
>>>>> Then I would like to add a variable from group 3, and so on..
>
> --
> Jaweria Seth

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
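
The arithmetic behind two points in this thread can be written out as a rough sketch. The symbols $\alpha$ (the significance level, set here to the 10% cutoff mentioned in the thread) and $m$ (the number of regressions, 5 in the example) are introduced only for illustration. The first line assumes that x truly has no effect and that the 5 tests are independent, which is an idealization, since the variables are said to measure the same thing; the second line needs no independence assumption.

$$
\begin{aligned}
P(\text{at least one of } m \text{ tests significant}) &= 1 - (1-\alpha)^m = 1 - 0.9^5 \approx 0.41 \\
\alpha = 0.10 \;\le\; P(\text{at least one of } m \text{ tests significant}) &\le m\alpha = 0.50 \\
\text{models with the null choice in each of 4 groups:}\quad (30+1)^4 &= 923521, \qquad (5+1)^4 = 1296
\end{aligned}
$$

Under these assumptions, one "significant" coefficient out of 5 tries is unremarkable even when x has no effect at all, and the last line shows how the model counts 810000 and 625 grow once "no variable from this group" is allowed as a choice.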

