
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Looping over variables in more than one group

From   jaweria seth <[email protected]>
To   [email protected]
Subject   Re: st: Looping over variables in more than one group
Date   Wed, 7 Mar 2012 10:31:10 -0600

Thanks J,
You are correct. In theory, I expect a 'production/size' variable to
significantly affect my dependent variable; however, I wanted to let
the regression spit out which of the variables in that category are
most significant (since they are somewhat similar). For that, I am
looking at the t-statistics of the independent variables in the model.
Is that not correct?

On Wed, Mar 7, 2012 at 10:00 AM, Joerg Luedicke
<[email protected]> wrote:
> You should probably rather think about what covariates make the most
> sense to include with respect to your theory and research question.
> Digging up variables to cook up good-looking p-values and then
> interpreting these p-values in the usual way is a questionable
> endeavor, to say the least. However, if you are rather interested in
> something like a prediction model, and not in hypothesis testing, you
> could just use straight data mining techniques right away, for example
> boosted regression (-findit boost-).
> J.
> On Wed, Mar 7, 2012 at 7:12 AM, jaweria seth <[email protected]> wrote:
>> Thanks Nick,
>> I understand this would result in a large number of models.
>> However, I wouldn't be combining variables of the same category/group,
>> as this would bring up the issue of multicollinearity.
>> For example, I know for sure I need to add one variable each from
>> groups 1 and 2. Group 1 contains variables that measure the
>> size/production of a business, and I am wondering which of those
>> variables would be most significant in a multivariate model. I am
>> looking at t-stats in the regression output: if even one of the
>> variables included is not significant at the 10% level, that model
>> gets dropped (and as I'm running the regressions manually, I find
>> that the majority of the combos are not significant).
>> Does this make sense? If so, how can I implement it?
>> The way I am doing it right now: holding one variable from group 2
>> constant and looping through the group 1 (size) variables to find
>> significance. However, this gets tricky when I try to include a third
>> variable.
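A minimal Stata sketch of the nested looping described above, extended to a third group. It assumes hypothetical stand-ins: the shipped auto dataset replaces the real data, and the locals `group1`, `group2` and `group3` replace the real variable groups. For each combination it applies the screening rule from this message (flag the model if any slope has p >= 0.10), computing two-sided p-values from `_b[]`, `_se[]` and `e(df_r)` after -regress-. It only prints a flag per model and does nothing about the P-value concerns raised elsewhere in the thread.

```stata
* Sketch only: auto data and the three group lists are hypothetical stand-ins
sysuse auto, clear
local group1 "weight length displacement"   // stand-in for the size group
local group2 "mpg headroom"                 // stand-in for group 2
local group3 "trunk turn"                   // stand-in for group 3
local nmodels = 0
foreach a of local group1 {
    foreach b of local group2 {
        foreach c of local group3 {
            quietly regress price `a' `b' `c'
            local ++nmodels
            * allsig stays 1 only if every slope has two-sided p < 0.10
            local allsig = 1
            foreach v in `a' `b' `c' {
                local p = 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))
                if `p' >= 0.10 local allsig = 0
            }
            display "`a' `b' `c'" _col(35) "all p < 0.10: `allsig'"
        }
    }
}
display "models fitted: `nmodels'"
```

A fourth group means one more -foreach- level. Allowing "no variable from a group", as Nick suggests, would need an extra empty choice in each loop, which this sketch does not handle.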
>> Thanks,
>> On Wed, Mar 7, 2012 at 2:34 AM, Nick Cox <[email protected]> wrote:
>>> Before you even think of how to implement this, do the combinatorics
>>> of how many models this implies.
>>> So, for example,
>>> . di 30^4
>>> 810000
>>> . di 5^4
>>> 625
>>> Then bump up those numbers adding in the null choices, i.e. no
>>> variable from each group, as well.
>>> So you would need not only to do the looping but to ponder what it
>>> implies in terms of gathering results from thousands of models,
>>> finding the "best", whatever that means, including the implications
>>> for how you think about the resulting P-values, etc.
>>> Nick
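Nick's arithmetic can be written as a formula. Writing k_g for the number of variables in group g (notation introduced here, not in the thread) and counting "no variable from this group" as one more choice, the number of candidate models across the four groups is:

```latex
N = \prod_{g=1}^{4} (k_g + 1)
```

With 30 variables in every group this gives 31^4 = 923,521; with 5 in every group, 6^4 = 1,296. Both counts include the empty model with no regressors, so N - 1 models have at least one regressor. Since the actual groups hold between 5 and 30 variables each, the real count lies between these two extremes.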
>>> On Tue, Mar 6, 2012 at 10:01 PM, jaweria seth <[email protected]> wrote:
>>>> I would like to run regressions with up to 4 different variables. My
>>>> variables are separated into 4 groups with 5-30 variables in each
>>>> group. I would like to run regression combos of different variables to
>>>> find the best model:
>>>> How do I regress my y variable on 1 variable from group 1 and 1 from
>>>> group 2 and loop through different combos of each?
>>>> For example:
>>>> regress Yvariable Group1 Group2
>>>> Then I would like to add a variable from group 3, and so on.

Jaweria Seth

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC