
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



Re: st: Looping over variables in more than one group


From   Joerg Luedicke <joerg.luedicke@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Looping over variables in more than one group
Date   Wed, 7 Mar 2012 08:00:52 -0800

You should probably think first about which covariates make the most
sense to include, given your theory and research question.
Digging up variables to cook up good-looking p-values and then
interpreting those p-values in the usual way is a questionable
endeavor, to say the least. However, if you are interested in
something like a prediction model rather than in hypothesis testing, you
could use data mining techniques directly, for example
boosted regression (-findit boost-).

J.
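
For reference, the two-group loop asked about in the quoted question
below can be sketched with nested -foreach- loops. This is a minimal
sketch, not a recommendation: the auto data, the outcome price, and
both variable lists are stand-ins chosen only so the code runs as
written, and the 10% cutoff is the rule described in the quoted
message. The caveats above about interpreting the resulting p-values
apply in full.

```stata
* Sketch only: the dataset and both variable lists are hypothetical stand-ins
sysuse auto, clear

* one macro per group of candidate variables
local group1 weight length displacement
local group2 mpg gear_ratio

foreach x1 of local group1 {
    foreach x2 of local group2 {
        quietly regress price `x1' `x2'
        * two-sided p-values computed from each coefficient's t statistic
        local p1 = 2 * ttail(e(df_r), abs(_b[`x1'] / _se[`x1']))
        local p2 = 2 * ttail(e(df_r), abs(_b[`x2'] / _se[`x2']))
        * list only the pairs in which both covariates pass the 10% cutoff
        if `p1' < 0.10 & `p2' < 0.10 {
            display "`x1' `x2'" _col(28) %6.4f `p1' _col(38) %6.4f `p2'
        }
    }
}
```

A third group would be one more local macro and one more -foreach-
nested inside the second, with its variable added to the -regress-
call and its p-value added to the -if- condition.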

On Wed, Mar 7, 2012 at 7:12 AM, jaweria seth <jaweriaseth@gmail.com> wrote:
> Thanks Nick,
> I understand this would result in a large number of models.
> However, I wouldn't be combining variables from the same category/group,
> as this would bring up the issue of multicollinearity.
> For example, I know for sure I need to add one variable each from
> groups 1 and 2. Group 1 contains variables that measure the
> size/production of a business, and I am wondering which of those
> variables would be most significant in a multivariate model. I am
> looking at t-stats in the regression output: if even one of the
> variables included is not significant at the 10% level, that model gets
> dropped (and as I'm running the regressions manually, I find that the
> majority of the combos are not significant).
>
> Does this make sense? If so, how can I implement it?
> The way I am doing it right now: holding one variable from group 2
> constant and looping through the group 1 (size) variables to find
> significance. However, this gets tricky when I try to include a third
> variable.
>
>
> Thanks,
>
> On Wed, Mar 7, 2012 at 2:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Before you even think of how to implement this, do the combinatorics
>> of how many models this implies.
>>
>> So, for example,
>>
>> . di 30^4
>> 810000
>>
>> . di 5^4
>> 625
>>
>> Then bump up those numbers adding in the null choices, i.e. no
>> variable from each group, as well.
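
Worked through, adding the null choice means each group contributes
its number of variables plus one. The sketch below shows the two
bounds for four equal-sized groups and then the general product for
unequal groups. The group sizes in the macro are hypothetical, since
the thread only gives the range 5-30.

```stata
* bounds when each of 4 groups may also contribute no variable
display (30 + 1)^4
* prints 923521
display (5 + 1)^4
* prints 1296

* general case: hypothetical group sizes, one number per group
local sizes 5 12 20 30
local total = 1
foreach n of local sizes {
    * each group offers n variables plus the option of using none
    local total = `total' * (`n' + 1)
}
display "Candidate models, including the empty one: " `total'
```

Subtract 1 from the total to leave out the model with no covariates.
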
>>
>> So you would need not only to do the looping but to ponder what it
>> implies in terms of gathering results from thousands of models,
>> finding the "best", whatever that means, including the implications
>> for how you think about the resulting P-values, etc.
>>
>> Nick
>>
>> On Tue, Mar 6, 2012 at 10:01 PM, jaweria seth <jaweriaseth@gmail.com> wrote:
>>
>>> I would like to run regressions with up to 4 different variables. My
>>> variables are separated into 4 groups with 5-30 variables in each
>>> group. I would like to run regression combos of different variables to
>>> find the best model.
>>> How do I regress my y variable on 1 variable from group 1 and 1 from
>>> group 2 and loop through different combos of each?
>>> For example:
>>> regress Yvariable Group1 Group2
>>>
>>> Then I would like to add a variable from group 3, and so on.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>


