You would probably do better to think about which covariates make the
most sense to include, given your theory and research question.
Digging up variables to cook up good-looking p-values and then
interpreting those p-values in the usual way is a questionable
endeavor, to say the least. However, if you are interested in a
prediction model rather than in hypothesis testing, you could use
data mining techniques right away, for example boosted regression
(-findit boost-).
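
For reference, the two-group loop asked about in the quoted message
below could be sketched roughly as follows. This is only an
illustration under assumptions: the auto data shipped with Stata
stands in for the real data, price stands in for the y variable, and
the two variable lists are hypothetical placeholders for groups 1
and 2.

```stata
* Sketch only: auto data, price, and both lists are stand-ins
sysuse auto, clear
* hypothetical "size" group
local group1 weight length displacement
* hypothetical second group
local group2 mpg headroom trunk

local nmodels 0
foreach x1 of local group1 {
    foreach x2 of local group2 {
        quietly regress price `x1' `x2'
        local ++nmodels
        * two-sided p-values computed from the t statistics
        local p1 = 2*ttail(e(df_r), abs(_b[`x1']/_se[`x1']))
        local p2 = 2*ttail(e(df_r), abs(_b[`x2']/_se[`x2']))
        di "`x1' `x2'" _col(28) %6.3f `p1' _col(38) %6.3f `p2' _col(48) %6.3f e(r2_a)
    }
}
di "models fitted: `nmodels'"
```

A third group would need only one more nested -foreach-, but the
number of models multiplies with each group, and the p-values from
such a search cannot be read in the usual way.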
J.
On Wed, Mar 7, 2012 at 7:12 AM, jaweria seth <jaweriaseth@gmail.com> wrote:
> Thanks Nick,
> I understand this would result in a large number of models.
> However, I wouldn't be combining variables from the same category/group,
> as this would raise the issue of multicollinearity.
> For example, I know for sure I need to add one variable each from
> groups 1 and 2. Group 1 contains variables that measure the
> size/production of a business, and I am wondering which of those
> variables would be most significant in a multivariate model. I am
> looking at t-stats in the regression output: if even one of the
> variables included is not significant at the 10% level, that model gets
> dropped (and as I'm running the regressions manually, I find that the
> majority of the combos are not significant).
>
> Does this make sense? If so, how can I implement it?
> The way I am doing it right now: holding one variable from group 2
> constant and looping through the group 1 (size) variables to find
> significance. However, this gets tricky when I try to include a third
> variable.
>
>
> Thanks,
>
> On Wed, Mar 7, 2012 at 2:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Before you even think of how to implement this, do the combinatorics
>> of how many models this implies.
>>
>> So, for example,
>>
>> . di 30^4
>> 810000
>>
>> . di 5^4
>> 625
>>
>> Then bump up those numbers adding in the null choices, i.e. no
>> variable from each group, as well.
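>>
>> A quick sketch of that bump, assuming each of the four groups gains
>> exactly one extra "no variable" option (so 31 or 6 choices per group):
>>
>> ```stata
>> * upper bound: 30 variables per group plus the null choice
>> di 31^4
>> * lower bound: 5 variables per group plus the null choice
>> di 6^4
>> ```
>>
>> Both counts include the single model with no covariates at all.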
>>
>> So you would need not only to do the looping but to ponder what it
>> implies in terms of gathering results from thousands of models,
>> finding the "best", whatever that means, including the implications
>> for how you think about the resulting P-values, etc.
>>
>> Nick
>>
>> On Tue, Mar 6, 2012 at 10:01 PM, jaweria seth <jaweriaseth@gmail.com> wrote:
>>
>>> I would like to run regressions with up to 4 different variables. My
>>> variables are separated into 4 groups with 5-30 variables in each
>>> group. I would like to run regression combos of different variables to
>>> find the best model:
>>> How do I regress my y variable on 1 variable from group 1 and 1 from
>>> group 2 and loop through different combos of each?
>>> For example:
>>> regress Yvariable Group1 Group2
>>>
>>> Then I would like to add a variable from group 3, and so on.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/