Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Looping over variables in more than one group

From	jaweria seth <[email protected]>
To	[email protected]
Subject	Re: st: Looping over variables in more than one group
Date	Wed, 7 Mar 2012 12:08:36 -0600

Thanks guys,

I'm not sure where to go from here. I've tried many methods with this
regression model.
Here's what I am trying to accomplish:
I have over 80 variables available to me, and as they are financial
metrics, most are highly and significantly correlated with one
another.
I am trying to build a multiple linear regression model that best
predicts my Y variable (profit, returns, etc.).
Thinking about this intuitively, the majority of the variables should
work, but my issue is: how do I choose which variables to include?


Any help would be appreciated,
Thanks,
j.seth
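
For reference, here is a minimal sketch of the looping asked about at the
start of this thread: one regressor from each of two groups, with results
collected for comparison. Everything named here is an assumption for
illustration only: the auto dataset stands in for the real data, and the
macros y, group1 and group2 hold hypothetical variable lists. It stores
adjusted R-squared and RMSE rather than t-statistics, and it does nothing
to address the caveats about P-values raised in the replies quoted below.

```stata
* Hypothetical stand-in data; replace with the real dataset.
sysuse auto, clear

* Hypothetical outcome and groups; replace with the real variable lists.
local y      price
local group1 weight length displacement
local group2 mpg headroom trunk

* Count the models first (one variable per group, no null choices).
local n1 : word count `group1'
local n2 : word count `group2'
display "models to fit: " `n1' * `n2'

* Set up a results file with one row per fitted model.
tempname results
tempfile out
postfile `results' str32 x1 str32 x2 double r2_a double rmse using `out'

* Fit every pairing of one group-1 variable with one group-2 variable.
foreach x1 of local group1 {
    foreach x2 of local group2 {
        quietly regress `y' `x1' `x2'
        post `results' ("`x1'") ("`x2'") (e(r2_a)) (e(rmse))
    }
}
postclose `results'

* Inspect the collected results, highest adjusted R-squared first.
use `out', clear
gsort -r2_a
list, clean noobs
```

Adding a variable from a third group would mean one more nested -foreach-
and one more string column in the -postfile- declaration. The model count
multiplies with each added group, which is the point of the combinatorics
quoted further down.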

On Wed, Mar 7, 2012 at 11:02 AM, Joerg Luedicke
<[email protected]> wrote:
> This is data mining and if you are interested in hypothesis testing
> your p-values will be of no use. To give just a simple example, based
> on your question: imagine you have 5 variables and each is supposed to
> measure the same thing, say x. Now you run 5 regressions, one for each of
> those variables and find that only one of them is "significant". Would
> you then conclude that x has a "significant" effect on y?
>
> Remember that the goal of statistical modeling is to provide useful
> information that cannot be obtained otherwise (or only with much
> higher costs). The goal is not to find p-values below some arbitrary
> threshold.
>
> J.
>
> On Wed, Mar 7, 2012 at 8:31 AM, jaweria seth <[email protected]> wrote:
>> Thanks J,
>> You are correct. In theory, I expect a 'production/size' variable to
>> significantly affect my dependent variable; however, I wanted to let
>> the regression spit out which of the variables in that category are
>> most significant (since they are somewhat similar). In that case, I am
>> looking at the t-statistics of the independent variables in the model.
>> Is that not correct?
>>
>>
>>
>> On Wed, Mar 7, 2012 at 10:00 AM, Joerg Luedicke
>> <[email protected]> wrote:
>>> You should probably rather think about what covariates make the most
>>> sense to include with respect to your theory and research question.
>>> Digging up variables to cook up good looking p-values and then
>>> interpreting these p-values in the usual way is a questionable
>>> endeavor, to say the least. However, if you are rather interested in
>>> something like a prediction model, and not in hypothesis testing, you
>>> could just use straight data mining techniques right away, for example
>>> boosted regression (-findit boost-).
>>>
>>> J.
>>>
>>> On Wed, Mar 7, 2012 at 7:12 AM, jaweria seth <[email protected]> wrote:
>>>> Thanks Nick,
>>>> I understand this would result in a large number of models.
>>>> However, I wouldn't be combining variables of the same category/group,
>>>> as this would bring up the issue of multicollinearity.
>>>> For example, I know for sure I need to add one variable each from
>>>> groups 1 and 2. Group 1 contains variables that measure the
>>>> size/production of a business, and I am wondering which of those
>>>> variables would be most significant in a multiple regression model. I am
>>>> looking at t-stats in the regression output: if even one of the
>>>> variables included is not significant at the 10% level, that model gets
>>>> dropped (and as I'm running the regressions manually, I find that the
>>>> majority of the combos are not significant).
>>>>
>>>> Does this make sense? If so, how can I implement it?
>>>> The way I am doing it right now: holding one variable from group 2
>>>> constant and looping through the group 1 (size) variables to find
>>>> significance. However, this gets tricky when I try to include a third
>>>> variable.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Mar 7, 2012 at 2:34 AM, Nick Cox <[email protected]> wrote:
>>>>> Before you even think of how to implement this, do the combinatorics
>>>>> of how many models this implies.
>>>>>
>>>>> So, for example,
>>>>>
>>>>> . di 30^4
>>>>> 810000
>>>>>
>>>>> . di 5^4
>>>>> 625
>>>>>
>>>>> Then bump up those numbers adding in the null choices, i.e. no
>>>>> variable from each group, as well.
>>>>>
>>>>> So you would need not only to do the looping but to ponder what it
>>>>> implies in terms of gathering results from thousands of models,
>>>>> finding the "best", whatever that means, including the implications
>>>>> for how you think about the resulting P-values, etc.
>>>>>
>>>>> Nick
>>>>>
>>>>> On Tue, Mar 6, 2012 at 10:01 PM, jaweria seth <[email protected]> wrote:
>>>>>
>>>>>> I would like to run regressions with up to 4 different variables. My
>>>>>> variables are separated into 4 groups with 5-30 variables in each
>>>>>> group. I would like to run regression combos of different variables to
>>>>>> find the best model:
>>>>>> How do I regress my y variable on 1 variable from group 1 and 1 from
>>>>>> group 2 and loop through different combos of each?
>>>>>> For example:
>>>>>> regress Yvariable Group1 Group2
>>>>>>
>>>>>> Then I would like to add a variable from group 3, and so on.
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/statalist/faq
>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> --
>> Jaweria Seth


Follow-Ups:
- Re: st: Looping over variables in more than one group
  - From: William Pratt <[email protected]>

References:
- st: Looping over variables in more than one group
  - From: jaweria seth <[email protected]>
- Re: st: Looping over variables in more than one group
  - From: Nick Cox <[email protected]>
- Re: st: Looping over variables in more than one group
  - From: jaweria seth <[email protected]>
- Re: st: Looping over variables in more than one group
  - From: Joerg Luedicke <[email protected]>
- Re: st: Looping over variables in more than one group
  - From: jaweria seth <[email protected]>
- Re: st: Looping over variables in more than one group
  - From: Joerg Luedicke <[email protected]>
