
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



Re: st: Looping over variables in more than one group


From   William Pratt <williamrpratt@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Looping over variables in more than one group
Date   Wed, 7 Mar 2012 12:33:01 -0600

See -stepwise-; it lets you set the significance levels for adding
and removing variables. Before using this ad hoc approach, though,
you really should consider the earlier comments in this thread.
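
A minimal sketch of what that could look like. The auto dataset that ships
with Stata stands in for your data here, so price and the regressors are
placeholders rather than the financial metrics from this thread, and the
0.10 and 0.05 thresholds are only illustrative:

```stata
* Stand-in data: the auto dataset shipped with Stata (assumption for illustration)
sysuse auto, clear

* Backward elimination: start from the full model, remove terms with p >= 0.10
stepwise, pr(0.10): regress price mpg weight length displacement gear_ratio

* Forward selection: start from the empty model, add terms with p < 0.05
stepwise, pe(0.05): regress price mpg weight length displacement gear_ratio
```

The p-values reported for the final model take no account of the selection
step, which is the concern already raised in the thread.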

On Wed, Mar 7, 2012 at 12:08 PM, jaweria seth <jaweriaseth@gmail.com> wrote:
> Thanks guys,
>
> I'm not sure where to go from here. I've tried many methods with this
> regression model.
> Here's what I am trying to accomplish:
> I have over 80 variables available to me, and as they are financial
> metrics, most are highly and significantly correlated with one
> another.
> I am trying to build a multiple linear regression model that best
> predicts my Y variable (profit, returns, etc.).
> Thinking about this intuitively, the majority of the variables should
> work, but my issue is: how do I choose which variables to include?
>
>
> Any help would be appreciated,
> Thanks,
> j.seth
>
> On Wed, Mar 7, 2012 at 11:02 AM, Joerg Luedicke
> <joerg.luedicke@gmail.com> wrote:
>> This is data mining and if you are interested in hypothesis testing
>> your p-values will be of no use. To give just a simple example, based
>> on your question: imagine you have 5 variables and each is supposed to
>> measure the same thing, say x. Now you run 5 regressions, one for each of
>> those variables and find that only one of them is "significant". Would
>> you then conclude that x has a "significant" effect on y?
>>
>> Remember that the goal of statistical modeling is to provide useful
>> information that cannot be obtained otherwise (or only with much
>> higher costs). The goal is not to find p-values below some arbitrary
>> threshold.
>>
>> J.
>>
>> On Wed, Mar 7, 2012 at 8:31 AM, jaweria seth <jaweriaseth@gmail.com> wrote:
>>> Thanks J,
>>> You are correct. In theory, I expect a 'production/size' variable to
>>> significantly affect my dependent variable. However, I wanted to let
>>> the regression spit out which of the variables in that category are
>>> most significant (since they are somewhat similar). In that case, I am
>>> looking at the t-statistics of the independent variables in the model.
>>> Is that not correct?
>>>
>>>
>>>
>>> On Wed, Mar 7, 2012 at 10:00 AM, Joerg Luedicke
>>> <joerg.luedicke@gmail.com> wrote:
>>>> You should probably rather think about what covariates make the most
>>>> sense to include with respect to your theory and research question.
>>>> Digging up variables to cook up good looking p-values and then
>>>> interpreting these p-values in the usual way is a questionable
>>>> endeavor, to say the least. However, if you are rather interested in
>>>> something like a prediction model, and not in hypothesis testing, you
>>>> could just use straight data mining techniques right away, for example
>>>> boosted regression (-findit boost-).
>>>>
>>>> J.
>>>>
>>>> On Wed, Mar 7, 2012 at 7:12 AM, jaweria seth <jaweriaseth@gmail.com> wrote:
>>>>> Thanks Nick,
>>>>> I understand this would result in a large number of models.
>>>>> However, I wouldn't be combining variables of the same category/group,
>>>>> as this would bring up the issue of multicollinearity.
>>>>> For example, I know for sure I need to add one variable each from
>>>>> groups 1 and 2. Group 1 contains variables that measure the
>>>>> size/production of a business, and I am wondering which of those
>>>>> variables would be most significant in a multiple regression model. I am
>>>>> looking at t-stats in the regression output: if even one of the
>>>>> variables included is not significant at the 10% level, that model gets
>>>>> dropped (and as I'm running the regressions manually, I find that the
>>>>> majority of the combos are not significant).
>>>>>
>>>>> Does this make sense? If so, how can I implement it?
>>>>> The way I am doing it right now: holding one variable from group 2
>>>>> constant and looping through the group 1 (size) variables to find
>>>>> significance. However, this gets tricky when I try to include a third
>>>>> variable.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Wed, Mar 7, 2012 at 2:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>> Before you even think of how to implement this, do the combinatorics
>>>>>> of how many models this implies.
>>>>>>
>>>>>> So, for example,
>>>>>>
>>>>>> . di 30^4
>>>>>> 810000
>>>>>>
>>>>>> . di 5^4
>>>>>> 625
>>>>>>
>>>>>> Then bump up those numbers adding in the null choices, i.e. no
>>>>>> variable from each group, as well.
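
As a quick check of that adjustment, assuming every group sits at one
extreme or the other (all four groups with 30 variables, or all four with
5), allowing "no variable" adds one choice per group:

```stata
* One extra "no variable" choice per group: 31 or 6 options instead of 30 or 5
di 31^4
di 6^4
```

These display 923521 and 1296. Both counts include the single model with
no regressors at all.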
>>>>>>
>>>>>> So you would need not only to do the looping but to ponder what it
>>>>>> implies in terms of gathering results from thousands of models,
>>>>>> finding the "best", whatever that means, including the implications
>>>>>> for how you think about the resulting P-values, etc.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Tue, Mar 6, 2012 at 10:01 PM, jaweria seth <jaweriaseth@gmail.com> wrote:
>>>>>>
>>>>>>> I would like to run regressions with up to 4 different variables. My
>>>>>>> variables are separated into 4 groups with 5-30 variables in each
>>>>>>> group. I would like to run regression combos of different variables to
>>>>>>> find the best model:
>>>>>>> How do I regress my y variable on 1 variable from group 1 and 1 from
>>>>>>> group 2 and loop through different combos of each?
>>>>>>> For example:
>>>>>>> regress Yvariable Group1 Group2
>>>>>>>
>>>>>>> Then I would like to add a variable from group 3, and so on.
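
One way the looping itself might be sketched, with the caveats raised
elsewhere in this thread still applying. The auto dataset and the group1
and group2 variable lists are placeholders for illustration, not the
poster's data:

```stata
* Stand-in data and hypothetical groups (assumptions for illustration)
sysuse auto, clear
local group1 weight length displacement
local group2 mpg gear_ratio headroom

* One regression per (group 1, group 2) pair
foreach x of local group1 {
    foreach z of local group2 {
        quietly regress price `x' `z'
        display "`x' + `z'" _col(30) "adj. R2 = " %6.4f e(r2_a)
    }
}
```

Adding a third group means one more nested -foreach-, and the number of
models multiplies with each level.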
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/statalist/faq
>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> --
>>> Jaweria Seth
>>>



-- 
wrpratt@utpa.edu
University of Texas-Pan American
College of Business Administration
Department of Finance and Economics
South Texas Border Health Disparities Center
o - 956.665.7937
f - 956.665.7310



© Copyright 1996–2014 StataCorp LP