
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



RE: st: RE: creating loops using combinations of variables


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: creating loops using combinations of variables
Date   Thu, 16 Feb 2012 12:33:17 +0000

Thanks for your clarification. 

My impression is that the combinatorial explosion here makes this impracticable for 27 variables, and in fact a bad idea in principle. If you are trying out what are literally more than a hundred million different models (there are 2^27 - 2 = 134,217,726 ways of splitting 27 variables between two non-empty indices), your inferences have to be adjusted for that selection; otherwise no P-values can be taken very seriously. How to program it is a side-issue, except that trying to nest loops would be time-consuming and stressful. 
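To put a number on the selection problem, here is a minimal sketch that counts the candidate models, assuming each variable goes into exactly one of the two indices and neither index may be empty. The Bonferroni correction is used purely as one simple, very conservative example of an adjustment, not as a recommendation; the function names are hypothetical.

```python
# Count candidate models and a Bonferroni-adjusted per-test threshold.
# Assumption: every variable goes into exactly one of two indices and
# neither index is empty, giving 2**n - 2 ordered splits.


def n_ordered_splits(n_vars: int) -> int:
    """Number of ways to split n_vars variables into (lex1, lex2), both non-empty."""
    return 2 ** n_vars - 2


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    m = n_ordered_splits(27)
    print(m)                              # 134217726
    print(bonferroni_threshold(0.05, m))  # roughly 3.7e-10
```

A per-test threshold of that size shows why "significant" results picked out of such a search mean very little without some adjustment.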

A quite separate issue is that ln(1 + anything) looks like a fudge for the fact that mostly you want to work with logarithms but that you know zeros are possible. Using ln(1 + anything) divides statistical people right down the middle, as various threads on this list have shown. The pessimistic view is that if you do this, you throw away most of what is useful and interpretable about ln(anything). 
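One way to see the interpretability point: with ln(x), rescaling x (say, from thousands of dollars to dollars) shifts the log by the same constant ln(scale) for every observation, and a constant or fixed effect absorbs that shift. With ln(1 + x) the shift depends on x, so results depend on the units chosen. A small sketch with arbitrary values:

```python
import math


def log_shift(x: float, scale: float) -> float:
    """Change in ln(x) when x is multiplied by scale: always ln(scale)."""
    return math.log(scale * x) - math.log(x)


def log1p_shift(x: float, scale: float) -> float:
    """Change in ln(1 + x) when x is multiplied by scale: varies with x."""
    return math.log1p(scale * x) - math.log1p(x)


if __name__ == "__main__":
    # Arbitrary values spanning small to large x, rescaled by a factor of 1000.
    for x in (0.01, 1.0, 100.0):
        print(x, round(log_shift(x, 1000), 3), round(log1p_shift(x, 1000), 3))
```

The ln(x) column is the same on every row; the ln(1 + x) column is not, which is one concrete sense in which the transformation loses the clean interpretation of a log.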

Nick 
n.j.cox@durham.ac.uk 

Zeynep Ozkok

Thank you very much for your comment, Nick.

Let me try to clarify the issue by taking three variables, as you
suggested: var1, var2 and var3.

What I would like to do is the following:

Step 1: Generate two variables, lex1 and lex2, such that lex1 = var1 and
lex2 = var2 + var3.
Generate two indices, index1 and index2, such that index1 = ln(1 + lex1)
and index2 = ln(1 + lex2).
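The index construction in Step 1 can be sketched for a single observation as follows. The dict layout and the values are made up for illustration; math.log1p(x) computes ln(1 + x) and stays accurate when x is small.

```python
import math


def step1_indices(row: dict) -> tuple:
    """Step 1: lex1 = var1, lex2 = var2 + var3; return (index1, index2)."""
    lex1 = row["var1"]
    lex2 = row["var2"] + row["var3"]
    index1 = math.log1p(lex1)  # ln(1 + lex1)
    index2 = math.log1p(lex2)  # ln(1 + lex2)
    return index1, index2


if __name__ == "__main__":
    obs = {"var1": 0.0, "var2": 2.0, "var3": 5.0}  # hypothetical observation
    print(step1_indices(obs))  # index1 = 0.0, index2 = ln(8)
```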

Run a regression of the following form:
Y_{i,s,t} = alpha_i + alpha_s + alpha_t + beta*index1_{i,t} + lambda*index2_{i,t} + error_{i,s,t}

Save the coefficients on index1 and index2, and the R-squared.

Clear lex1, lex2, index1, index2.

Step 2: Generate two variables, lex1 and lex2, such that lex1 = var2 and
lex2 = var1 + var3.
Generate two indices, index1 and index2, such that index1 = ln(1 + lex1)
and index2 = ln(1 + lex2).

Run a regression of the following form:
Y_{i,s,t} = alpha_i + alpha_s + alpha_t + beta*index1_{i,t} + lambda*index2_{i,t} + error_{i,s,t}

Save the coefficients on index1 and index2, and the R-squared.

Clear lex1, lex2, index1, index2.

Step 3: Generate two variables, lex1 and lex2, such that lex1 = var3 and
lex2 = var1 + var2.
Generate two indices, index1 and index2, such that index1 = ln(1 + lex1)
and index2 = ln(1 + lex2).

Run a regression of the following form:
Y_{i,s,t} = alpha_i + alpha_s + alpha_t + beta*index1_{i,t} + lambda*index2_{i,t} + error_{i,s,t}

Save the coefficients on index1 and index2, and the R-squared.

Clear lex1, lex2, index1, index2.
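Steps 1 to 3 differ only in which variable plays the role of lex1, so they collapse into a single loop. A minimal Python sketch over a list of observations (the data are made up); the regression itself is left as a placeholder comment, and the function name `steps` is hypothetical.

```python
import math

VARS = ["var1", "var2", "var3"]


def steps(data):
    """Yield (lex1 variable, lex2 variables, index1 values, index2 values) for Steps 1-3."""
    for v in VARS:
        rest = [w for w in VARS if w != v]  # the variables summed into lex2
        index1 = [math.log1p(row[v]) for row in data]
        index2 = [math.log1p(sum(row[w] for w in rest)) for row in data]
        yield v, rest, index1, index2


if __name__ == "__main__":
    data = [
        {"var1": 1.0, "var2": 0.0, "var3": 3.0},  # hypothetical observations
        {"var1": 0.0, "var2": 2.0, "var3": 1.0},
    ]
    for v, rest, i1, i2 in steps(data):
        # A regression of Y on index1 and index2 (plus fixed effects) would go here.
        print(v, "+".join(rest), i1, i2)
```

Because the variable names travel with each pair of indices, every saved result stays labelled with the variables that produced it.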

Unfortunately, the order of the variables included in the index
measures is important: I need to be able to tell which variables each
significant index includes. That seems almost impossible to me when
considering 27 variables. Is there a way to construct a loop to run
this entire process?
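For the general case, each split can be encoded as a bitmask: bit k set means variable k goes into lex1, otherwise into lex2. Looping over the masks 1 to 2^n - 2 visits every ordered split with both indices non-empty exactly once, and the two name lists give the labels needed to tell which variables make up each index. A minimal sketch with hypothetical names; for 27 variables this loop has 134,217,726 iterations, each of which would be a regression.

```python
from typing import Iterator, List, Tuple


def ordered_splits(names: List[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (lex1 vars, lex2 vars) for every split with both parts non-empty."""
    n = len(names)
    # mask = 0 would leave lex1 empty; mask = 2**n - 1 would leave lex2 empty.
    for mask in range(1, 2 ** n - 1):
        lex1 = [names[k] for k in range(n) if (mask >> k) & 1]
        lex2 = [names[k] for k in range(n) if not (mask >> k) & 1]
        yield lex1, lex2


if __name__ == "__main__":
    for lex1, lex2 in ordered_splits(["var1", "var2", "var3"]):
        print("+".join(lex1), "|", "+".join(lex2))
```

With three variables this prints six splits; the three Steps above correspond to the masks that put a single variable in lex1.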

Thank you so much for all your help.

Zeynep


> On Thu, Feb 16, 2012 at 11:41 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>> "All possible combinations" would usually mean, for 27 variables, 27 ways of selecting just one, comb(27, 2) = 351 ways of selecting two, ..., up to comb(27, 27) = 1 way of selecting them all. In total that means 2^27 - 1 ~ 10^8 combinations. That is, precisely, 134,217,727 combinations.
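The arithmetic is easy to check: summing comb(27, k) over k = 1, ..., 27 gives 2^27 - 1. A short Python check (math.comb requires Python 3.8 or later):

```python
from math import comb


def n_nonempty_subsets(n: int) -> int:
    """Sum of comb(n, k) for k = 1..n, i.e. the number of non-empty subsets."""
    return sum(comb(n, k) for k in range(1, n + 1))


if __name__ == "__main__":
    print(comb(27, 2))             # 351
    print(n_nonempty_subsets(27))  # 134217727
    print(2 ** 27 - 1)             # 134217727
```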
>>
>> My suggestion is to set aside the fact that you have 27 variables. Show us exactly what you would do with just 3 variables, say.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Zeynep Ozkok
>>
>> I have a question on how to create loops for combinations of different
>> variables. I have 27 variables that I would like to put in two different
>> indices.
>>
>> The indices can be constructed in two steps:
>>
>> lex1 = sum of some of the 27 variables. This variable should be able to
>> include anywhere from 1 to 27 variables, so it should allow for all
>> possible combinations: it could equal a single variable, or the sum of
>> several variables.
>>
>> index1 = ln(1 + lex1). This index depends on which variables make up
>> lex1.
>>
>> Similarly
>>
>> lex2 = sum of all the variables not included in lex1. Again, this could
>> be one variable or several, depending on the composition of lex1.
>>
>> index2 = ln(1 + lex2). This index, too, depends on which variables make
>> up lex2, which in turn depends on the composition of lex1.
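Because lex2 is the sum of whatever lex1 leaves out, it can be computed from the complement of lex1's variable list. One consequence worth flagging: if lex1 takes all 27 variables, lex2 is identically 0, so index2 = ln(1 + 0) = 0 for every observation and lambda cannot be estimated. A sketch with a hypothetical helper name that rejects that case:

```python
import math
from typing import Dict, List, Sequence


def build_indices(row: Dict[str, float], all_vars: Sequence[str],
                  lex1_vars: List[str]) -> Dict[str, float]:
    """Compute lex1, lex2 (sum of the variables not in lex1) and their ln(1 + .) indices."""
    lex2_vars = [v for v in all_vars if v not in lex1_vars]
    if not lex1_vars or not lex2_vars:
        # An empty lex2 makes index2 constant at 0, so lambda is not identified.
        raise ValueError("lex1 and lex2 must each contain at least one variable")
    lex1 = sum(row[v] for v in lex1_vars)
    lex2 = sum(row[v] for v in lex2_vars)
    return {"lex1": lex1, "lex2": lex2,
            "index1": math.log1p(lex1), "index2": math.log1p(lex2)}


if __name__ == "__main__":
    obs = {"var1": 1.0, "var2": 0.0, "var3": 3.0}  # hypothetical observation
    print(build_indices(obs, ["var1", "var2", "var3"], ["var1"]))
```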
>>
>> These two indices are then used together in fixed-effects regressions of
>> the following form:
>>
>> Y_{i,s,t} = alpha_i + alpha_s + alpha_t + beta*index1_{i,t} + lambda*index2_{i,t} + error_{i,s,t}
>>
>> The loop must continue until all possible combinations have been run. For
>> each regression I need to check the estimated beta and lambda coefficients
>> and the corresponding R-squared. Since there are so many ways to construct
>> each index, I need a loop. However, I don't even know how to start writing
>> a loop over combinations of variables. Could you possibly help me write it
>> and solve this problem?
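For each split, the quantities to keep are beta, lambda and R-squared. The sketch below fits y = a + beta*index1 + lambda*index2 by ordinary least squares using only the Python standard library. It deliberately omits the fixed effects alpha_i, alpha_s and alpha_t, so it is a simplified stand-in for the model above, not an implementation of it; the function names are hypothetical.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_two_indices(y: List[float], index1: List[float],
                    index2: List[float]) -> Tuple[float, float, float]:
    """Return (beta, lambda, R-squared) from OLS of y on a constant, index1 and index2."""
    X = [[1.0, u, v] for u, v in zip(index1, index2)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
    a, beta, lam = solve(xtx, xty)  # normal equations X'X b = X'y
    ybar = sum(y) / len(y)
    ss_res = sum((yi - (a + beta * u + lam * v)) ** 2
                 for yi, u, v in zip(y, index1, index2))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return beta, lam, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data built as y = 1 + 2*index1 + 3*index2 exactly.
    print(ols_two_indices([4, 3, 11, 10, 18], [0, 1, 2, 3, 4], [1, 0, 2, 1, 3]))
```

Called once per split inside a loop over variable assignments, its three return values can be stored alongside the lists of variable names, so each result stays labelled. For the real model, a dedicated fixed-effects estimator would take the place of this function.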
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index