
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Programming a slightly complex list of independent variables


From   "Nic" <[email protected]>
To   <[email protected]>
Subject   st: Programming a slightly complex list of independent variables
Date   Wed, 6 Apr 2011 00:51:55 -0400

Hi Statalist,

I am attempting to write a .do file that runs a number of OLS regressions, each containing a single continuous-by-continuous interaction term.

My ultimate question is: how can I program my regression command so that s* and f* at the end of the command refer to all "s" and "f" variables EXCEPT the two specific ones (`x' and `z') already listed at the beginning of the command?

Here is the relevant part of my code so far:

-------------------------------------------------------------------------
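* Loop over every outcome (d*), focal s-variable (s*) and focal f-variable (f*)
foreach y of varlist d* {
    local laby : variable label `y'
    foreach x of varlist s* {
        local labx : variable label `x'
        local prex = substr("`x'", 1, 3)     // first 3 characters of `x'
        foreach z of varlist f* {
            local labz : variable label `z'
            local prez = substr("`z'", 1, 2) // first 2 characters of `z'

            * i`prex'`prez' is the `x' x `z' interaction term
            regress `y' `x' `z' i`prex'`prez' g* c* e* s* f*
            * (extraction of coefficients and graphing follow here)
        }
    }
}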
--------------------------------------------------------------------------
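A minimal sketch of one way to get the exclusion asked about above, using Stata's macro list difference (`local A : list B - C`). To keep it runnable it uses the shipped auto dataset rather than the d*/s*/f* data from the post: the hypothetical macros svars and fvars stand in for s* and f* (with the real data they would come from unab on those stubs), headroom stands in for the g* c* e* controls, and the interaction term is left out for brevity.

```stata
* Sketch only: auto data stand in for the post's d*/s*/f* variables.
sysuse auto, clear

* Expand the variable lists once, outside the loops.
* With the real data this would be:  unab svars : s*   and   unab fvars : f*
unab svars : mpg weight
unab fvars : length turn

foreach x of local svars {
    foreach z of local fvars {
        * every s-variable except `x', every f-variable except `z'
        local s_others : list svars - x
        local f_others : list fvars - z

        * `x' and `z' now appear only once in the varlist, so there is
        * no duplicate for regress to drop as collinear
        regress price `x' `z' headroom `s_others' `f_others'
    }
}
```

In the original loop, the two `local ... : list` lines would go just before regress, with s* and f* in the varlist replaced by `s_others' and `f_others'.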

Because s* and f* at the end of the command also match `x' and `z', those two variables each appear twice in the regression, and Stata omits one instance of each because of perfect collinearity.

I assumed that the second instance of each repeated variable (the one matched by s* or f*) would always be the one omitted, but that is not so: sometimes the first instance (`x' or `z') is omitted instead. According to the Stata FAQ "Why do estimation commands sometimes omit variables?" (www.stata.com/support/faqs/stat/drop.html), this is normal: "Which variable it omits is somewhat arbitrary".

As a result, the positions of the estimates in the e(b) and e(V) matrices are unpredictable. This is a problem because the next step in my .do file picks out, by position, the coefficients and (co)variances of the first and second independent variables in the regression command and of their interaction term (ultimately to create a graph):

----------------------
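matrix b=e(b)      // row vector of coefficients
matrix V=e(V)      // variance-covariance matrix of the coefficients

* coefficients in positions 1-3: `x', `z' and their interaction
scalar b1=b[1,1]
scalar b2=b[1,2]
scalar b3=b[1,3]

* variances (diagonal elements of V)
scalar varb1=V[1,1]
scalar varb2=V[2,2]
scalar varb3=V[3,3]

* covariances of each main effect with the interaction
scalar covb1b3=V[1,3]
scalar covb2b3=V[2,3]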
-----------------------
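A sketch of a position-independent extraction, again on the auto data for illustration: instead of fixed positions, coefficients are looked up by variable name with _b[], and (co)variances are pulled from e(V) with the el(), rownumb() and colnumb() matrix functions. The variable wl is a hypothetical stand-in for i`prex'`prez'. Lookup by name assumes each regressor appears only once in the command.

```stata
* Sketch only: auto data, hypothetical interaction wl = weight*length
sysuse auto, clear
generate double wl = weight*length

* weight and length play the roles of `x' and `z'; wl plays i`prex'`prez'
regress price weight length wl mpg turn

matrix b = e(b)
matrix V = e(V)

* coefficients looked up by variable name, not by column position
scalar b1 = _b[weight]
scalar b2 = _b[length]
scalar b3 = _b[wl]

* variances and covariances looked up by row and column name
scalar varb1   = el(V, rownumb(V, "weight"), colnumb(V, "weight"))
scalar varb2   = el(V, rownumb(V, "length"), colnumb(V, "length"))
scalar varb3   = el(V, rownumb(V, "wl"),     colnumb(V, "wl"))
scalar covb1b3 = el(V, rownumb(V, "weight"), colnumb(V, "wl"))
scalar covb2b3 = el(V, rownumb(V, "length"), colnumb(V, "wl"))
```

Inside the original loop the names would be macros, e.g. _b[`x'] and rownumb(V, "`x'"), so the results no longer depend on where Stata places each variable in e(b).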

When the second instance of each repeated variable is omitted, b1, b2, b3, etc. refer to the intended cells of the matrices. But when the first instance is "somewhat arbitrarily" omitted instead, they no longer do.

So, again, my question is: how can I program my regression command so that s* and f* at the end of the command refer to all "s" and "f" variables EXCEPT the two specific ones (`x' and `z') already listed at the beginning of the command? Logic tells me this must be possible, but I am so new to Stata, and to programming in particular, that I simply have not been able to suss it out.

With gratitude,
Nic


© Copyright 1996–2018 StataCorp LLC