Statalist



st: RE: RE: extracting rownames of a matrix into a variable (preparatory work for MICE)


From   "Gresch,Cornelia" <gresch@mpib-berlin.mpg.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: extracting rownames of a matrix into a variable (preparatory work for MICE)
Date   Fri, 19 Sep 2008 20:37:16 +0200

This is just great, thanks!
(and sorry for having bothered without checking the archive)
Cornelia

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, September 19, 2008 5:49 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: extracting rownames of a matrix into a variable (preparatory work for MICE)

This is asked every few months. See for example 

http://www.stata.com/statalist/archive/2008-05/msg01224.html

and the ensuing thread pointing to -svmat2- as a canned solution. 
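A minimal sketch of the idea using only built-in commands, assuming the matrix is called ges_ols as in the question below and that the data in memory may be discarded. The variable name predictor and the width str32 are illustrative choices, not taken from the linked thread. The row names are read into a local macro with the extended macro function `: rownames` and then written into a string variable one observation at a time:

```stata
* Sketch only: assumes matrix ges_ols exists and the current data can be dropped
drop _all                                  // removes the data, keeps matrices
svmat ges_ols, names(col)                  // one variable per matrix column

local rnames : rownames ges_ols            // row names as a space-separated list
generate str32 predictor = ""              // hypothetical name for the new variable
local i = 1
foreach r of local rnames {
    quietly replace predictor = "`r'" in `i'
    local ++i
}

list predictor e3his e220ex t_kfts

* If the user-written -svmat2- is installed, its rnames() option is meant to do
* the same in one step (check -help svmat2- for the exact syntax):
* svmat2 ges_ols, names(col) rnames(predictor)
```

Because the names are copied as plain strings, entries such as tr10emp_* that are not legal variable names should carry over unchanged.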

Nick 
n.j.cox@durham.ac.uk 

Gresch,Cornelia

I want to convert a matrix into a dataset in such a way that the row names are not lost but appear as the observations of a new string variable (ideally the first variable). Does anyone have an idea how to do this?

Background of the question: I am working on an imputation model with MICE (command ice). Because the dataset is very large, I need to decide which variables to use as independent variables. I therefore want to create a matrix containing the R² (or LL) of bivariate regression models for all possible combinations of the variables. The column names of this matrix correspond to the dependent variables, and the row names to the independent variables, of the bivariate regression models.

Because the final matrix will be very large, I would like to convert it into variables, from which I can easily pick the 15-25 independent variables to use as predictors in the final imputation model. Using the command "svmat" produces the expected dataset, but the row names (which are needed to identify the appropriate predictors) are lost.

Below you can see the code and the resulting problem in more detail:
(N.B.: the variables are "e3his" "e220ex" "t_kfts" (which are metric); "tr10emp" "e121a" (which are categorical, with corresponding dummies tr10emp_1, tr10emp_2, ...); and "slspaet" "tr06sex" (which are binary))


matrix drop _all
foreach av of var e3his e220ex t_kfts {
  foreach katuv of any tr10emp e121a {
      local `katuv' = "`katuv'_*"
  }
      matrix `av' = (1)
 foreach uv of any e3his e220ex t_kfts /* metric IVs */ ///
      "`tr10emp'" "`e121a'"  /* polytomous IVs */ ///
      slspaet tr06sex /* dummy IVs */ {
      quietly reg `av' `uv'
      local r2 = "Fehler"   /* placeholder; "Fehler" is German for "error" */
      local r2 = e(r2)
      matrix input R =(`r2')
      matrix colnames `av' = `av'
      matrix rownames R = `uv' 
      matrix `av' = (`av' \R) 
  }   
      if ("`av'" == "e3his") matrix ges_ols = (`av') 
      if ("`av'" ~= "e3his") matrix ges_ols = (ges_ols, `av')
}
matrix list ges_ols 

ges_ols[8,3]
               e3his     e220ex     t_kfts
       r1          1          1          1
    e3his          1  .04320558  .05278807
   e220ex  .04320558          1  .01621444
   t_kfts  .05278807  .01621444          1
tr10emp_*  .16007611  .02387149  .16865035
       r1          0          0          0
  slspaet  .01107509  .00863178  4.931e-06
  tr06sex  1.640e-06  .00104972  .00829559


If I convert this matrix to variables I get the following dataset:

svmat ges_ols, names(col)
 list 
     +--------------------------------+
     |    e3his     e220ex     t_kfts |
     |--------------------------------|
  1. |        1          1          1 |
  2. |        1   .0432056   .0527881 |
  3. | .0432056          1   .0162144 |
  4. | .0527881   .0162144          1 |
  5. | .1600761   .0238715   .1686504 |
     |--------------------------------|
  6. |        0          0          0 |
  7. | .0110751   .0086318   4.93e-06 |
  8. | 1.64e-06   .0010497   .0082956 |
     +--------------------------------+


But I also need a variable that holds the row names as strings, as they appear in the matrix (as far as I can see, that is the only way to identify the 15-25 best-fitting predictors). Does anyone have an idea how to do this? Alternatively, I tried to put the names into the matrix as strings, which does not work either. I also cannot transpose the matrix (rows to columns and vice versa) and extract it that way, because some of the independent variables are sets of dummies (e.g. tr10emp_*) and therefore cannot be used as variable names.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



