Stata The Stata listserver

[no subject]



The output of -correlate- given a varlist of two or more variables
is a matrix of correlations for every pair of variables in
varlist. How could we produce an equivalent directly for
-spearman-? We first need to know that -spearman- leaves its
correlation behind in r(rho):

. makematrix, from(r(rho)) : spearman head trunk length displacement weight

                  headroom         trunk        length  displacement        weight
    headroom             1
       trunk     .67678924             1
      length     .53235996     .71907323             1
displacement     .47845891     .57664675     .85248218             1
      weight     .52808385     .65644851     .94895697     .90538822             1

The result is displayed using -matrix list- and we will normally
want to tidy up the presentation, say by

. makematrix, from(r(rho)) format(%4.3f) : spearman head trunk length displacement weight

                  headroom         trunk        length  displacement        weight
    headroom         1.000
       trunk         0.677         1.000
      length         0.532         0.719         1.000
displacement         0.478         0.577         0.852         1.000
      weight         0.528         0.656         0.949         0.905         1.000

However, let us leave these details of presentation on one side.
In this case, given a bivariate command, and a varlist, and a
single result from which to compile the matrix, -makematrix- takes
each pair of variables from varlist, runs a bivariate command for
that pair, and puts a single result in the cell defined by each
pair of variables. So both rows and columns are specified by
varlist.
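
The pairwise logic just described can be sketched by hand. This is
only an illustration of the idea, not the code inside -makematrix-;
it assumes the auto data shipped with Stata (the post does not name
its dataset), and the matrix name R and the local macro names are
arbitrary choices.

```stata
* Assumption: the examples use the auto data shipped with Stata
sysuse auto, clear

local vars headroom trunk length displacement weight
local k : word count `vars'

* Start from a matrix of missings, then fill it pair by pair
matrix R = J(`k', `k', .)
forvalues i = 1/`k' {
    matrix R[`i', `i'] = 1
    local last = `i' - 1
    forvalues j = 1/`last' {
        local row : word `i' of `vars'
        local col : word `j' of `vars'
        * One bivariate run per pair; one saved result per cell
        quietly spearman `row' `col'
        matrix R[`i', `j'] = r(rho)
        matrix R[`j', `i'] = r(rho)
    }
}

matrix rownames R = `vars'
matrix colnames R = `vars'
matrix list R, format(%4.3f)
```

The diagonal is set to 1 directly rather than by correlating a
variable with itself, and each off-diagonal result is copied to both
triangles so that the matrix is symmetric.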

Alternatively, we might want different sets of variables on the
rows and the columns, perhaps specifying a submatrix of the full
matrix.  The option -cols()- can be used to specify variables to
appear as columns.  Say we did a principal component analysis of
five variables and followed with calculation of scores:

. pca head trunk length displacement weight
. score score1-score5
. makematrix, from(r(rho)) cols(score?) : correlate head trunk length displacement weight

                   score1      score2      score3      score4      score5
    headroom   .69579216   .65541006   .28995191  -.04724258   .00263525
       trunk   .84053038    .3144061  -.42608327   .11382425   .01243294
      length   .94323831  -.20350815  -.05828833  -.22445161  -.12292224
displacement   .89424409  -.29085394   .19339097   .27602318  -.04628849
      weight   .93915804  -.28562389    .0409204  -.10426623   .15445146

Here the full correlation matrix of variables and scores, as would
be produced by -correlate-, is 10 X 10, and the submatrix produced
by -makematrix- is only 5 X 5. The default number of decimal
places is clearly ridiculous, and we would normally want to work
on the column headers.  The matrix result can be left in memory as
a named matrix, and then further manipulated:

. makematrix R, from(r(rho)) cols(score?) : correlate head trunk length displacement weight

. matrix colnames R = "score 1" "score 2" "score 3" "score 4" "score 5"

. matrix li R, format(%4.3f)

R[5,5]
              score 1  score 2  score 3  score 4  score 5
    headroom    0.696    0.655    0.290   -0.047    0.003
       trunk    0.841    0.314   -0.426    0.114    0.012
      length    0.943   -0.204   -0.058   -0.224   -0.123
displacement    0.894   -0.291    0.193    0.276   -0.046
      weight    0.939   -0.286    0.041   -0.104    0.154

Another application of the -cols()- option is perhaps more
commonly desired:

. makematrix , from(r(rho) r(p)) label cols(price) : spearman mpg-foreign

                             rho           p
       Mileage (mpg)  -.55546596   7.272e-07
  Repair Record 1978   .10275187   .40082135
       Headroom (in)    .1174198   .33661622
 Trunk space (cu ft)   .42395912   .00028325
        Weight (lbs)   .50135653   .00001143
         Length (in)   .50145304   .00001138
    Turn Circle (ft)   .32117803   .00712682
Displacement (cu in)   .41612747   .00037625
          Gear Ratio   -.3053873   .01072089
            Car type   .08065421   .51002468

. makematrix , from(r(rho) r(p)) list label format(%4.3f %6.5f) sep(0) cols(price) : spearman mpg-foreign

  +-------------------------------------------+
  |                             rho         p |
  |-------------------------------------------|
  | Mileage (mpg)            -0.555   0.00000 |
  | Repair Record 1978        0.103   0.40082 |
  | Headroom (in.)            0.117   0.33662 |
  | Trunk space (cu. ft.)     0.424   0.00028 |
  | Weight (lbs.)             0.501   0.00001 |
  | Length (in.)              0.501   0.00001 |
  | Turn Circle (ft.)         0.321   0.00713 |
  | Displacement (cu. in.)    0.416   0.00038 |
  | Gear Ratio               -0.305   0.01072 |
  | Car type                  0.081   0.51002 |
  +-------------------------------------------+

As this example shows, we can also ask for the results to be shown
using the -list- command, which opens a wider range of
presentation possibilities. The -label- option asks for variable
labels to be shown, and the numeric variables can be assigned
display formats.

As this example also shows, we can show two or more scalar results
from each command run.  This is possible in various ways. A
univariate command can be repeated, each time yielding two or more
scalars:

. makematrix, from(r(mean) r(sd) r(skewness)) : su head trunk length displacement weight, detail

                    mean          sd    skewness
    headroom   2.9932432   .84599477   .14086508
       trunk   13.756757   4.2774042   .02920342
      length   187.93243    22.26634  -.04097455
displacement    197.2973   91.837219   .59165653
      weight   3019.4595   777.19357   .14811637

. makematrix, from(r(mean) r(sd) r(skewness)) list format(%2.1f %2.1f %4.3f) sep(0) : su head trunk length displacement weight, detail

  +------------------------------------------+
  |                  mean      sd   skewness |
  |------------------------------------------|
  | headroom          3.0     0.8      0.141 |
  | trunk            13.8     4.3      0.029 |
  | length          187.9    22.3     -0.041 |
  | displacement    197.3    91.8      0.592 |
  | weight         3019.5   777.2      0.148 |
  +------------------------------------------+

-makematrix- reasons in this way: The user wants three scalars,
which I will show in three columns. So I must run the command
specified in turn on each variable supplied, which I will show on
the rows. So for each variable in varlist, -makematrix- runs a
univariate command, and puts two or more scalars in the cells of
each row.
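
Done by hand, that reasoning amounts to something like the following
loop. Again this is a sketch rather than what -makematrix- does
internally; the auto data is an assumption, and the matrix name S and
the counter i are arbitrary.

```stata
* Assumption: the examples use the auto data shipped with Stata
sysuse auto, clear

local vars headroom trunk length displacement weight
local k : word count `vars'

* One row per variable, one column per requested scalar
matrix S = J(`k', 3, .)
local i = 0
foreach v of local vars {
    local ++i
    quietly summarize `v', detail
    matrix S[`i', 1] = r(mean)
    matrix S[`i', 2] = r(sd)
    matrix S[`i', 3] = r(skewness)
}

matrix rownames S = `vars'
matrix colnames S = mean sd skewness
matrix list S
```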

A bivariate command can be repeated, each time yielding two or
more scalars:

. makematrix, from(r(rho) r(p)) lhs(rep78-foreign) : spearman mpg

                     rho           p
       rep78   .30982668   .00957855
    headroom  -.48660171   .00001103
       trunk  -.64977398   3.759e-10
      weight  -.85755073   1.778e-22
      length   -.8314402   4.710e-20
        turn  -.75767499   5.548e-15
displacement  -.77126724   9.009e-16
  gear_ratio   .60982891   8.061e-09
     foreign   .36289624   .00148459

-makematrix- reasons in this way: The user wants two scalars,
which I will show in two columns. So I must run the command
specified in turn on the variable supplied. The option -lhs()- is
also specified, so that must be used to supply the other variable.
Whenever -lhs()- is specified, it specifies the rows of the
matrix.  That is, in this case, the rows show the results of
-spearman rep78 mpg ...  spearman foreign mpg-.  Notice how the
variables specified in -lhs()- appear on the left-hand side of the
varlist which -spearman- runs.  (-lhs()- also names the left-hand
side of the matrix, but that is a happy accident.) This is also
allowed:

. makematrix, from(r(rho) r(p)) rhs(rep78-foreign) : spearman mpg

In this case, the rows show the results of -spearman mpg rep78 ...
spearman mpg foreign-, and are exactly the same as in the previous
example. Again, whenever -rhs()- is specified, it specifies the
rows of the matrix.  Notice how the variables specified in -rhs()-
appear on the right-hand side of the varlist which -spearman- runs.
(By a small stretch, you can also think of it as naming the
right-hand side of the matrix, given that we could repeat the row
names on that side.) In other cases, which is used may well
matter:

. makematrix, from(e(r2) e(rmse) _b[_cons] _b[mpg]) lhs(rep78-foreign) list dp(3 2 2 3) abb(9) sep(0) divider : regress mpg

  +------------------------------------------------------+
  |              |    r2 |   rmse | _b[_cons] |  _b[mpg] |
  |--------------+-------+--------+-----------+----------|
  | rep78        | 0.162 |   0.91 |      1.96 |    0.068 |
  | headroom     | 0.171 |   0.78 |      4.28 |   -0.061 |
  | trunk        | 0.338 |   3.50 |     22.91 |   -0.430 |
  | weight       | 0.652 | 461.96 |   5328.76 | -108.432 |
  | length       | 0.633 |  13.58 |    253.16 |   -3.063 |
  | turn         | 0.517 |   3.08 |     51.30 |   -0.547 |
  | displacement | 0.498 |  65.52 |    435.85 |  -11.201 |
  | gear_ratio   | 0.380 |   0.36 |      1.98 |    0.049 |
  | foreign      | 0.155 |   0.43 |     -0.37 |    0.031 |
  +------------------------------------------------------+

. makematrix, from(e(r2) e(rmse) _b[_cons] _b) rhs(rep78-foreign) list dp(3 2 2 3) abb(9) sep(0) divider : regress mpg

  +--------------------------------------------------+
  |              |    r2 | rmse | _b[_cons] |     _b |
  |--------------+-------+------+-----------+--------|
  | rep78        | 0.162 | 5.41 |     13.17 |  2.384 |
  | headroom     | 0.171 | 5.30 |     29.77 | -2.830 |
  | trunk        | 0.338 | 4.74 |     32.12 | -0.787 |
  | weight       | 0.652 | 3.44 |     39.44 | -0.006 |
  | length       | 0.633 | 3.53 |     60.16 | -0.207 |
  | turn         | 0.517 | 4.05 |     58.80 | -0.946 |
  | displacement | 0.498 | 4.13 |     30.07 | -0.044 |
  | gear_ratio   | 0.380 | 4.59 |     -2.26 |  7.813 |
  | foreign      | 0.155 | 5.36 |     19.83 |  4.946 |
  +--------------------------------------------------+

The first series of regressions predicts -rep78 ... foreign- in turn
from -mpg-. The second series predicts -mpg- from -rep78 ...  foreign-
in turn. The r-square results will be the same, but not the root mean
square errors, or the intercepts or slopes.  Note that _b by itself
has the interpretation of _b[row_variable]. -dp()- is a lazy
alternative to -format()- used to specify the number of decimal
places.
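
The difference between the two series can be checked with a loop
written out by hand. This is a sketch under the assumption that the
data are the auto data shipped with Stata; the matrix names L and Rm
are arbitrary, and only r-square and the slope are stored to keep it
short.

```stata
* Assumption: the examples use the auto data shipped with Stata
sysuse auto, clear

* Expand the variable range into a list of names
unab others : rep78-foreign
local k : word count `others'

matrix L  = J(`k', 2, .)
matrix Rm = J(`k', 2, .)
local i = 0
foreach v of local others {
    local ++i
    * lhs() series: the row variable is the response
    quietly regress `v' mpg
    matrix L[`i', 1] = e(r2)
    matrix L[`i', 2] = _b[mpg]
    * rhs() series: the row variable is the predictor
    quietly regress mpg `v'
    matrix Rm[`i', 1] = e(r2)
    matrix Rm[`i', 2] = _b[`v']
}

matrix rownames L  = `others'
matrix rownames Rm = `others'
matrix colnames L  = r2 slope
matrix colnames Rm = r2 slope
matrix list L, format(%9.3f)
matrix list Rm, format(%9.3f)
```

The first columns of the two matrices agree, because with a single
predictor r-square is the squared correlation whichever way round the
regression is run; the slopes differ.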

In fact -lhs()- and -rhs()- can be used to produce a series of
multivariate results.  Suppose we have -weightsq-, i.e. -weight^2-.

. gen weightsq = weight^2

. makematrix, from(e(r2) e(rmse)) lhs(mpg-trunk length-foreign) list dp(3 2) sep(0) divider : regress weight weightsq

  +------------------------------+
  |              |    r2 |  rmse |
  |--------------+-------+-------|
  | mpg          | 0.672 |  3.36 |
  | rep78        | 0.222 |  0.89 |
  | headroom     | 0.236 |  0.75 |
  | trunk        | 0.457 |  3.20 |
  | length       | 0.900 |  7.12 |
  | turn         | 0.736 |  2.29 |
  | displacement | 0.826 | 38.90 |
  | gear_ratio   | 0.577 |  0.30 |
  | foreign      | 0.379 |  0.37 |
  +------------------------------+

This series predicts -mpg ... foreign- in turn from -weight- and
-weightsq-. When either -lhs()- or -rhs()- is specified, it
defines the varying rows, while the varlist supplied is fixed
for each run of the command.

There is one more nuance to be explained. Say you want a table of
sums for a set of variables. You might try

. makematrix, from(r(sum)): su head trunk length displacement weight, meanonly

However, -makematrix- cannot distinguish between this and a
similar problem with a bivariate command, so it will attempt to
run -summarize- on all distinct pairs of variables. This will
succeed, except that what is left behind in -r(sum)- will be the
sum of the second of each pair of variables. What you will prefer
is a vector, and that is the option to specify:

. makematrix, from(r(sum)) vector: su head trunk length displacement weight, meanonly
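
What the vector amounts to can be sketched as a loop over single
variables. This is an illustration only, assuming the auto data
shipped with Stata; the matrix name T is arbitrary.

```stata
* Assumption: the examples use the auto data shipped with Stata
sysuse auto, clear

local vars headroom trunk length displacement weight
local k : word count `vars'

* One run per variable, so r(sum) always refers to that variable
matrix T = J(`k', 1, .)
local i = 0
foreach v of local vars {
    local ++i
    quietly summarize `v', meanonly
    matrix T[`i', 1] = r(sum)
}

matrix rownames T = `vars'
matrix colnames T = sum
matrix list T
```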

There's more, for which please see the help as usual.

Nick
n.j.cox@durham.ac.uk



