
The output of -correlate- given a varlist of two or more variables is a matrix of correlations for every pair of variables in varlist. How could we produce an equivalent directly for -spearman-? We need to find out that -spearman- leaves a correlation behind in r(rho):

    . makematrix, from(r(rho)) : spearman head trunk length displacement weight

                      headroom         trunk        length  displacement        weight
        headroom             1
           trunk     .67678924             1
          length     .53235996     .71907323             1
    displacement     .47845891     .57664675     .85248218             1
          weight     .52808385     .65644851     .94895697     .90538822             1

The result is displayed using -matrix list- and we will normally want to tidy up the presentation, say by

    . makematrix, from(r(rho)) format(%4.3f) : spearman head trunk length displacement weight

                      headroom         trunk        length  displacement        weight
        headroom         1.000
           trunk         0.677         1.000
          length         0.532         0.719         1.000
    displacement         0.478         0.577         0.852         1.000
          weight         0.528         0.656         0.949         0.905         1.000

However, let us leave these details of presentation on one side.

In this case, given a bivariate command, and a varlist, and a single result from which to compile the matrix, -makematrix- takes each pair of variables from varlist, runs a bivariate command for that pair, and puts a single result in the cell defined by each pair of variables. So both rows and columns are specified by varlist.

Alternatively, we might want different sets of variables on the rows and the columns, perhaps specifying a submatrix of the full matrix. The option -cols()- can be used to specify variables to appear as columns. Say we did a principal component analysis of five variables and followed with calculation of scores:

    . pca head trunk length displacement weight
    . score score1-score5
    . makematrix, from(r(rho)) cols(score?) : correlate head trunk length displacement weight

                        score1        score2        score3        score4        score5
        headroom     .69579216     .65541006     .28995191    -.04724258     .00263525
           trunk     .84053038      .3144061    -.42608327     .11382425     .01243294
          length     .94323831    -.20350815    -.05828833    -.22445161    -.12292224
    displacement     .89424409    -.29085394     .19339097     .27602318    -.04628849
          weight     .93915804    -.28562389      .0409204    -.10426623     .15445146

Here the full correlation matrix of variables and scores, as would be produced by -correlate-, is 10 X 10, and the submatrix produced by -makematrix- is only 5 X 5.

The default number of decimal places is clearly ridiculous, and we would normally want to work on the column headers. The matrix result can be left in memory as a named matrix, and then further manipulated:

    . makematrix R, from(r(rho)) cols(score?) : correlate head trunk length displacement weight
    . matrix colnames R = "score 1" "score 2" "score 3" "score 4" "score 5"
    . matrix li R, format(%4.3f)

    R[5,5]
                  score 1  score 2  score 3  score 4  score 5
        headroom    0.696    0.655    0.290   -0.047    0.003
           trunk    0.841    0.314   -0.426    0.114    0.012
          length    0.943   -0.204   -0.058   -0.224   -0.123
    displacement    0.894   -0.291    0.193    0.276   -0.046
          weight    0.939   -0.286    0.041   -0.104    0.154

Another application of the -cols()- option is perhaps more commonly desired:

    . makematrix , from(r(rho) r(p)) label cols(price) : spearman mpg-foreign

                                 rho           p
           Mileage (mpg)  -.55546596   7.272e-07
      Repair Record 1978   .10275187   .40082135
           Headroom (in)    .1174198   .33661622
     Trunk space (cu ft)   .42395912   .00028325
            Weight (lbs)   .50135653   .00001143
             Length (in)   .50145304   .00001138
        Turn Circle (ft)   .32117803   .00712682
    Displacement (cu in)   .41612747   .00037625
              Gear Ratio   -.3053873   .01072089
                Car type   .08065421   .51002468

    . makematrix , from(r(rho) r(p)) list label format(%4.3f %6.5f) sep(0) cols(price) : spearman mpg-foreign

    +-------------------------------------------+
    |                             rho         p |
    |-------------------------------------------|
    |          Mileage (mpg)   -0.555   0.00000 |
    |     Repair Record 1978    0.103   0.40082 |
    |         Headroom (in.)    0.117   0.33662 |
    |  Trunk space (cu. ft.)    0.424   0.00028 |
    |          Weight (lbs.)    0.501   0.00001 |
    |           Length (in.)    0.501   0.00001 |
    |      Turn Circle (ft.)    0.321   0.00713 |
    | Displacement (cu. in.)    0.416   0.00038 |
    |             Gear Ratio   -0.305   0.01072 |
    |               Car type    0.081   0.51002 |
    +-------------------------------------------+

As this example shows, we can also ask for the results to be shown using the -list- command, which opens a wider range of presentation possibilities. The -label- option asks for variable labels to be shown, and the numeric variables can be assigned display formats.

As this example also shows, we can show two or more scalar results from each command run. This is possible in various ways.

A univariate command can be repeated, each time yielding two or more scalars:

    . makematrix, from(r(mean) r(sd) r(skewness)) : su head trunk length displacement weight, detail

                        mean          sd    skewness
        headroom   2.9932432   .84599477   .14086508
           trunk   13.756757   4.2774042   .02920342
          length   187.93243    22.26634  -.04097455
    displacement    197.2973   91.837219   .59165653
          weight   3019.4595   777.19357   .14811637

    . makematrix, from(r(mean) r(sd) r(skewness)) list format(%2.1f %2.1f %4.3f) sep(0) : su head trunk length displacement weight, detail

    +------------------------------------------+
    |                  mean      sd   skewness |
    |------------------------------------------|
    |     headroom      3.0     0.8      0.141 |
    |        trunk     13.8     4.3      0.029 |
    |       length    187.9    22.3     -0.041 |
    | displacement    197.3    91.8      0.592 |
    |       weight   3019.5   777.2      0.148 |
    +------------------------------------------+

-makematrix- reasons in this way: The user wants three scalars, which I will show in three columns. So I must run the command specified in turn on each variable supplied, which I will show on the rows.

So for each variable in varlist, -makematrix- runs a univariate command, and puts two or more scalars in the cells of each row.

A bivariate command can be repeated, each time yielding two or more scalars:

    . makematrix, from(r(rho) r(p)) lhs(rep78-foreign) : spearman mpg

                         rho           p
           rep78   .30982668   .00957855
        headroom  -.48660171   .00001103
           trunk  -.64977398   3.759e-10
          weight  -.85755073   1.778e-22
          length   -.8314402   4.710e-20
            turn  -.75767499   5.548e-15
    displacement  -.77126724   9.009e-16
      gear_ratio   .60982891   8.061e-09
         foreign   .36289624   .00148459

-makematrix- reasons in this way: The user wants two scalars, which I will show in two columns. So I must run the command specified in turn on the variable supplied. The option -lhs()- is also specified, so that must be used to supply the other variable.

Whenever -lhs()- is specified, it specifies the rows of the matrix. That is, in this case, the rows show the results of -spearman rep78 mpg ... spearman foreign mpg-. Notice how the variables specified in -lhs()- appear on the left-hand side of the varlist which -spearman- runs. (-lhs()- also names the left-hand side of the matrix, but that is a happy accident.)

This is also allowed:

    . makematrix, from(r(rho) r(p)) rhs(rep78-foreign) : spearman mpg

In this case, the rows show the results of -spearman mpg rep78 ... spearman mpg foreign-, and are exactly the same as in the previous example. Again, whenever -rhs()- is specified, it specifies the rows of the matrix. Notice how the variables specified in -rhs()- appear on the right-hand side of the varlist which -spearman- runs. (By a small stretch, you can also think of it as naming the right-hand side of the matrix, given that we could repeat the row names on that side.)

In other cases, which is used may well matter:

    . makematrix, from(e(r2) e(rmse) _b[_cons] _b[mpg]) lhs(rep78-foreign) list dp(3 2 2 3) abb(9) sep(0) divider : regress mpg

    +------------------------------------------------------+
    |              |    r2 |   rmse | _b[_cons] |  _b[mpg] |
    |--------------+-------+--------+-----------+----------|
    |        rep78 | 0.162 |   0.91 |      1.96 |    0.068 |
    |     headroom | 0.171 |   0.78 |      4.28 |   -0.061 |
    |        trunk | 0.338 |   3.50 |     22.91 |   -0.430 |
    |       weight | 0.652 | 461.96 |   5328.76 | -108.432 |
    |       length | 0.633 |  13.58 |    253.16 |   -3.063 |
    |         turn | 0.517 |   3.08 |     51.30 |   -0.547 |
    | displacement | 0.498 |  65.52 |    435.85 |  -11.201 |
    |   gear_ratio | 0.380 |   0.36 |      1.98 |    0.049 |
    |      foreign | 0.155 |   0.43 |     -0.37 |    0.031 |
    +------------------------------------------------------+

    . makematrix, from(e(r2) e(rmse) _b[_cons] _b) rhs(rep78-foreign) list dp(3 2 2 3) abb(9) sep(0) divider : regress mpg

    +--------------------------------------------------+
    |              |    r2 | rmse | _b[_cons] |     _b |
    |--------------+-------+------+-----------+--------|
    |        rep78 | 0.162 | 5.41 |     13.17 |  2.384 |
    |     headroom | 0.171 | 5.30 |     29.77 | -2.830 |
    |        trunk | 0.338 | 4.74 |     32.12 | -0.787 |
    |       weight | 0.652 | 3.44 |     39.44 | -0.006 |
    |       length | 0.633 | 3.53 |     60.16 | -0.207 |
    |         turn | 0.517 | 4.05 |     58.80 | -0.946 |
    | displacement | 0.498 | 4.13 |     30.07 | -0.044 |
    |   gear_ratio | 0.380 | 4.59 |     -2.26 |  7.813 |
    |      foreign | 0.155 | 5.36 |     19.83 |  4.946 |
    +--------------------------------------------------+

The first series of regressions predicts -rep78 ... foreign- in turn from -mpg-. The second series predicts -mpg- from -rep78 ... foreign- in turn. The r-square results will be the same, but not the root mean square errors, or the intercepts or slopes. Note that _b by itself has the interpretation of _b[row_variable]. -dp()- is a lazy alternative to -format()- used to specify the number of decimal places.

In fact -lhs()- and -rhs()- can be used to produce a series of multivariate results. Suppose we have -weightsq-, i.e. -weight^2-.

    . gen weightsq = weight^2
    . makematrix, from(e(r2) e(rmse)) lhs(mpg-trunk length-foreign) list dp(3 2) sep(0) divider : regress weight weightsq

    +------------------------------+
    |              |    r2 |  rmse |
    |--------------+-------+-------|
    |          mpg | 0.672 |  3.36 |
    |        rep78 | 0.222 |  0.89 |
    |     headroom | 0.236 |  0.75 |
    |        trunk | 0.457 |  3.20 |
    |       length | 0.900 |  7.12 |
    |         turn | 0.736 |  2.29 |
    | displacement | 0.826 | 38.90 |
    |   gear_ratio | 0.577 |  0.30 |
    |      foreign | 0.379 |  0.37 |
    +------------------------------+

This series predicts -mpg ... foreign- in turn from -weight- and -weightsq-. When either -lhs()- or -rhs()- is specified they define the varying rows, while the varlist supplied is fixed for each run of the command.

There is one more nuance to be explained. Say you want a table of sums for a set of variables. You might try

    . makematrix, from(r(sum)): su head trunk length displacement weight, meanonly

However, -makematrix- cannot distinguish between this and a similar problem with a bivariate command, so it will attempt to run -summarize- on all distinct pairs of variables. This will succeed, except that what is left behind in -r(sum)- will be the sum of the second of each pair of variables. What you will prefer is a vector, and that is the option to specify:

    . makematrix, from(r(sum)) vector: su head trunk length displacement weight, meanonly

There's more, for which please see the help as usual.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

