How to build customizable tables of correlation coefficients for complex survey data?

How to build a customized table to show the covariances obtained by sem

Title    Build customizable tables of correlation coefficients for complex survey data
Authors  Mia Lv, StataCorp
         Jeff Pitblado, StataCorp

In FAQ: How can I estimate correlation coefficients and their p-values for complex survey data?, we discuss how to estimate correlations on a complex survey dataset using svy: sem and show you how to obtain the correlation matrix using estat framework, standardized. In this FAQ, we will focus on how to use the collect suite of commands to create customizable tables for the estimated correlations (or covariances) and their p-values.

In the following example, we use svy: sem to obtain the correlations between the variables bmi, bpsystol, and tcresult, taking the complex survey design into account.

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. svyset psu [pweight=finalwgt] ,strata(strata)


Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. svy: sem (<- bmi bpsystol tcresult), standardized
(running sem on estimation sample)


Survey: Structural equation model                Number of obs   =      10,351
Number of strata = 31                            Population size = 117,157,513
Number of PSUs   = 62                            Design df       =          31

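-----------------------------------------------------------------------------------------
                       |             Linearized
          Standardized | Coefficient  std. err.        t   P>|t|     [95% conf. interval]
-----------------------+-----------------------------------------------------------------
             mean(bmi) |    5.262723   .0618202    85.13   0.000      5.13664    5.388806
        mean(bpsystol) |     5.93213   .0824527    71.95   0.000     5.763967    6.100293
        mean(tcresult) |    4.398236   .0482051    91.24   0.000     4.299921    4.496551
-----------------------+-----------------------------------------------------------------
              var(bmi) |           1          .                             .           .
         var(bpsystol) |           1          .                             .           .
         var(tcresult) |           1          .                             .           .
-----------------------+-----------------------------------------------------------------
     cov(bmi,bpsystol) |    .3667122   .0113745    32.24   0.000     .3435138    .3899106
     cov(bmi,tcresult) |    .2007772   .0124536    16.12   0.000      .175378    .2261764
cov(bpsystol,tcresult) |    .2639261   .0112336    23.49   0.000     .2410151    .2868371
-----------------------------------------------------------------------------------------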

We see that the correlations are listed in the third section of the above estimation table. Now, we use the collect suite of commands to build customizable tables that look like the output tables produced by correlate and pwcorr:

collect clear
* Collect estimation results reported in r(table) 
collect get r()

* Get list of variables for looping to add tags for custom table layouts
local vlist = e(oxvars)
local k_vlist : list sizeof vlist

forvalues i = 1/`k_vlist' {
    local v1 : word `i' of `vlist'
    * Add row and col tags for the diagonal elements of the estimated
    * correlation matrix, use result[_r_b] in -fortags()- to ignore
    * the missing-valued p-values
    collect addtags row[`v1'] col[`v1'], fortags(result[_r_b]#colname[var(`v1')])
    forvalues j = `=`i'+1'/`k_vlist' {
        local v2 : word `j' of `vlist'
        * add row and col tags for the lower diagonal elements of
        * the estimated correlation matrix  
        collect addtags row[`v2'] col[`v1'], fortags(colname[cov(`v1',`v2')])
    }

    * Label the row and col levels for this variable with its variable label
    local label : variable label `v1'
    collect label levels row `v1' `"`label'"', modify
    collect label levels col `v1' `"`label'"', modify
}


* Apply the style etable to our table
collect style use etable, replace

collect style header row, level(label)
collect style header col, level(label)

* The first layout: Stack correlations on their p-values
collect layout (row#result[_r_b _r_p]) (col#stars[value])

* The second layout: Show correlations with stars and stars note
collect stars, shownote
collect layout (row#result[_r_b]) (col#stars)

At the end of the above code, we organize the results into two different table layouts. In the first layout, each correlation is stacked on top of its p-value:

. collect layout (row#result[_r_b _r_p]) (col#stars[value])

Collection: default
      Rows: row#result[_r_b _r_p]
   Columns: col#stars[value]
   Table 1: 5 x 3

                           Body mass index (BMI)  Systolic blood pressure  Serum cholesterol (mg/dL)
Body mass index (BMI)                      1.000
Systolic blood pressure                    0.367                    1.000
                                            0.00
Serum cholesterol (mg/dL)                  0.201                    0.264                      1.000
                                            0.00                     0.00
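
The p-values in this layout print with two decimal places. A minimal sketch for showing more digits, assuming the collection built above is still the current collection; the %5.3f format is our choice for illustration and is not part of the original example:

```stata
* Assumes the collection built above is the current collection
* Show p-values with three decimal places (format chosen for illustration)
collect style cell result[_r_p], nformat(%5.3f)

* Redisplay the first layout with the new format
collect layout (row#result[_r_b _r_p]) (col#stars[value])
```

If the displayed format does not change, the style in use may set the format on a more specific combination of tags, in which case the same tags would need to be named in collect style cell.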

In the second layout, stars for significant results are inserted after each correlation, and the star notes are displayed:

. collect stars, shownote

. collect layout (row#result[_r_b]) (col#stars)

Collection: default
      Rows: row#result[_r_b]
   Columns: col#stars
   Table 1: 3 x 5

                           Body mass index (BMI)     Systolic blood pressure     Serum cholesterol (mg/dL)
Body mass index (BMI)                      1.000
Systolic blood pressure                    0.367 **                    1.000
Serum cholesterol (mg/dL)                  0.201 **                    0.264 **                      1.000
** p<.01, * p<.05

In the above code, we apply the predefined style for etable to our customized table using the command

. collect style use etable, replace

This style is the system default for tables generated with the etable command. We apply it because it produces a clean layout and already includes rules for displaying significance stars (one star for p-values less than 0.05 and two stars for p-values less than 0.01), so there is no need to define the stars separately with collect stars. In the code above, collect stars, shownote only turns on the note that explains the stars.

After this style is applied, new star items are created and tagged with result[_r_b] and a new tag, stars[label]. All other existing items, such as the numbers tagged with result[_r_b] or result[_r_p], are tagged with stars[value].

This is why, when you specify the tag stars[value] in the first layout with

. collect layout (row#result[_r_b _r_p]) (col#stars[value])

the table does not include the stars.
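
To see these tags for yourself, you can list the dimensions and the levels of the stars dimension. This is a minimal sketch that assumes the collection built above is still the current collection:

```stata
* Assumes the collection built above is the current collection
* List all dimensions in the collection; stars should be among them
collect dims

* List the levels of the stars dimension (label and value, as described above)
collect levelsof stars
```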

In the second layout, we include both levels of the stars dimension, so the table displays the significance stars next to the correlations. (When no autolevels are defined for a dimension, omitting the levels includes all of them.)

. collect layout (row#result[_r_b]) (col#stars)

Why do we need to include the stars dimension in the layout? Without it, for some combinations of the levels of row, col, and result[_r_b], collect layout will not find a unique value: it will find both the stars and the correlation coefficient. When that happens, nothing is displayed for those cells. For a more detailed introduction to tags and specifying layouts in collections, please refer to [TABLES] Collection principles.

Tables in collections can be exported to other file formats easily. Please see FAQ: What methods can we use to export a customizable table from Stata to another format? for more detailed information on how to export such tables.
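
As a minimal sketch, assuming the second layout above is the current layout, collect export writes the table to a file. The file names below are hypothetical:

```stata
* Assumes the second layout above is the current layout
* File names are hypothetical; the file format is inferred from the extension
collect export corrtable.docx, replace
collect export corrtable.html, replace
```

The exported file reflects whichever layout was specified last, so rerun collect layout first if you want a different arrangement.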