
How do I build customizable tables of correlation coefficients for complex survey data?

Title   Build customizable tables of correlation coefficients for complex survey data
Authors Mia Lv, StataCorp
Jeff Pitblado, StataCorp

In the FAQ "How can I estimate correlation coefficients and their p-values for complex survey data?", we discuss how to estimate correlations on a complex survey dataset using svy: sem. In this FAQ, we show how to use the collect framework to create customizable tables of the estimated correlations and their p-values.

In the following example, we use the collect suite of commands to build customizable tables that look like the output tables produced by correlate and pwcorr:

webuse nhanes2l, clear

* Declare the survey data characteristics such as sampling units, weights, strata, and finite population corrections
svyset psu [pweight=finalwgt], strata(strata)
svy: sem (<- bmi bpsystol tcresult), standardized

collect clear
* Collect estimation results reported in r(table) 
collect get r()

* Get list of variables for looping to add tags for custom table layouts
local vlist = e(oxvars)
local k_vlist : list sizeof vlist

forvalues i = 1/`k_vlist' {
    local v1 : word `i' of `vlist'
    * Add row and col tags for the diagonal elements of the estimated
    * correlation matrix, use result[_r_b] in -fortags()- to ignore
    * the missing-valued p-values
    collect addtags row[`v1'] col[`v1'], fortags(result[_r_b]#colname[var(`v1')])
    forvalues j = `=`i'+1'/`k_vlist' {
        local v2 : word `j' of `vlist'
        * Add row and col tags for the below-diagonal (lower-triangular)
        * elements of the estimated correlation matrix
        collect addtags row[`v2'] col[`v1'], fortags(colname[cov(`v1',`v2')])
    }

    * Label the row and col levels with the variable label
    local label : variable label `v1'
    collect label levels row `v1' `"`label'"', modify
    collect label levels col `v1' `"`label'"', modify
}

* Apply the style etable to our table
collect style use etable, replace

collect style header row, level(label)
collect style header col, level(label)

* The first layout: Stack correlations on their p-values
collect layout (row#result[_r_b _r_p]) (col#stars[value])

* The second layout: Show correlations with stars and stars note
collect stars, shownote
collect layout (row#result[_r_b]) (col#stars)

At the end of the above code, we organize the results into two different table layouts. In the first layout, the correlations are stacked on top of their p-values:

. collect layout (row#result[_r_b _r_p]) (col#stars[value])

Collection: default
      Rows: row#result[_r_b _r_p]
   Columns: col#stars[value]
   Table 1: 5 x 3

                          Body mass index (BMI) Systolic blood pressure Serum cholesterol (mg/dL)
Body mass index (BMI)                     1.000
Systolic blood pressure                   0.367                   1.000
                                           0.00
Serum cholesterol (mg/dL)                 0.201                   0.264                     1.000
                                           0.00                    0.00

In the second layout, stars for significant results are inserted after each correlation, and the star notes are displayed:

. collect stars, shownote

. collect layout (row#result[_r_b]) (col#stars)

Collection: default
      Rows: row#result[_r_b]
   Columns: col#stars
   Table 1: 3 x 5

                          Body mass index (BMI)    Systolic blood pressure    Serum cholesterol (mg/dL)
Body mass index (BMI)                     1.000
Systolic blood pressure                   0.367 **                   1.000
Serum cholesterol (mg/dL)                 0.201 **                   0.264 **                     1.000
** p<.01, * p<.05

In the above code, we apply the predefined style for etable to our customized table using the command

. collect style use etable, replace

This style is the system default for tables generated with the etable command. We apply it because it looks nice and already includes rules for displaying significance stars (one star for p-values less than 0.05 and two stars for p-values less than 0.01), so there is no need to define the stars separately with collect stars. In the second layout, collect stars is used only with the shownote option, to display the stars note.
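If different thresholds or symbols are wanted, one possibility is to redefine the stars with collect stars before running the layout. The following is a minimal sketch, assuming the collection built above is still in memory; the three thresholds and their symbols are illustrative choices and are not part of the original example:

```stata
* Illustrative thresholds (an assumption, not from the example above):
* compute stars from the p-values in _r_p, attach them to the
* correlations in _r_b, and show the stars note
collect stars _r_p 0.01 "***" 0.05 "**" 0.1 "*", attach(_r_b) shownote
collect layout (row#result[_r_b]) (col#stars)
```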

After the etable style is applied, new star items are created and tagged with result[_r_b] and a new tag, stars[label]. All existing items, such as the numbers tagged with result[_r_b] or result[_r_p], are tagged with stars[value].

This is why, when you specify the tag stars[value] in the first layout with

. collect layout (row#result[_r_b _r_p]) (col#stars[value])

the table does not include the stars.
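To check these tags yourself, one option is to list the dimensions of the collection and the levels of the stars dimension. This is a minimal sketch, assuming the collection built above is still in memory and a layout has already been run; the exact listing will depend on your Stata version:

```stata
* List all dimensions in the current collection
collect dims

* List the levels of the stars dimension (expected: value and label)
collect levelsof stars

* Show the labels attached to the levels of the stars dimension
collect label list stars, all
```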

However, the table displays the significance stars along with the correlations when we include both levels of the stars dimension. (When no levels are specified and no autolevels are defined for a dimension, all of its levels are included.)

. collect layout (row#result[_r_b]) (col#stars)

Why do we need to include the stars dimension in the second layout? Because without it, for some combinations of the levels of row and col with result[_r_b], collect layout will not find a unique value. Instead, it will find multiple values (the stars and the correlation coefficient), and when that happens, nothing is displayed in those cells. For a more detailed introduction to tags and specifying layouts in collections, please refer to [TABLES] Collection principles.
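To see this behavior, one could temporarily drop the stars dimension from the column specification. This is a sketch under the assumption that the collection and the etable style from the example are still in place; following the explanation above, the cells that hold both a correlation and its stars are expected to come out empty:

```stata
* Without the stars dimension, cells that contain both a value and a
* star item have no unique match
collect layout (row#result[_r_b]) (col)

* Restore the layout that displays the correlations with their stars
collect layout (row#result[_r_b]) (col#stars)
```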

Tables in collections can be exported to other file formats easily. Please see the FAQ "What methods can we use to export a customizable table from Stata to another format?" for more detailed information on how to export such tables.
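As a brief illustration, the current layout can be written to a file with collect export, which infers the format from the file extension. This is a minimal sketch; the filename corr_table is hypothetical, and the table exported is whichever layout was run last:

```stata
* Export the most recent layout to a Word document
* (corr_table is a hypothetical filename)
collect export corr_table.docx, replace

* The same table as an HTML file
collect export corr_table.html, replace
```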