
Title: Build customizable tables of correlation coefficients for complex survey data
Authors: Mia Lv, StataCorp; Jeff Pitblado, StataCorp
In FAQ: How can I estimate correlation coefficients and their p-values for complex survey data?, we discuss how to estimate correlations on a complex survey dataset using svy: sem. In this FAQ, we will show you how to use the collect framework to create customizable tables for the estimated correlations and their p-values.
In the following example, we use the collect suite of commands to build customizable tables that look like the output tables produced by correlate and pwcorr:
    webuse nhanes2l, clear

    * Declare the survey data characteristics such as sampling units, weights,
    * strata, and finite population corrections
    svyset psu [pweight=finalwgt], strata(strata)

    svy: sem (<- bmi bpsystol tcresult), standardized

    collect clear
    * Collect estimation results reported in r(table)
    collect get r()

    * Get list of variables for looping to add tags for custom table layouts
    local vlist = e(oxvars)
    local k_vlist : list sizeof vlist

    forvalues i = 1/`k_vlist' {
        local v1 : word `i' of `vlist'
        * Add row and col tags for the diagonal elements of the estimated
        * correlation matrix, use result[_r_b] in -fortags()- to ignore
        * the missing-valued p-values
        collect addtags row[`v1'] col[`v1'], fortags(result[_r_b]#colname[var(`v1')])
        forvalues j = `=`i'+1'/`k_vlist' {
            local v2 : word `j' of `vlist'
            * Add row and col tags for the lower diagonal elements of
            * the estimated correlation matrix
            collect addtags row[`v2'] col[`v1'], fortags(colname[cov(`v1',`v2')])
        }
        local label: variable label `v1'
        collect label levels row `v1' `"`label'"', modify
        collect label levels col `v1' `"`label'"', modify
    }

    * Apply the style etable to our table
    collect style use etable, replace
    collect style header row, level(label)
    collect style header col, level(label)

    * The first layout: Stack correlations on their p-values
    collect layout (row#result[_r_b _r_p]) (col#stars[value])

    * The second layout: Show correlations with stars and stars note
    collect stars, shownote
    collect layout (row#result[_r_b]) (col#stars)
At the end of the code above, we organize the results into two different table layouts. In the first layout, each correlation is stacked on top of its p-value:
    . collect layout (row#result[_r_b _r_p]) (col#stars[value])

    Collection: default
          Rows: row#result[_r_b _r_p]
       Columns: col#stars[value]
       Table 1: 5 x 3

                                Body mass index (BMI)   Systolic blood pressure   Serum cholesterol (mg/dL)
    Body mass index (BMI)                       1.000
    Systolic blood pressure                     0.367                     1.000
                                                 0.00
    Serum cholesterol (mg/dL)                   0.201                     0.264                       1.000
                                                 0.00                      0.00
In the second layout, stars for significant results are inserted after each correlation, and the star notes are displayed:
    . collect stars, shownote
    . collect layout (row#result[_r_b]) (col#stars)

    Collection: default
          Rows: row#result[_r_b]
       Columns: col#stars
       Table 1: 3 x 5

                                Body mass index (BMI)   Systolic blood pressure   Serum cholesterol (mg/dL)
    Body mass index (BMI)                       1.000
    Systolic blood pressure                     0.367 **                  1.000
    Serum cholesterol (mg/dL)                   0.201 **                  0.264 **                    1.000
In the above code, we apply the predefined style for etable to our customized table using the command
. collect style use etable, replace
This style is the system default for tables generated with the etable command. We apply it because it looks nice and already includes rules for displaying significance stars (one star for p-values less than 0.05 and two stars for p-values less than 0.01), so there is no need to define the stars separately with collect stars. The collect stars, shownote command in the second layout only turns on the note that explains the stars.
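If different cutoffs or symbols are preferred, the star rules can be redefined after the style is applied. The following is a minimal sketch, assuming the collection built in the example above is still the current collection; the three cutoffs and their symbols are illustrative choices and are not part of the original example.

```stata
* Assumption: the collection from the example above is the current collection
* and the etable style has already been applied.

* Redefine the star rules based on the p-values (_r_p) and attach the
* stars to the correlations (_r_b); the cutoffs here are illustrative only
collect stars _r_p 0.01 "***" 0.05 "**" 0.1 "*", attach(_r_b) shownote

* Respecify the second layout so that the table reflects the new rules
collect layout (row#result[_r_b]) (col#stars)
```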
After this style is applied, new star items are created and tagged with result[_r_b] and a new tag, stars[label]. All existing items, such as the numbers tagged with result[_r_b] or result[_r_p], are tagged with stars[value].
This is why, when you specify the tag stars[value] in the first layout with
. collect layout (row#result[_r_b _r_p]) (col#stars[value])
the table does not include the stars.
However, the table displays the significance stars along with the correlations when we include both levels of the stars dimension. (Specifying a dimension without any levels includes all of its levels, provided that autolevels are not defined for that dimension.)
. collect layout (row#result[_r_b]) (col#stars)
Why do we need to include the stars dimension in the second layout? Without it, collect layout will not find a unique value for some combinations of the levels of row, col, and result[_r_b]; it will instead find multiple values (the stars and the correlation coefficient). When that happens, nothing is displayed for those cells. For a more detailed introduction to tags and specifying layouts in collections, please refer to [TABLES] Collection principles.
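One way to see this behavior directly is to list the levels of the stars dimension and then temporarily drop that dimension from the layout. This is a small sketch, assuming the collection from the example above is still in memory; based on the explanation above, the cells that have both a correlation and a star item would be expected to appear blank in the intermediate table.

```stata
* Assumption: the collection from the example above is the current collection.

* List the levels of the stars dimension (the text above describes two
* levels, value and label)
collect levelsof stars

* Leave out the stars dimension: cells matched by more than one item
* (the correlation and its stars) should not display anything
collect layout (row#result[_r_b]) (col)

* Restore the second layout, which includes every level of stars
collect layout (row#result[_r_b]) (col#stars)
```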
Tables in collections can be exported to other file formats easily. Please see FAQ: What methods can we use to export a customizable table from Stata to another format? for more detailed information on how to export such tables.
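As a brief illustration, the table from the most recent collect layout can be written to a file with collect export, which picks the output format from the file extension. The sketch below assumes the collection from the example above is still in memory; the filename corr_table is a placeholder chosen for illustration.

```stata
* Assumption: the collection from the example above is the current collection
* and a layout has already been specified. The filenames are placeholders.

* Export the current table to a Word document
collect export corr_table.docx, replace

* Export the same table to HTML and to Excel
collect export corr_table.html, replace
collect export corr_table.xlsx, replace
```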