FAQ: How can I estimate correlation coefficients and their p-values for complex survey data?



Title		Estimating correlations with survey data using svy: sem
Authors		Mia Lv, StataCorp; Jeff Pitblado, StataCorp

In Stata, the standard commands for computing correlations, such as correlate and pwcorr, do not support survey design data or sampling weights (pweights). Instead, we can use svy: sem to estimate correlations between variables, along with their p-values, directly from complex survey data.

Here is a quick example:

webuse nhanes2l, clear

* Declare the survey design: PSUs, sampling weights, and strata (add fpc() if your design has finite population corrections)
svyset psu [pweight=finalwgt], strata(strata)
svy: sem (<- bmi bpsystol tcresult), standardized

The output of the above svy: sem command is

. svy: sem (<- bmi bpsystol tcresult), standardized
(running sem on estimation sample)

Survey: Structural equation model                Number of obs   =      10,351
Number of strata = 31                            Population size = 117,157,513
Number of PSUs   = 62                            Design df       =          31

 
                            Linearized                                        
 Standardized   Coefficient  std. err.      t    P>|t|     [95% conf. interval]

     mean(bmi)     5.262723   .0618202    85.13   0.000      5.13664    5.388806
mean(bpsystol)      5.93213   .0824527    71.95   0.000     5.763967    6.100293
mean(tcresult)     4.398236   .0482051    91.24   0.000     4.299921    4.496551

      var(bmi)            1          .                             .           .
 var(bpsystol)            1          .                             .           .
 var(tcresult)            1          .                             .           .

      cov(bmi,                                                                  
     bpsystol)     .3667122   .0113745    32.24   0.000     .3435138    .3899106
      cov(bmi,                                                                  
     tcresult)     .2007772   .0124536    16.12   0.000      .175378    .2261764
 cov(bpsystol,                                                                  
     tcresult)     .2639261   .0112336    23.49   0.000     .2410151    .2868371

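The first section of the table reports standardized means (each mean divided by its standard deviation), and the second reports the variances, which equal 1 after standardization. The third section displays the estimated covariances between each pair of variables; because we specified the standardized option, these are correlations. The P>|t| column gives the p-value for a separate test of whether each estimate equals zero. In this example, all three p-values are reported as 0.000, that is, less than 0.001, so all three correlation coefficients are statistically significant.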
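The t statistic and confidence interval in each row follow directly from the coefficient, its linearized standard error, and the design degrees of freedom (31 here). As a minimal Python sketch, the check below reproduces the cov(bmi,bpsystol) row. The 97.5th percentile of the t distribution with 31 degrees of freedom (about 2.0395) is hard-coded as an assumption rather than computed, to keep the sketch free of dependencies.

```python
# Reproduce the t statistic and 95% CI for the standardized cov(bmi,bpsystol)
# row of the svy: sem output. Numbers are copied from the table above.
coef = 0.3667122   # standardized covariance (correlation)
se = 0.0113745     # linearized standard error

# Assumption: 97.5th percentile of Student's t with 31 design df, hard-coded
# (scipy.stats.t.ppf(0.975, 31) would compute it if SciPy is available).
T_CRIT_31 = 2.0395134

t_stat = coef / se
ci_low = coef - T_CRIT_31 * se
ci_high = coef + T_CRIT_31 * se

print(f"t = {t_stat:.2f}")
print(f"95% CI = [{ci_low:.7f}, {ci_high:.7f}]")
```

Running it should give t = 32.24 and an interval that agrees with the table to the displayed precision. Note that the critical value comes from the t distribution with the design degrees of freedom, not from the normal distribution.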

Why this code computes correlations

The sem command fits a structural equation model (SEM). When you list only exogenous variables to the right of “<-” with nothing on the left [for example, (<- height weight race)], the model contains no paths, so sem simply estimates the means, variances, and covariances of these variables.

The standardized option reports the estimates in standardized form: each covariance is divided by the product of the two variables' standard deviations, which turns it into a correlation coefficient. The output therefore gives you the correlations directly.
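In symbols, writing sigma_jk for the covariance of variables j and k, sigma_jj for the variance of variable j, and mu_j for its mean, the standardized quantities that sem reports are:

```latex
\mathrm{corr}(j,k) = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\,\sigma_{kk}}},
\qquad
\text{standardized mean}_j = \frac{\mu_j}{\sqrt{\sigma_{jj}}}
```

Each standardized variance is therefore sigma_jj / sigma_jj = 1, which is why the var() rows show 1 with no standard error, and each standardized mean is the mean expressed in standard-deviation units.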

Because the command includes the svy: prefix, Stata also uses the survey design information declared with svyset. The sampling weights enter the point estimates, and the weights, PSUs, and strata together determine the linearized standard errors, so the correlations, their standard errors, and their p-values are all adjusted for the complex survey design.
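To see what the sampling weights contribute to the point estimate, here is a minimal Python sketch of a probability-weighted Pearson correlation. It assumes, as we understand the saturated model fit here, that the svy: sem point estimate equals this weighted correlation; the design-based standard errors, which also use the PSUs and strata, are not computed. The function name weighted_corr and the toy data are hypothetical.

```python
import math

def weighted_corr(x, y, w):
    """Pearson correlation with each observation weighted by w (e.g., pweights)."""
    total_w = sum(w)
    # Weighted means of x and y
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    my = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted cross-products and sums of squares; the common divisor
    # (total weight) cancels in the ratio, so it is omitted.
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: a weight of 2 counts an observation twice.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
w = [1.0, 2.0, 1.0, 3.0]
print(round(weighted_corr(x, y, w), 4))
```

With all weights equal, the function reduces to the ordinary Pearson correlation; with unequal weights, observations that represent more of the population pull the estimate further.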

Use estat framework, standardized to get a table of the correlations

After obtaining the correlations using svy: sem, you may wonder how to get a table of correlations that looks like the output tables produced by correlate and pwcorr. We can use estat framework, standardized to easily get such a table:

. estat framework, standardized
(model contains no latent variables)

Covariances of exogenous variables (standardized)


                  Observed                       
             Phi        bmi   bpsystol   tcresult
    Observed                                     
             bmi          1                      
        bpsystol   .3667122          1           
        tcresult   .2007772   .2639261          1


Means of exogenous variables (standardized)


                  Observed                       
           kappa        bmi   bpsystol   tcresult
            mean   5.262723    5.93213   4.398236

The first table in the output shows the correlation coefficients (standardized covariances).

The above command also stores this table in the matrix r(Phi):

. matlist r(Phi)


               Observed                       
                     bmi   bpsystol   tcresult

Observed                                      
         bmi           1                      
    bpsystol    .3667122          1           
    tcresult    .2007772   .2639261          1

You can copy this matrix for later use or export it to Excel with putexcel (the file name correlations.xlsx below is only an example):

. matrix m = r(Phi)

. putexcel set correlations.xlsx, replace

. putexcel C2 = matrix(m), names

Alternatively, you can use the collect suite of commands to build a customizable table of the correlations. For more information, please see FAQ: How to build customizable tables of correlation coefficients for complex survey data?

How to compare two correlations

After estimating the correlations (standardized covariances) among your variables with svy: sem, you may want to test whether two correlations differ from each other. For example, a medical researcher may want to know whether body mass index (bmi) is more strongly correlated with systolic blood pressure (bpsystol) than with serum cholesterol level (tcresult). The estat stdize: prefix combined with the test command lets you do that by performing a hypothesis test on the estimated parameters of the SEM.

We can run the following command to test the null hypothesis that, in the population, the correlation between bmi and bpsystol equals the correlation between bmi and tcresult:

. estat stdize: test _b[/cov(bmi,bpsystol)] = _b[/cov(bmi,tcresult)]

Adjusted Wald test

 ( 1)  [/]cov(bmi,bpsystol) - [/]cov(bmi,tcresult) = 0

       F(  1,    31) =  110.33
            Prob > F =    0.0000

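The test rejects the null hypothesis that the two correlations are equal (p < 0.0001). Because the estimated correlation of bmi with bpsystol (.367) is larger than its correlation with tcresult (.201), we conclude that body mass index has a stronger correlation with systolic blood pressure than with serum cholesterol level.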
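For a single restriction like this one, the statistic has a simple form. Writing r1 and r2 for the two estimated correlations, a sketch of the standard Wald statistic and its design-adjusted F version (with k restrictions and d design degrees of freedom) is:

```latex
W = \frac{(\hat r_1 - \hat r_2)^2}
         {\widehat{\mathrm{Var}}(\hat r_1) + \widehat{\mathrm{Var}}(\hat r_2)
          - 2\,\widehat{\mathrm{Cov}}(\hat r_1, \hat r_2)},
\qquad
F = \frac{(d - k + 1)\,W}{k\,d} \sim F(k,\; d - k + 1)
```

With k = 1 and d = 31, F equals W on (1, 31) degrees of freedom, consistent with the F(1, 31) reported above. The covariance term is why the two standard errors in the svy: sem table cannot simply be combined as if the estimates were independent: both correlations involve bmi and are estimated from the same sample.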

We need the estat stdize: prefix because test works with the unstandardized parameter estimates by default, even if we specified standardized with svy: sem; that option changes only how the results are displayed. When comparing correlations, which are standardized covariances, we need the standardized estimates instead. estat stdize: temporarily replaces the parameter vector with the standardized estimates, so the hypothesis test is performed on the correlations.

If you are unsure which names to use with test to refer to these coefficients, you can replay the svy: sem results with the coeflegend option by typing sem, coeflegend. Here is the command and its output:

. sem, coeflegend

Survey: Structural equation model                Number of obs   =      10,351
Number of strata = 31                            Population size = 117,157,513
Number of PSUs   = 62                            Design df       =          31


 
                        Coefficient  Legend                                        

             mean(bmi)     25.27584  _b[/mean(bmi)]                                 
        mean(bpsystol)     126.9458  _b[/mean(bpsystol)]                            
        mean(tcresult)     213.0977  _b[/mean(tcresult)]                            

              var(bmi)     23.06695  _b[/var(bmi)]                                  
         var(bpsystol)      457.947  _b[/var(bpsystol)]                             
         var(tcresult)     2347.472  _b[/var(tcresult)]                             

     cov(bmi,bpsystol)     37.69017  _b[/cov(bmi,bpsystol)]                         
     cov(bmi,tcresult)     46.72074  _b[/cov(bmi,tcresult)]                         
cov(bpsystol,tcresult)     273.6466  _b[/cov(bpsystol,tcresult)]

The Legend column shows how to refer to each coefficient estimated by the svy: sem model, for example _b[/cov(bmi,bpsystol)], so we can use these names in the test command. Note that coeflegend displays the unstandardized estimates.
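As a cross-check of what the standardized option does, the minimal Python sketch below takes the unstandardized estimates from the coeflegend listing above and standardizes them by hand. The dictionary layout is our own choice for illustration; the formulas are correlation = covariance / (SD × SD) and standardized mean = mean / SD.

```python
import math

# Unstandardized estimates copied from the sem, coeflegend listing above
mean = {"bmi": 25.27584, "bpsystol": 126.9458, "tcresult": 213.0977}
var = {"bmi": 23.06695, "bpsystol": 457.947, "tcresult": 2347.472}
cov = {
    ("bmi", "bpsystol"): 37.69017,
    ("bmi", "tcresult"): 46.72074,
    ("bpsystol", "tcresult"): 273.6466,
}

# Standard deviations from the variances
sd = {v: math.sqrt(s2) for v, s2 in var.items()}

# Standardized means: each mean divided by its standard deviation
std_mean = {v: mean[v] / sd[v] for v in mean}

# Standardized covariances: the Pearson correlations
corr = {(a, b): c / (sd[a] * sd[b]) for (a, b), c in cov.items()}

for v, m in std_mean.items():
    print(f"mean({v}) = {m:.6f}")
for (a, b), r in corr.items():
    print(f"cov({a},{b}) = {r:.7f}")
```

The results should match the standardized svy: sem output and the estat framework table to the displayed precision. The sketch also shows why estat stdize: matters: the raw covariance of bmi with tcresult (46.72) is larger than its covariance with bpsystol (37.69), while the correlations rank the two pairs the other way.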
