
How does dtable handle survey data?

How do I generate a table of descriptive statistics for survey data?

Title   How does dtable handle survey data?
Author Mia Lv, StataCorp

If your survey data have already been svyset, generating a table of descriptive statistics is straightforward: simply specify the svy option with dtable. There is no need to respecify the survey weights. All statistics are then calculated using the survey weights where applicable, and all tests are calculated using the full survey settings, including clustering and stratification. This FAQ discusses statistics and tests separately.

Statistics

When you specify svy with dtable, the default sample frequency statistic is the sum of the weights (sumw). To report the unweighted frequency instead, specify the option sample( , statistics(frequency)). For example:

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. svyset psu [pweight=finalwgt], strata(strata)
(output omitted)

. dtable, svy continuous(age , statistics(mean sd)) continuous(weight , statistics(p50)) 
     factor(sex,statistics(fvfrequency fvproportion))

               Summary
N              117,157,513
Age (years)    42.253 (15.502)
Weight (kg)    70.420
Sex
  Male         56,159,480  0.479
  Female       60,998,033  0.521

. dtable, svy continuous(age , statistics(mean sd)) continuous(weight , statistics(p50))
     factor(sex,statistics(fvfrequency fvproportion)) sample(Frequency,statistics(frequency))

               Summary
Frequency      10,351
Age (years)    42.253 (15.502)
Weight (kg)    70.420
Sex
  Male         56,159,480  0.479
  Female       60,998,033  0.521

We see that the first table reports the sum of the weights and the second one reports the sample size (frequency).
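As a rough cross-check of the two sample-size statistics, the unweighted count and the sum of the sampling weights can be computed by hand. This is only a sketch: it assumes that no observations are dropped from the dtable sample, so that both quantities are taken over the full dataset.

```stata
* Sketch: compute both sample-size statistics directly
* (assumes every observation in the dataset is in the dtable sample)
webuse nhanes2l, clear

* Unweighted number of observations, comparable to statistics(frequency)
count

* Sum of the sampling weights, comparable to the default sumw statistic
quietly summarize finalwgt
display %15.0fc r(sum)
```

If that assumption holds, the two displayed values should correspond to the Frequency and N rows of the tables above.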

Statistics for continuous and factor variables are computed using the weights previously specified with svyset. This means that we can reproduce these statistics by specifying the weights with dtable and dropping the svy option.

. dtable [pweight=finalwgt], continuous(age , statistics(mean sd)) 
     continuous(weight , statistics(p50)) factor(sex,statistics(fvfrequency fvproportion))

               Summary
N              117,157,513
Age (years)    42.253 (15.502)
Weight (kg)    70.420
Sex
  Male         56,159,480  0.479
  Female       60,998,033  0.521

To see the detailed formulas used to calculate statistics when weights are applied, see Methods and formulas in [R] table.

To report descriptive statistics for a subpopulation instead, specify both the svy and subpop() options with dtable. You can reproduce all the reported statistics by specifying the weights and an if qualifier with dtable; the only exceptions are the variance and sd statistics, which use different formulas for subpopulation estimation.

For example, the following two commands will report identical results for all the statistics except variance and sd.

. dtable, svy subpop(if region==1) continuous(age, statistics(mean variance sd semean)) 
     continuous(weight , statistics(p50)) factor(sex,statistics(fvfrequency fvproportion))

               Summary
N              24,237,893
Age (years)    43.185  239.608  (15.479)  0.355
Weight (kg)    70.420
Sex
  Male         11,880,038  0.490
  Female       12,357,855  0.510

. dtable if region==1 [pweight=finalwgt], continuous(age, statistics(mean variance sd semean)) 
     continuous(weight , statistics(p50)) factor(sex,statistics(fvfrequency fvproportion))

               Summary
N              24,237,893
Age (years)    43.185  244.896  (15.649)  0.355
Weight (kg)    70.420
Sex
  Male         11,880,038  0.490
  Female       12,357,855  0.510

The formula for the subpopulation variance is documented in Methods and formulas in [R] dtable.
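One way to explore the subpopulation sd and variance outside of dtable is with svy: mean followed by estat sd. The sketch below is an illustration only. It is an assumption, not something confirmed here, that estat sd uses the same subpopulation formula as dtable, so compare the results against the dtable output rather than expecting an exact match.

```stata
* Sketch: subpopulation mean, sd, and variance of age via svy: mean
* (assumption: estat sd is comparable to dtable's subpopulation sd)
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

* Subpopulation mean with its survey-adjusted standard error
svy, subpop(if region==1): mean age

* Subpopulation standard deviation
estat sd

* Subpopulation variance
estat sd, variance
```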

Tests

Note that the svy option changes the list of tests supported by dtable. For continuous variables, the Kruskal–Wallis rank test (kwallis) is not allowed with svy. For factor variables, the following tests are not allowed with svy: Fisher's exact test (fisher), likelihood-ratio \(\chi^2\) test (lrchi2), Goodman and Kruskal's gamma (gamma), Kendall's \(\tau\) (kendall), and Cramér's V (cramer). Conversely, the survey-adjusted likelihood-ratio test (svylr), survey-adjusted Wald test (svywald), and survey-adjusted log-linear Wald test (svyllwald) are allowed only with svy.
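As a brief illustration of a survey-only test, the following sketch requests the survey-adjusted likelihood-ratio test for a factor variable. The choice of sex and race as variables is made here purely for the example; the syntax mirrors the test() and by() options used elsewhere in this FAQ.

```stata
* Sketch: request a survey-adjusted likelihood-ratio test (svylr),
* which is available only when the svy option is specified
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

dtable, svy factor(sex, test(svylr)) by(race, tests nototal)
```

Replacing svylr with one of the disallowed tests, such as fisher, should instead produce an error when svy is specified.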

When the svy or subpop() option is specified with dtable, the tests for continuous variables are computed using the prefix svy: or svy, subpop(): with regress, poisson, or gsem. For factor variables, the tests are computed using the prefix svy: or svy, subpop(): with tabulate twoway. Please refer to Methods and formulas in [R] dtable for details. Below, we demonstrate how to reproduce the test results for both continuous and factor variables.

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. svyset psu [pweight=finalwgt], strata(strata)

Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. dtable, svy subpop(if region==1) continuous(age , test(regress)) continuous(weight, 
     test(poisson))  factor(sex, test(svywald)) by(race,tests nototal)
note: using test regress across levels of race for age.
note: using test poisson across levels of race for weight.
note: using test svywald across levels of race for sex.

               Race
               White                Black               Other               Test
N              22,970,498 (94.8%)   1,112,539 (4.6%)    154,856 (0.6%)
Age (years)    43.285 (15.483)      41.626 (15.492)     39.625 (13.223)     0.617
Weight (kg)    71.494 (14.640)      75.437 (16.948)     56.621 (10.332)     0.010
Sex
  Male         11,314,500 (49.3%)   499,951 (44.9%)     65,587 (42.4%)      0.079
  Female       11,655,998 (50.7%)   612,588 (55.1%)     89,269 (57.6%)

. *reproduce the p-value for age
. quietly: svy, subpop(if region==1): regress age i.race

. testparm i.race

Adjusted Wald test

 ( 1)  2.race = 0
 ( 2)  3.race = 0

       F(  2,     6) =    0.52
            Prob > F =    0.6168

. *reproduce the p-value for weight
. quietly: svy, subpop(if region==1): poisson weight i.race

. testparm i.race

Adjusted Wald test

 ( 1)  [weight]2.race = 0
 ( 2)  [weight]3.race = 0

       F(  2,     6) =   11.14
            Prob > F =    0.0096

. *reproduce the p-value for sex
. svy, subpop(if region==1): tabulate sex race, wald
(running tabulate on estimation sample)

Number of strata =  7        Number of obs   =      2,096
Number of PSUs   = 14        Population size = 24,237,893
                             Subpop. no. obs =      2,096
                             Subpop. size    = 24,237,893
                             Design df       =          7

               Race
Sex            White    Black    Other    Total
  Male         .4668    .0206    .0027    .4901
  Female       .4809    .0253    .0037    .5099
  Total        .9477    .0459    .0064    1

Key: Cell proportion

  Wald (Pearson):
    Unadjusted    chi2(2)    =  9.2914
    Adjusted      F(2, 6)    =  3.9820    P = 0.0793

Note: 24 strata omitted because they contain no subpopulation members.