
## How do I generate a table of descriptive statistics for survey data?

Title: How does dtable handle survey data?
Author: Mia Lv, StataCorp

If your survey data have already been svyset, generating a table of descriptive statistics is straightforward: specify the svy option with dtable. There is no need to respecify the survey weights. All statistics are then computed using the survey weights where applicable, and all tests are computed using the full survey design, including clustering and stratification. This FAQ discusses statistics and tests separately.

### Statistics

When you specify svy with dtable, the default sample frequency statistic is the sum of the weights (sumw). To report the unweighted frequency instead, specify the option sample(, statistics(frequency)). For example:

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. svyset psu [pweight=finalwgt], strata(strata)
(output omitted)

. dtable, svy continuous(age, statistics(mean sd)) continuous(weight, statistics(p50)) factor(sex, statistics(fvfrequency fvproportion))

Summary

N                117,157,513
Age (years)  42.253 (15.502)
Weight (kg)           70.420
Sex
Male      56,159,480 0.479
Female    60,998,033 0.521


. dtable, svy continuous(age, statistics(mean sd)) continuous(weight, statistics(p50)) factor(sex, statistics(fvfrequency fvproportion)) sample(Frequency, statistics(frequency))

Summary

Frequency             10,351
Age (years)  42.253 (15.502)
Weight (kg)           70.420
Sex
Male      56,159,480 0.479
Female    60,998,033 0.521



We see that the first table reports the sum of the weights and the second one reports the sample size (frequency).
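As a quick cross-check of these two sample statistics, the sketch below computes both quantities directly from the data. It assumes the nhanes2l dataset used above and that no observations are excluded from the dtable sample. It is only an illustration of what the two numbers mean, not a description of how dtable computes them internally.

```stata
* Sketch: compare the sum of the weights with the raw observation count
webuse nhanes2l, clear

* Sum of the sampling weights (the quantity sumw reports under svy)
quietly summarize finalwgt
display "Sum of weights: " %15.0fc r(sum)

* Unweighted number of observations (the quantity frequency reports)
quietly count
display "Observations:   " %15.0fc r(N)
```

The two displayed values should correspond to the N and Frequency rows of the tables above.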

Statistics for continuous and factor variables are computed using the weights previously specified with svyset. This means that we can reproduce these statistics by specifying the weights with dtable and dropping the svy option.

. dtable [pweight=finalwgt], continuous(age, statistics(mean sd)) continuous(weight, statistics(p50)) factor(sex, statistics(fvfrequency fvproportion))

Summary

N                117,157,513
Age (years)  42.253 (15.502)
Weight (kg)           70.420
Sex
Male      56,159,480 0.479
Female    60,998,033 0.521



To see the detailed formulas used to calculate statistics when weights are applied, see Methods and formulas in [R] table.
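The weighted point estimates can also be cross-checked against Stata's survey estimation commands. The following is a minimal sketch, assuming the data are svyset as above. svy: mean and svy: proportion are standard commands, but using them as a check is an illustration added here, not something the FAQ prescribes. Only the point estimates (the mean and the proportions) are expected to line up with the dtable output. The standard errors these commands report are design-based and are not the same quantity as the sd statistic.

```stata
* Sketch: cross-check dtable point estimates with svy estimation commands
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

* Weighted mean of age; compare with the mean reported by dtable
svy: mean age

* Weighted proportions of sex; compare with fvproportion
svy: proportion sex
```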

If your goal is instead to report descriptive statistics for a subpopulation, you need to specify both the svy and subpop() options with dtable. You can reproduce the reported statistics by specifying the weights and an if qualifier with dtable. The only exceptions are the variance and sd statistics, which use different formulas for subpopulation estimation.

For example, the following two commands will report identical results for all the statistics except variance and sd.

. dtable, svy subpop(if region==1) continuous(age, statistics(mean variance sd semean)) continuous(weight, statistics(p50)) factor(sex, statistics(fvfrequency fvproportion))

Summary

N                              24,237,893
Age (years) 43.185 239.608 (15.479) 0.355
Weight (kg)                        70.420
Sex
Male                   11,880,038 0.490
Female                 12,357,855 0.510

. dtable if region==1 [pweight=finalwgt], continuous(age, statistics(mean variance sd semean)) continuous(weight, statistics(p50)) factor(sex, statistics(fvfrequency fvproportion))

Summary

N                              24,237,893
Age (years) 43.185 244.896 (15.649) 0.355
Weight (kg)                        70.420
Sex
Male                   11,880,038 0.490
Female                 12,357,855 0.510



The formula for the subpopulation variance is documented in Methods and formulas in [R] dtable.

### Tests

Note that the svy option changes the list of tests supported by dtable. For continuous variables, the Kruskal–Wallis rank test (kwallis) is not allowed with svy. For factor variables, the following tests are not allowed with svy: Fisher's exact test (fisher), likelihood-ratio $$\chi^2$$ test (lrchi2), Goodman and Kruskal's gamma (gamma), Kendall's $$\tau$$ (kendall), and Cramér's V (cramer). Conversely, the survey-adjusted likelihood-ratio test (svylr), survey-adjusted Wald test (svywald), and survey-adjusted log-linear Wald test (svyllwald) are allowed only with svy.
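For instance, one of the survey-only tests can be requested with the same syntax used elsewhere in this FAQ. The sketch below assumes the svyset nhanes2l data and simply uses svylr as the test name. The svy: tabulate call with the lr option is shown as the likely counterpart; that pairing is an assumption based on how the Wald test is reproduced later in this FAQ, so no output is shown.

```stata
* Sketch: request a survey-adjusted likelihood-ratio test in dtable
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

* Survey-adjusted likelihood-ratio test of sex across levels of race
dtable, svy factor(sex, test(svylr)) by(race, tests nototal)

* Assumed counterpart: the lr option of svy: tabulate
svy: tabulate sex race, lr
```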

When the svy or subpop() option is specified with dtable, the tests for continuous variables are computed using the prefix svy: or svy, subpop(): with regress, poisson, or gsem. For factor variables, the tests are computed using the prefix svy: or svy, subpop(): with tabulate twoway. See Methods and formulas in [R] dtable for details. Below, we demonstrate how to reproduce the test results for both continuous and factor variables.

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. svyset psu [pweight=finalwgt], strata(strata)

Sampling weights: finalwgt
VCE: linearized
Single unit: missing
Strata 1: strata
Sampling unit 1: psu
FPC 1: <zero>

. dtable, svy subpop(if region==1) continuous(age, test(regress)) continuous(weight, test(poisson)) factor(sex, test(svywald)) by(race, tests nototal)
note: using test regress across levels of race for age.
note: using test poisson across levels of race for weight.
note: using test svywald across levels of race for sex.

Race
White             Black           Other       Test

N           22,970,498 (94.8%) 1,112,539 (4.6%)  154,856 (0.6%)
Age (years)    43.285 (15.483)  41.626 (15.492) 39.625 (13.223) 0.617
Weight (kg)    71.494 (14.640)  75.437 (16.948) 56.621 (10.332) 0.010
Sex
Male      11,314,500 (49.3%)  499,951 (44.9%)  65,587 (42.4%) 0.079
Female    11,655,998 (50.7%)  612,588 (55.1%)  89,269 (57.6%)

. *reproduce the p-value for age

. quietly: svy, subpop(if region==1): regress age i.race

. testparm i.race

( 1)  2.race = 0
( 2)  3.race = 0

F(  2,     6) =    0.52
Prob > F =    0.6168

. *reproduce the p-value for weight

. quietly: svy, subpop(if region==1): poisson weight i.race

. testparm i.race

( 1)  [weight]2.race = 0
( 2)  [weight]3.race = 0

F(  2,     6) =   11.14
Prob > F =    0.0096

. *reproduce the p-value for sex

. svy, subpop(if region==1): tabulate sex race, wald
(running tabulate on estimation sample)

Number of strata =  7                             Number of obs   =      2,096
Number of PSUs   = 14                             Population size = 24,237,893
Subpop. no. obs =      2,096
Subpop. size    = 24,237,893
Design df       =          7

White  Black  Other  Total

Race
Sex

Male   .4668  .0206  .0027  .4901
Female   .4809  .0253  .0037  .5099

Total   .9477  .0459  .0064      1

Key: Cell proportion

Wald (Pearson):
Note: 24 strata omitted because they contain no subpopulation members.
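When many such checks are needed, the p-values can be pulled from stored results instead of being read off the log. The following is a sketch under the same setup as above. testparm leaves its p-value in r(p). For svy: tabulate, the example lists the e() results rather than naming a specific one, because this FAQ does not say which stored result holds the Wald p-value.

```stata
* Sketch: collect the test p-values from stored results
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

* p-value for age across levels of race
quietly svy, subpop(if region==1): regress age i.race
quietly testparm i.race
display "age:    p = " %6.4f r(p)

* p-value for weight across levels of race
quietly svy, subpop(if region==1): poisson weight i.race
quietly testparm i.race
display "weight: p = " %6.4f r(p)

* For sex, run the tabulation and inspect what it stores in e()
quietly svy, subpop(if region==1): tabulate sex race, wald
ereturn list
```

The two displayed p-values should agree with the Prob > F lines in the log above.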