Stata: Data Analysis and Statistical Software

Survey commands

Stata has a number of commands designed to handle the special requirements of complex survey data. The commands will handle any or all of the following survey-design features: probability sampling weights, stratification, multiple stages of cluster sampling, and poststratification. There are commands for estimating means, totals, ratios, and proportions, and commands for fitting linear regression, logistic regression, probit, and many other models; see the table below for a complete listing of svy commands.

Variance estimates are produced using one of five variance-estimation techniques: balanced repeated replication, the bootstrap, the jackknife, successive difference replication, or Taylor linearization.
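
As a hedged illustration of choosing among these techniques, the sketch below requests jackknife variance estimates instead of Taylor linearization. It assumes the nhanes2 example dataset used later on this page. The bootstrap and successive difference replication are left out because they require replicate-weight variables, which this dataset is not assumed to contain.

```stata
* Load the example data (assumed to be available through webuse)
webuse nhanes2, clear

* Route 1: make the jackknife the default variance estimator for the design
svyset psu [pweight=finalwgt], strata(strata) vce(jackknife)
svy: mean weight

* Route 2: keep the linearized default and override it for a single command
svyset psu [pweight=finalwgt], strata(strata) vce(linearized)
svy jackknife: mean weight
```

Both routes should give the same point estimate as the linearized analysis; only the standard errors are computed differently.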

The Stata estimation commands designed to handle the special requirements of complex survey data work with the svy prefix:

svy: biprobit     Bivariate probit regression for survey data
svy: clogit       Conditional (fixed-effects) logistic regression for survey data
svy: cloglog      Complementary log-log regression for survey data
svy: cnsreg       Constrained linear regression for survey data
svy: glm          Generalized linear models for survey data
svy: gnbreg       Generalized negative binomial regression for survey data
svy: heckman      Heckman selection model for survey data
svy: heckprob     Probit model with sample selection for survey data
svy: hetprob      Heteroskedastic probit regression for survey data
svy: intreg       Interval regression for survey data
svy: ivprobit     Probit model with endogenous regressors for survey data
svy: ivregress    Single-equation instrumental-variables regression for survey data
svy: ivtobit      Tobit model with endogenous regressors for survey data
svy: logistic     Logistic regression for survey data, reporting odds ratios
svy: logit        Logistic regression for survey data, reporting coefficients
svy: mean         Estimate means for survey data
svy: mlogit       Multinomial (polytomous) logistic regression for survey data
svy: mprobit      Multinomial probit regression for survey data
svy: nbreg        Negative binomial regression for survey data
svy: nl           Nonlinear least-squares estimation for survey data
svy: ologit       Ordered logistic regression for survey data
svy: oprobit      Ordered probit regression for survey data
svy: poisson      Poisson regression for survey data
svy: probit       Probit regression for survey data
svy: proportion   Estimate proportions for survey data
svy: ratio        Estimate ratios for survey data
svy: regress      Linear regression for survey data
svy: scobit       Skewed logistic regression for survey data
svy: sem          Structural equation modeling for survey data
svy: slogit       Stereotype logistic regression for survey data
svy: stcox        Cox proportional hazards model for survey data
svy: streg        Parametric survival models for survey data
svy: tnbreg       Truncated negative binomial regression for survey data
svy: tobit        Tobit regression for survey data
svy: total        Estimate totals for survey data
svy: tpoisson     Truncated Poisson regression for survey data
svy: treatreg     Treatment-effects regression for survey data
svy: truncreg     Truncated regression for survey data
svy: zinb         Zero-inflated negative binomial regression for survey data
svy: zip          Zero-inflated Poisson regression for survey data

Many other estimation commands in Stata also have features that make them suitable for certain limited survey designs. For example, Stata's competing-risks regression routine (stcrreg) handles sampling weights properly and also handles clustering.

Stata's xtmixed command for fitting multilevel linear models allows for both sampling weights and clustering. Sampling weights may be specified at all levels of a multilevel model, so xtmixed must treat weights differently from other estimation commands. Some caution on the part of the user is required; see the section "Survey data" in [XT] xtmixed for details, and see the example of using xtmixed with survey data.

estat effects computes the design effects DEFF and DEFT, as well as misspecification effects MEFF and MEFT. The test command, used after a svy estimation command, computes adjusted Wald tests and Bonferroni tests for linear hypotheses (single or joint).

Here is an example of the use of the svy: mean command:

. webuse nhanes2

. svyset psu [pw=finalwgt], strata(strata)

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. svy: mean weight
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      31       Number of obs    =      10351
Number of PSUs   =      62       Population size  =  117157513
                                 Design df        =         31

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      weight |   71.90064   .1654434      71.56321    72.23806
--------------------------------------------------------------

The svyset command, illustrated above, allows you to set the variables that contain the sampling weights, strata, and any PSU identifiers at the outset. These variables are remembered for subsequent commands and do not have to be reentered.
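
A minimal sketch of what "remembered" means in practice, assuming the same nhanes2 dataset and that height and diabetes are variables in it: once the design is declared, later svy commands need no weight, strata, or PSU arguments.

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Typing svyset with no arguments redisplays the stored design settings
svyset

* Later svy commands pick up the weights, strata, and PSUs automatically
svy: mean height
svy: total diabetes
```

The settings are stored with the dataset, so saving the data after svyset should preserve them for later sessions.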

Estimating the difference between two subpopulation means can be done by running svy: mean with an over() option to produce subpopulation estimates and then running lincom. The subpopulation estimates look like this:

. svy: mean weight, over(sex)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      31       Number of obs    =      10351
Number of PSUs   =      62       Population size  =  117157513
                                 Design df        =         31

         Male: sex = Male
       Female: sex = Female

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
weight       |
        Male |   78.62789   .2097761      78.20004    79.05573
      Female |   65.70701    .266384      65.16372    66.25031
--------------------------------------------------------------
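
The lincom step described before this output is not shown on this page. A hedged sketch of it follows; the [weight]Male and [weight]Female names are taken from the labels in the output above, and other Stata releases may name these coefficients differently.

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)
svy: mean weight, over(sex)

* Difference between the male and female mean weights, with a
* design-based standard error and confidence interval
lincom [weight]Male - [weight]Female
```

From the two means reported above, the point estimate should be about 78.63 - 65.71 = 12.92.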

The svy: mean, svy: proportion, svy: ratio, and svy: total commands can also produce estimates for subpopulations defined by more than one variable:

. svy: mean weight, over(sex race)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      31       Number of obs    =      10351
Number of PSUs   =      62       Population size  =  117157513
                                 Design df        =         31

         Over: sex race
    _subpop_1: Male White
    _subpop_2: Male Black
    _subpop_3: Male Other
    _subpop_4: Female White
    _subpop_5: Female Black
    _subpop_6: Female Other

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
weight       |
   _subpop_1 |   78.98862   .2125203      78.55518    79.42206
   _subpop_2 |     78.324   .8476215      76.59526    80.05273
   _subpop_3 |   68.16404   1.811668      64.46912    71.85896
   _subpop_4 |   65.10844   .2926873       64.5115    65.70538
   _subpop_5 |   72.38252   1.059851      70.22094     74.5441
   _subpop_6 |   59.56941   1.325068      56.86692    62.27191
--------------------------------------------------------------
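
The over() option works the same way with the other three commands. The sketch below is illustrative only: it assumes that race, diabetes (coded 0/1), weight, and height are in nhanes2, and wt_per_cm is a hypothetical label chosen for this example.

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Proportion in each race category, separately by sex
svy: proportion race, over(sex)

* Estimated population total of people with diabetes, by sex
svy: total diabetes, over(sex)

* Ratio of weight to height, by sex; wt_per_cm is only a label
svy: ratio (wt_per_cm: weight/height), over(sex)
```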

Use estat effects to report DEFF and DEFT.

. estat effects

         Over: sex race
    _subpop_1: Male White
    _subpop_2: Male Black
    _subpop_3: Male Other
    _subpop_4: Female White
    _subpop_5: Female Black
    _subpop_6: Female Other

----------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.       DEFF      DEFT
-------------+--------------------------------------------
weight       |
   _subpop_1 |   78.98862   .2125203     1.15287   1.07372
   _subpop_2 |     78.324   .8476215     1.34608   1.16021
   _subpop_3 |   68.16404   1.811668     2.08964   1.44556
   _subpop_4 |   65.10844   .2926873     2.09219   1.44644
   _subpop_5 |   72.38252   1.059851     1.93387   1.39064
   _subpop_6 |   59.56941   1.325068     1.55682   1.24772
----------------------------------------------------------
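
estat effects can also report the misspecification effects MEFF and MEFT mentioned earlier. A minimal sketch, assuming the deff, deft, meff, and meft options are accepted after svy: mean in your release:

```stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)
svy: mean weight, over(sex race)

* Default report: DEFF and DEFT only
estat effects

* Request the misspecification effects alongside the design effects
estat effects, deff deft meff meft
```

In the table above, each DEFT is the square root of the corresponding DEFF (for example, sqrt(1.15287) is about 1.0737).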

Use estat size to report the number of observations belonging to each subpopulation and estimates of the subpopulation size.

. estat size

         Over: sex race
    _subpop_1: Male White
    _subpop_2: Male Black
    _subpop_3: Male Other
    _subpop_4: Female White
    _subpop_5: Female Black
    _subpop_6: Female Other

----------------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.              Obs            Size
-------------+--------------------------------------------------------
weight       |
   _subpop_1 |   78.98862   .2125203              4312        49504800
   _subpop_2 |     78.324   .8476215               500         5096044
   _subpop_3 |   68.16404   1.811668               103         1558636
   _subpop_4 |   65.10844   .2926873              4753        53494749
   _subpop_5 |   72.38252   1.059851               586         6093192
   _subpop_6 |   59.56941   1.325068                97         1410092
----------------------------------------------------------------------

You can fit linear regressions, logistic regressions, and probit models using svy estimators. Shown below is an example of svy: logit, which fits logistic regressions for survey data.

. webuse nhanes2d

. svy: logit highbp height weight age c.age#c.age female black
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        31                 Number of obs      =      10351
Number of PSUs     =        62                 Population size    =  117157513
                                               Design df          =         31
                                               F(   6,     26)    =      87.70
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      highbp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |  -.0325996   .0058727    -5.55   0.000    -.0445771   -.0206222
      weight |    .049074   .0031966    15.35   0.000     .0425545    .0555936
         age |   .1541151   .0208709     7.38   0.000     .1115486    .1966815
             |
 c.age#c.age |  -.0010746   .0002025    -5.31   0.000    -.0014877   -.0006616
             |
      female |   -.356497   .0885354    -4.03   0.000     -.537066   -.1759279
       black |   .3429301   .1409005     2.43   0.021     .0555615    .6302986
       _cons |   -4.89574   1.159135    -4.22   0.000    -7.259813   -2.531668
------------------------------------------------------------------------------

svy: logit can display estimates as coefficients or as odds ratios. Below we redisplay the previous model, requesting that the estimates be expressed as odds ratios.

. svy: logit, or

Survey: Logistic regression

Number of strata   =        31                 Number of obs      =      10351
Number of PSUs     =        62                 Population size    =  117157513
                                               Design df          =         31
                                               F(   6,     26)    =      87.70
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      highbp | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |    .967926   .0056843    -5.55   0.000     .9564019     .979589
      weight |   1.050298   .0033574    15.35   0.000     1.043473    1.057168
         age |   1.166625   .0243485     7.38   0.000     1.118008    1.217356
             |
 c.age#c.age |    .998926   .0002023    -5.31   0.000     .9985135    .9993386
             |
      female |   .7001246   .0619858    -4.03   0.000     .5844605    .8386784
       black |    1.40907   .1985388     2.43   0.021     1.057134    1.878171
------------------------------------------------------------------------------

After running a logistic regression, you can use lincom to compute odds ratios for any covariate group relative to another.

. lincom female + black, or

 ( 1)  [highbp]female + [highbp]black = 0

------------------------------------------------------------------------------
      highbp | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .9865247   .1631648    -0.08   0.935     .7040616    1.382309
------------------------------------------------------------------------------
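
Joint hypotheses can be checked with test, which computes the adjusted Wald and Bonferroni tests mentioned earlier. The sketch below refits the model so that it runs on its own; it assumes nhanes2d is already svyset, as the example above implies, and that mtest(bonferroni) is accepted after svy estimation in your release.

```stata
webuse nhanes2d, clear
svy: logit highbp height weight age c.age#c.age female black

* Adjusted Wald test that the female and black coefficients are jointly zero
test female black

* The same two hypotheses tested one at a time, with Bonferroni-adjusted p-values
test female black, mtest(bonferroni)
```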

You can also fit linear regressions, logistic regressions, and probit models for a subpopulation:

. svy, subpop(black): logistic highbp age female
(running logistic on estimation sample)

Survey: Logistic regression

Number of strata   =        30                 Number of obs      =      10013
Number of PSUs     =        60                 Population size    =  113415086
                                               Subpop. no. of obs =       1086
                                               Subpop. size       =   11189236
                                               Design df          =         30
                                               F(   2,     29)    =      41.92
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      highbp | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.047957   .0053211     9.23   0.000     1.037146    1.058881
      female |   .9660029   .1419876    -0.24   0.816     .7155019    1.304206
------------------------------------------------------------------------------
Note: 1 stratum omitted because it contains no subpopulation members.
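
The same subpop() syntax applies to the other model types. A hedged sketch using the female indicator from the earlier model as the subpopulation; the choice of covariates is illustrative only.

```stata
webuse nhanes2d, clear

* Linear regression restricted to the subpopulation with female == 1
svy, subpop(female): regress weight height age

* Probit model for the same subpopulation
svy, subpop(female): probit highbp height weight age
```

Unlike restricting the sample with an if qualifier, subpop() keeps the full design in the variance calculation, which is why the header still reports the overall number of observations next to the subpopulation counts.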

Survey data require some special data management. The svydescribe command can be used to examine the design structure of the dataset. It can also be used to see the number of missing and nonmissing observations per stratum (or optionally per stage) for one or more variables.

. svydescribe hdresult

Survey: Describing stage 1 sampling units

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

                             #Obs with  #Obs with     #Obs per included Unit
           #Units    #Units   complete  missing   ----------------------------
Stratum   included  omitted     data      data      min       mean      max   
--------  --------  --------  --------  --------  --------  --------  --------
       1         1*        1       114       266       114     114.0       114
       2         1*        1        98        87        98      98.0        98
       3         2         0       277        71       116     138.5       161
       4         2         0       340       120       160     170.0       180
       5         2         0       173        79        81      86.5        92
       6         2         0       255        43       116     127.5       139
       7         2         0       409        67       191     204.5       218
       8         2         0       299        39       129     149.5       170
       9         2         0       218        26        85     109.0       133
      10         2         0       233        29       103     116.5       130
      11         2         0       238        37        97     119.0       141
      12         2         0       275        39       121     137.5       154
      13         2         0       297        45       123     148.5       174
      14         2         0       355        50       167     177.5       188
      15         2         0       329        51       151     164.5       178
      16         2         0       280        56       134     140.0       146
      17         2         0       352        41       155     176.0       197
      18         2         0       335        24       135     167.5       200
      20         2         0       240        45        95     120.0       145
      21         2         0       198        16        91      99.0       107
      22         2         0       263        38       116     131.5       147
      23         2         0       304        37       143     152.0       161
      24         2         0       388        50       182     194.0       206
      25         2         0       239        17       106     119.5       133
      26         2         0       240        21       119     120.0       121
      27         2         0       259        24       127     129.5       132
      28         2         0       284        15       131     142.0       153
      29         2         0       440        63       193     220.0       247
      30         2         0       326        39       147     163.0       179
      31         2         0       279        29       121     139.5       158
      32         2         0       383        67       180     191.5       203
--------  --------  --------  --------  --------  --------  --------  --------
      31        60         2      8720      1631        81     145.3       247
                              ------------------
                                      10351

See New in Stata 12 for more about what was added in Stata Release 12.

© Copyright 1996–2013 StataCorp LP