Survey features

Stata provides features designed for the special requirements of complex survey data. These features handle probability sampling weights, multiple stages of cluster sampling, stage-level sampling weights, stratification, and poststratification.

Variance estimates are produced using one of five variance estimation techniques: balanced repeated replication, the bootstrap, the jackknife, successive difference replication, and Taylor linearization. See [SVY] variance estimation for an overview of these techniques.
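
As a minimal sketch of switching techniques, the jackknife can be requested with the svy jackknife prefix in place of the default Taylor linearization. The sketch assumes the nhanes2 dataset and the design settings used in the svy: mean example on this page; nothing else is taken from the original text.

```stata
* Sketch only: same design as the svy: mean example on this page
webuse nhanes2, clear
svyset psu [pw=finalwgt], strata(strata)

* Default VCE (Taylor linearization)
svy: mean weight

* Jackknife VCE: the point estimate is the same,
* only the standard error is computed differently
svy jackknife: mean weight
```

Comparing the two tables shows how much the choice of variance estimator matters for a given statistic.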

Many different types of estimation can be performed using Stata's survey facilities:

Descriptive statistics

mean	Estimate means
proportion	Estimate proportions
ratio	Estimate ratios
tabulate (oneway)	One-way tables for survey data
tabulate (twoway)	Two-way tables for survey data
total	Estimate totals

Linear regression models

churdle	Cragg hurdle regression
cnsreg	Constrained linear regression
eintreg	Extended interval regression
eregress	Extended linear regression
etregress	Linear regression with endogenous treatment effects
glm	Generalized linear models
hetregress	Heteroskedastic linear regression
intreg	Interval regression
nl	Nonlinear least-squares estimation
regress	Linear regression
tobit	Tobit regression
truncreg	Truncated regression

Structural equation models

sem	Structural equation model estimation command
gsem	Generalized structural equation model estimation command

Survival-data regression models

stcox	Cox proportional hazards model
stintreg	Parametric models for interval-censored survival-time data
streg	Parametric survival models

Binary-response regression models

biprobit	Bivariate probit regression
cloglog	Complementary log-log regression
eprobit	Extended probit regression
hetprobit	Heteroskedastic probit model
logistic	Logistic regression, reporting odds ratios
logit	Logistic regression, reporting coefficients
probit	Probit regression
scobit	Skewed logistic regression

Discrete-response regression models

clogit	Conditional (fixed-effects) logistic regression
cmmixlogit	Mixed logit choice model
cmxtmixlogit	Panel-data mixed logit choice model
eoprobit	Extended ordered probit regression
hetoprobit	Heteroskedastic ordered probit regression
mlogit	Multinomial (polytomous) logistic regression
mprobit	Multinomial probit regression
ologit	Ordered logistic regression
oprobit	Ordered probit regression
slogit	Stereotype logistic regression
ziologit	Zero-inflated ordered logit regression
zioprobit	Zero-inflated ordered probit regression

Fractional-response regression models

betareg	Beta regression
fracreg	Fractional response regression

Poisson regression models

cpoisson	Censored Poisson regression
etpoisson	Poisson regression with endogenous treatment effects
gnbreg	Generalized negative binomial regression in [R] nbreg
nbreg	Negative binomial regression
poisson	Poisson regression
tnbreg	Truncated negative binomial regression
tpoisson	Truncated Poisson regression
zinb	Zero-inflated negative binomial regression
zip	Zero-inflated Poisson regression

Instrumental-variables regression models

ivprobit	Probit model with continuous endogenous covariates
ivregress	Single-equation instrumental-variables regression
ivtobit	Tobit model with continuous endogenous covariates

Regression models with selection

heckman	Heckman selection model
heckoprobit	Ordered probit model with sample selection
heckpoisson	Poisson regression with sample selection
heckprobit	Probit model with sample selection

Longitudinal/panel-data regression models

xtmlogit	Fixed-effects and random-effects multinomial logit models

Multilevel mixed-effects models

mecloglog	Multilevel mixed-effects complementary log-log regression
meglm	Multilevel mixed-effects generalized linear model
meintreg	Multilevel mixed-effects interval regression
melogit	Multilevel mixed-effects logistic regression
menbreg	Multilevel mixed-effects negative binomial regression
meologit	Multilevel mixed-effects ordered logistic regression
meoprobit	Multilevel mixed-effects ordered probit regression
mepoisson	Multilevel mixed-effects Poisson regression
meprobit	Multilevel mixed-effects probit regression
mestreg	Multilevel mixed-effects parametric survival models
metobit	Multilevel mixed-effects tobit regression

Finite mixture models

fmm: betareg	Finite mixtures of beta regression models
fmm: cloglog	Finite mixtures of complementary log-log regression models
fmm: glm	Finite mixtures of generalized linear regression models
fmm: intreg	Finite mixtures of interval regression models
fmm: ivregress	Finite mixtures of linear regression models with endogenous covariates
fmm: logit	Finite mixtures of logistic regression models
fmm: mlogit	Finite mixtures of multinomial (polytomous) logistic regression models
fmm: nbreg	Finite mixtures of negative binomial regression models
fmm: ologit	Finite mixtures of ordered logistic regression models
fmm: oprobit	Finite mixtures of ordered probit regression models
fmm: pointmass	Finite mixture models with a density mass at a single point
fmm: poisson	Finite mixtures of Poisson regression models
fmm: probit	Finite mixtures of probit regression models
fmm: regress	Finite mixtures of linear regression models
fmm: streg	Finite mixtures of parametric survival models
fmm: tobit	Finite mixtures of tobit regression models
fmm: tpoisson	Finite mixtures of truncated Poisson regression models
fmm: truncreg	Finite mixtures of truncated linear regression models

Item response theory

irt 1pl	One-parameter logistic model
irt 2pl	Two-parameter logistic model
irt 3pl	Three-parameter logistic model
irt grm	Graded response model
irt nrm	Nominal response model
irt pcm	Partial credit model
irt rsm	Rating scale model
irt hybrid	Hybrid IRT models

Many other estimation features in Stata are suitable for certain limited survey designs. For example, Stata's competing-risks regression routine (stcrreg) properly handles sampling weights when they are specified, and it also handles clustering.

Stata's mixed command, which fits multilevel linear models, allows for both sampling weights and clustering. Sampling weights may be specified at all levels in your multilevel model. Some caution on the part of the user is required; see the section Survey data in [ME] mixed for details, including an example of using mixed with survey data.

estat effects computes the design effects DEFF and DEFT, as well as misspecification effects MEFF and MEFT. test, used after svy, computes adjusted Wald tests and Bonferroni tests for linear hypotheses (single or joint).

Here is an example of the use of svy: mean:

. webuse nhanes2

. svyset psu [pw=finalwgt], strata(strata)

Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. svy: mean weight
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 31            Number of obs   =      10,351
Number of PSUs   = 62            Population size = 117,157,513
                                 Design df       =          31





                           Linearized			 
                     Mean   std. err.     [95% conf. interval]
   
      weight     71.90064   .1654434      71.56321    72.23806

svyset, illustrated above, allows you to set the variables that contain the sampling weights, strata, and any PSU identifiers at the outset. These variables are remembered for subsequent commands and do not have to be reentered.
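
A short sketch of what "remembered" means in practice: once the design is declared, typing svyset with no arguments redisplays it, and later svy commands use it without restating weights, strata, or PSUs. The choice of age as a second analysis variable is an assumption made for illustration.

```stata
webuse nhanes2, clear

* Declare the design once
svyset psu [pw=finalwgt], strata(strata)

* Later svy commands pick up the design automatically
svy: mean weight
svy: mean age

* With no arguments, svyset redisplays the current settings
svyset
```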

Estimating the difference between two subpopulation means can be done by running svy: mean with an over() option to produce subpopulation estimates and then running lincom:

. svy: mean weight, over(sex)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 31            Number of obs   =      10,351
Number of PSUs   = 62            Population size = 117,157,513
                                 Design df       =          31




                           Linearized			 
                     Mean   std. err.     [95% conf. interval]
   
c.weight@sex  
       Male      78.62789   .2097761      78.20004    79.05573
     Female      65.70701    .266384      65.16372    66.25031
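
The lincom step for the difference between the two subpopulation means might look like the sketch below. The coefficient names are an assumption: they presume that sex is coded 1 = Male and 2 = Female, so that the two means are stored as c.weight@1.sex and c.weight@2.sex.

```stata
webuse nhanes2, clear
svyset psu [pw=finalwgt], strata(strata)
svy: mean weight, over(sex)

* Assumed coding: 1 = Male, 2 = Female
* Difference in mean weight, male minus female
lincom _b[c.weight@1.sex] - _b[c.weight@2.sex]
```

The point estimate should equal the difference between the two means in the table above; the standard error accounts for the survey design.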

svy: mean, svy: proportion, svy: ratio, and svy: total produce estimates for multiple subpopulations:

. svy: mean weight, over(sex race)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 31                 Number of obs   =      10,351
Number of PSUs   = 62                 Population size = 117,157,513
                                      Design df       =          31




                           Linearized			 
                     Mean   std. err.     [95% conf. interval]
   
c.weight@sex#race  
      Male#White      78.98862   .2125203      78.55518    79.42206
      Male#Black        78.324   .8476215      76.59526    80.05273
      Male#Other      68.16404   1.811668      64.46912    71.85896
    Female#White      65.10844   .2926873       64.5115    65.70538
    Female#Black      72.38252   1.059851      70.22094     74.5441
    Female#Other      59.56941   1.325068      56.86692    62.27191

Use estat effects to report DEFF and DEFT.

. estat effects





                           Linearized		     
        Over         Mean   std. err.       DEFF      DEFT
   
          c.  
      weight@  
    sex#race  
 Male#White      78.98862   .2125203     1.15287   1.07372
 Male#Black        78.324   .8476215     1.34608   1.16021
 Male#Other      68.16404   1.811668     2.08964   1.44556
     Female #  
      White      65.10844   .2926873     2.09219   1.44644
     Female #  
      Black      72.38252   1.059851     1.93387   1.39064
     Female #  
      Other      59.56941   1.325068     1.55682   1.24772
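
The misspecification effects MEFF and MEFT are not shown by default. A minimal sketch of requesting all four effects at once follows; it reuses the model above. The last line only illustrates that, in the table above, each DEFT equals the square root of the corresponding DEFF.

```stata
webuse nhanes2, clear
svyset psu [pw=finalwgt], strata(strata)
svy: mean weight, over(sex race)

* Request design effects and misspecification effects together
estat effects, deff deft meff meft

* In the table above, DEFT for Male#White is the square root of its DEFF
display sqrt(1.15287)
```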

Use estat size to report the number of observations belonging to each subpopulation and estimates of the subpopulation size.

. estat size





                           Linearized				 
        Over         Mean   std. err.              Obs            Size
   
          c.  
      weight@  
    sex#race  
 Male#White      78.98862   .2125203             4,312      49,504,800
 Male#Black        78.324   .8476215               500       5,096,044
 Male#Other      68.16404   1.811668               103       1,558,636
     Female #  
      White      65.10844   .2926873             4,753      53,494,749
     Female #  
      Black      72.38252   1.059851               586       6,093,192
     Female #  
      Other      59.56941   1.325068                97       1,410,092

You can fit a wide variety of models using svy estimators (see the tables above for a list of available commands). Shown below is an example of svy: logit, which fits logistic regressions for survey data.

. webuse nhanes2d

. svy: logit highbp height weight age c.age#c.age female black
(running logit on estimation sample)

Survey: Logistic regression

Number of strata = 31                            Number of obs   =      10,351
Number of PSUs   = 62                            Population size = 117,157,513
                                                 Design df       =          31
                                                 F(6, 26)        =      231.75
                                                 Prob > F        =      0.0000




                           Linearized					 
      highbp   Coefficient  std. err.      t    P>|t|     [95% conf. interval]
   
      height    -.0345643   .0053121    -6.51   0.000    -.0453985   -.0237301
      weight      .051004   .0025292    20.17   0.000     .0458457    .0561622
         age     .0554544   .0127859     4.34   0.000     .0293774    .0815314
              
 c.age#c.age    -.0000676   .0001385    -0.49   0.629    -.0003502    .0002149
              
      female    -.4758698   .0561318    -8.48   0.000    -.5903513   -.3613882
       black      .338201   .1075191     3.15   0.004     .1189143    .5574877
       _cons    -.5140351   .8747001    -0.59   0.561    -2.297998    1.269928

svy: logit can display estimates as coefficients or as odds ratios. Below we redisplay the previous model, requesting that the estimates be expressed as odds ratios.

. svy: logit, or

Survey: Logistic regression

Number of strata = 31                            Number of obs   =      10,351
Number of PSUs   = 62                            Population size = 117,157,513
                                                 Design df       =          31
                                                 F(6, 26)        =      231.75
                                                 Prob > F        =      0.0000




                           Linearized					 
      highbp   Odds ratio   std. err.      t    P>|t|     [95% conf. interval]
   
      height     .9660262   .0051317    -6.51   0.000     .9556166    .9765492
      weight     1.052327   .0026615    20.17   0.000     1.046913    1.057769
         age     1.057021    .013515     4.34   0.000     1.029813    1.084947
              
 c.age#c.age     .9999324   .0001385    -0.49   0.629     .9996499    1.000215
              
      female     .6213444   .0348772    -8.48   0.000     .5541326    .6967085
       black     1.402422   .1507872     3.15   0.004     1.126273     1.74628
       _cons     .5980774   .5231384    -0.59   0.561     .1004598    3.560595
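
The two displays are linked by exponentiation: each odds ratio is the exponential of the corresponding coefficient. The sketch below checks this by hand for height. Multiplying the odds ratio by the coefficient's standard error (a delta-method approximation) should come close to the reported odds-ratio standard error.

```stata
webuse nhanes2d, clear
svy: logit highbp height weight age c.age#c.age female black

* Odds ratio for height: exponentiated coefficient
display exp(_b[height])

* Delta-method standard error of that odds ratio
display exp(_b[height]) * _se[height]
```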

After running a logistic regression, you can use lincom to compute odds ratios for any covariate group relative to another.

. lincom female + black, or

 ( 1)  [highbp]female + [highbp]black = 0




      highbp   Odds ratio   Std. err.      t    P>|t|     [95% conf. interval]
   
         (1)     .8713873   .1233177    -0.97   0.338     .6529215    1.162951
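
The adjusted Wald and Bonferroni tests mentioned earlier are obtained with test after the same model. This is a minimal sketch; the choice of female and black as the tested coefficients is only for illustration.

```stata
webuse nhanes2d, clear
svy: logit highbp height weight age c.age#c.age female black

* Each hypothesis tested separately, with Bonferroni-adjusted p-values
test female black, mtest(bonferroni)

* Joint adjusted Wald test that both coefficients are zero
test female black
```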

You can also fit regression models for a subpopulation:

. svy, subpop(black): logistic highbp age female
(running logistic on estimation sample)

Survey: Logistic regression

Number of strata = 30                            Number of obs   =      10,013
Number of PSUs   = 60                            Population size = 113,415,086
                                                 Subpop. no. obs =       1,086
                                                 Subpop. size    =  11,189,236
                                                 Design df       =          30
                                                 F(2, 29)        =       83.52
                                                 Prob > F        =      0.0000


  

                           Linearized					 
      highbp   Odds ratio   std. err.      t    P>|t|     [95% conf. interval]
   
         age     1.060226   .0047619    13.02   0.000     1.050546    1.069996
      female     .8280475   .1063299    -1.47   0.152     .6370331    1.076338
       _cons     .0791591   .0185411   -10.83   0.000     .0490631    .1277163

Note: 1 stratum omitted because it contains no subpopulation members.
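
subpop() also accepts an if expression in place of an indicator variable. The sketch below assumes black is a 0/1 indicator, in which case the two forms should select the same subpopulation. In both forms the full estimation sample is kept for variance estimation, which is why the header above reports both the overall and the subpopulation counts.

```stata
webuse nhanes2d, clear

* Indicator-variable form, as in the example above
svy, subpop(black): logistic highbp age female

* if-expression form (assumes black is coded 0/1)
svy, subpop(if black == 1): logistic highbp age female
```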

Survey data require some special data management. svydescribe can be used to examine the design structure of the dataset. It can also be used to see the number of missing and nonmissing observations per stratum (or optionally per stage) for one or more variables.

. svydescribe hdresult

Survey: Describing stage 1 sampling units

Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

                              Number of obs with
             Number of units  complete   missing       # obs per included unit
 Stratum  included   omitted      data      data       Min      Mean       Max

                                                                              
       1         1*        1       114       266       114     114.0       114
       2         1*        1        98        87        98      98.0        98
       3         2         0       277        71       116     138.5       161
       4         2         0       340       120       160     170.0       180
       5         2         0       173        79        81      86.5        92
       6         2         0       255        43       116     127.5       139
       7         2         0       409        67       191     204.5       218
       8         2         0       299        39       129     149.5       170
       9         2         0       218        26        85     109.0       133
      10         2         0       233        29       103     116.5       130
      11         2         0       238        37        97     119.0       141
      12         2         0       275        39       121     137.5       154
      13         2         0       297        45       123     148.5       174
      14         2         0       355        50       167     177.5       188
      15         2         0       329        51       151     164.5       178
      16         2         0       280        56       134     140.0       146
      17         2         0       352        41       155     176.0       197
      18         2         0       335        24       135     167.5       200
      20         2         0       240        45        95     120.0       145
      21         2         0       198        16        91      99.0       107
      22         2         0       263        38       116     131.5       147
      23         2         0       304        37       143     152.0       161
      24         2         0       388        50       182     194.0       206
      25         2         0       239        17       106     119.5       133
      26         2         0       240        21       119     120.0       121
      27         2         0       259        24       127     129.5       132
      28         2         0       284        15       131     142.0       153
      29         2         0       440        63       193     220.0       247
      30         2         0       326        39       147     163.0       179
      31         2         0       279        29       121     139.5       158
      32         2         0       383        67       180     191.5       203
                                                                              
      31        60         2     8,720     1,631        81     145.3       247
                                                
                                     10,351
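
In the table above, strata 1 and 2 each have one of their two units omitted. A sketch of listing only such strata with the single option follows. It assumes that nhanes2d carries its survey settings and contains hdresult, as the output above suggests.

```stata
* Assumption: nhanes2d is already svyset and contains hdresult
webuse nhanes2d, clear

* Full stage-1 description for hdresult
svydescribe hdresult

* Only the strata left with a single sampling unit once
* observations with missing hdresult are set aside
svydescribe hdresult, single
```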
