Survey support for multilevel models

Highlights

Continuous, binary, ordinal, count, and survival-time outcomes
Point estimates and standard errors adjusted for survey design
    Sampling weights for each stage of a multiple-stage design
    Primary and secondary sampling units
    Stratification
    Finite population corrections
Fully integrated with Stata's svyset command and svy prefix

Multilevel models are fit to data that can be divided into groups. These may be patients treated at the same hospital, cars manufactured at the same plant, students attending the same school, and so on.

As a more concrete example, suppose an educational researcher has given a test to a sample of students in Texas and wants to analyze the results. The students can be grouped into schools, and the schools can be grouped into school districts. If we believe that unobserved characteristics of the individual schools and of the school districts are likely to affect the test results, we can fit a multilevel model with school-level and district-level random effects.

What if we want to fit a multilevel model to data collected using a complex survey design rather than a simple random sample? We need to account for the features of the survey design (clustering, stratification, sampling weights, and finite-population corrections) to obtain appropriate point estimates and standard errors. Multilevel models add a requirement that single-level models do not have: we need sampling weights for each level of the model, assuming those levels correspond to stages of the sampling design.

Continuing with our testing example, suppose that the researcher first took a sample of school districts. Then, schools were sampled from within each selected school district. Finally, students were sampled from within each selected school. This is a three-stage sampling design, and we have a sampling weight for each stage, based on the probability that a school district, a school, or a student was included in the sample at that stage.

Throughout Stata, analyzing complex survey data is as simple as using svyset to declare aspects of the survey design and then adding the svy: prefix to the estimation command for the model you want to fit. We can now use svyset and the svy: prefix when fitting multilevel models to survey data.

Let's see it work

To demonstrate, we use a dataset arising from a two-stage sampling design. Here, schools are selected at the first stage. Then, students are sampled from within the selected schools. Our data contain sampling weights for both schools and students. We can type

. svyset school_id, weight(wt_school) || _n, weight(wt_student)

to specify that school_id and _n (the observation number) identify schools and students, the first- and second-stage sampling units. The school-stage sampling weight, wt_school, records the inverse of the probability that the school was included in the sample. The wt_student variable records the inverse of the probability that the student was included, conditional on the student's school having already been selected.
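To make the meaning of the two weight variables concrete, here is a minimal Python sketch of the arithmetic behind them. The selection probabilities are hypothetical values chosen for illustration and are not taken from the dataset used in this example; only the names wt_school and wt_student come from the text above.

```python
# Hypothetical selection probabilities (assumed for illustration only).
p_school = 0.05                 # probability that a school is sampled at stage 1
p_student_given_school = 0.25   # probability that a student is sampled,
                                # given that the student's school was selected

# Each stage's sampling weight is the inverse of that stage's probability.
wt_school = 1 / p_school
wt_student = 1 / p_student_given_school

# A student's unconditional inclusion probability is the product of the
# stage probabilities, so the overall weight is the product of the weights.
p_overall = p_school * p_student_given_school
wt_overall = wt_school * wt_student

print(f"wt_school  = {wt_school:.1f}")
print(f"wt_student = {wt_student:.1f}")
print(f"wt_overall = {wt_overall:.1f}")
```

A multilevel model needs the stage-level weights separately rather than only their product, which is why svyset records one weight variable per stage.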

We are interested in the effects of sex, socioeconomic status, and speaking English at home on reading. We fit a two-level logit model for pass_read, which is coded as one if a student passes a reading proficiency threshold and zero otherwise. We allow for school-level random intercepts. To fit this model, we type

. svy: melogit pass_read female sei home_eng || school_id:

Because we specified the svy: prefix, the results from melogit are automatically adjusted for our survey design.

. svy: melogit pass_read female sei home_eng || school_id:
(running melogit on estimation sample)


Survey: Mixed-effects logistic regression

Number of strata =   1                            Number of obs   =      2,069
Number of PSUs   = 148                            Population size = 346,373.74
                                                  Design df       =        147
                                                  F(3, 145)       =      21.03
                                                  Prob > F        =     0.0000




                           Linearized
   pass_read  Coefficient  std. err.      t    P>|t|     [95% conf. interval]

      female    .6008465   .1536047     3.91   0.000     .2972878    .9044052
         sei    .0311463   .0047519     6.55   0.000     .0217554    .0405373
    home_eng    1.005684   .3315877     3.03   0.003     .3503888    1.660978
       _cons   -3.517315   .4169515    -8.44   0.000    -4.341308   -2.693321

school_id    
   var(_cons)    .5348872   .2409983                      .2195645    1.303054

We find that being female, having higher socioeconomic status, and speaking English at home are all associated with a higher probability of passing the reading proficiency threshold. We also find a moderate amount of variation across schools: the estimated variance of the school-level random intercepts is .535.
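The coefficients above are on the log-odds scale, so a short calculation can help with interpretation. The Python sketch below is not Stata output; it takes the point estimates from the table, exponentiates them to obtain odds ratios, and computes a latent-response intraclass correlation using the standard assumption that the level-one residual in a logit model has variance pi^2/3. The variable names are chosen here for illustration.

```python
import math

# Point estimates copied from the svy: melogit output above.
coefs = {"female": 0.6008465, "sei": 0.0311463, "home_eng": 1.005684}
var_school = 0.5348872  # estimated variance of the school random intercepts

# Logit coefficients are log odds ratios; exponentiate to get odds ratios.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# Latent-response intraclass correlation for a random-intercept logit model:
# the standard logistic residual has variance pi^2 / 3.
icc = var_school / (var_school + math.pi ** 2 / 3)

for name, value in odds_ratios.items():
    print(f"{name}: odds ratio = {value:.3f}")
print(f"school-level ICC = {icc:.3f}")
```

Under these assumptions, the odds of passing are roughly 1.8 times as high for female students as for male students with the same covariates and school, and about 14% of the latent variation in passing is attributable to differences between schools.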

We demonstrated how to analyze survey data with a multilevel logit model. Stata's commands for fitting multilevel probit, complementary log-log, ordered logit, ordered probit, Poisson, negative binomial, parametric survival, and generalized linear models also support complex survey data.

gsem can also fit multilevel models, and it extends the types of models that can be fit in many ways. For instance, gsem can fit multilevel multinomial logit models, multivariate multilevel models, and multilevel structural equation models. gsem also supports estimation with complex survey data.

Tell me more

You can read another worked example of multilevel analysis of survey data in the Stata manual entry for the multilevel mixed-effects generalized linear model; see [ME] meglm.
