
Stata’s mixed command, which fits linear multilevel models, supports survey data: sampling weights can be specified, and robust and cluster-robust standard errors are available.

mixed handles sampling weights differently from a standard single-level analysis:

  1. Weights can (and should) be specified at every model level unless you are willing to assume equal-probability sampling at that level.

  2. Weights at lower model levels must be based on the probability of selection conditional on the higher-level cluster having been selected, not merely on the overall (unconditional) probability of selection.

  3. The scaling of lower-level weights matters. In a standard single-level analysis the scale of the sampling weights is irrelevant (only their relative sizes matter), but in a multilevel model the lower-level weights generally need to be rescaled so that their scale is "consistent" across lower-level clusters.
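Items 2 and 3 can be written in symbols as follows (notation ours, not Stata's). Let w_j be the school-level weight (the inverse of school j's selection probability), w_{i|j} the conditional student-level weight, and n_j the number of sampled students in school j. The overall student weight is the product of the two, so a conditional weight can be recovered by dividing by the school weight. The two rescalings shown are, as we understand [ME] mixed, what pwscale(size) and pwscale(effective) do: make the weights in each school sum to the school's sample size, or to its effective sample size.

```latex
\begin{aligned}
w_{ij} &= w_j\, w_{i\mid j}
  \quad\Longrightarrow\quad
  w_{i\mid j} = \frac{w_{ij}}{w_j} \\[4pt]
\text{size:}\quad
  w^{*}_{i\mid j} &= w_{i\mid j}\,\frac{n_j}{\sum_{k=1}^{n_j} w_{k\mid j}}
  \qquad\Bigl(\textstyle\sum_{i} w^{*}_{i\mid j} = n_j\Bigr) \\[4pt]
\text{effective:}\quad
  w^{*}_{i\mid j} &= w_{i\mid j}\,\frac{\sum_{k=1}^{n_j} w_{k\mid j}}{\sum_{k=1}^{n_j} w_{k\mid j}^{2}}
  \qquad\Bigl(\textstyle\sum_{i} w^{*}_{i\mid j} = \frac{(\sum_{k} w_{k\mid j})^{2}}{\sum_{k} w_{k\mid j}^{2}}\Bigr)
\end{aligned}
```

Both rescalings multiply every weight in a school by the same constant, so relative sizes within a school are preserved; only the scale, which matters for multilevel estimation, changes.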

See [ME] mixed and, in particular, the Survey data section in that entry for all the technical details.

We demonstrate using mixed to fit a two-level model for data from a two-stage sampling design with sampling weights at both stages. Schools were sampled at the first stage, students at the second.
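For reference, the two-level random-intercept model fit below can be written as follows (notation ours), for student i in school j:

```latex
\begin{aligned}
\text{isei}_{ij} &= \beta_0 + \beta_1\,\text{female}_{ij} + \beta_2\,\text{high\_school}_{ij} + \beta_3\,\text{college}_{ij} \\
&\quad + \beta_4\,\text{one\_for}_{ij} + \beta_5\,\text{both\_for}_{ij} + \beta_6\,\text{test\_lang}_{ij} + u_j + \epsilon_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2), \qquad u_j \perp \epsilon_{ij}
\end{aligned}
```

In the output, _cons estimates \beta_0, var(_cons) estimates the between-school variance \sigma_u^2, and var(Residual) estimates the within-school variance \sigma_\epsilon^2.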

. webuse pisa2000
(Programme for International Student Assessment (PISA) 2000 data)

. mixed isei female high_school college one_for both_for test_lang [pw=w_fstuwt] || id_school:, pweight(wnrschbw) pwscale(size) nolog

Mixed-effects regression                        Number of obs     =      2,069
Group variable: id_school                       Number of groups  =        148
                                                Obs per group:
                                                              min =          1
                                                              avg =       14.0
                                                              max =         28
                                                Wald chi2(6)      =     187.23
Log pseudolikelihood = -1443093.9               Prob > chi2       =     0.0000

                            (Std. err. adjusted for 148 clusters in id_school)

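------------------------------------------------------------------------------
             |               Robust
        isei | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |     .59379   .8732886     0.68   0.497    -1.117824    2.305404
 high_school |   6.410618   1.500337     4.27   0.000     3.470011    9.351224
     college |   19.39494   2.121145     9.14   0.000     15.23757    23.55231
     one_for |  -.9584613   1.789947    -0.54   0.592    -4.466692     2.54977
    both_for |  -.2021101    2.32633    -0.09   0.931    -4.761633    4.357413
   test_lang |   2.519539   2.393165     1.05   0.292    -2.170978    7.210056
       _cons |   28.10788   2.435712    11.54   0.000     23.33397    32.88179
------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust
  Random-effects parameters  |   Estimate   std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
id_school: Identity          |
                  var(_cons) |   34.69374   8.574865      21.37318    56.31617
-----------------------------+------------------------------------------------
               var(Residual) |   218.7382   11.22111      197.8147    241.8748
------------------------------------------------------------------------------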

In the command above, we specified the student-level weights using standard Stata weight syntax, [pw=w_fstuwt], and the school-level weights with the pweight(wnrschbw) option in the school random-effects equation. We also specified pwscale(size), which rescales the student-level weights so that within each school they sum to the number of sampled students in that school. size is one of three available rescaling methods; the others are effective and gk.
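To make the size method concrete, here is a minimal sketch that rescales w_fstuwt by hand and refits without pwscale(). The helper variables touse, nj, sumw, and w_size are hypothetical names of ours. The claim that the refit should match the pwscale(size) results rests on two assumptions we have not confirmed against the manual: that mixed applies no rescaling when pwscale() is omitted, and that it scales over the same estimation sample we mark here.

```stata
* Sketch: reproduce pwscale(size) by hand (helper variable names are hypothetical)
webuse pisa2000, clear

* Flag observations usable in the model: no missing outcome, covariates, or weights
mark touse
markout touse isei female high_school college one_for both_for test_lang w_fstuwt wnrschbw

* Within each school: number of usable students and the sum of their weights
bysort id_school: egen nj = total(touse)
bysort id_school: egen double sumw = total(cond(touse, w_fstuwt, .))

* "size" scaling: the student weights in each school now sum to nj
generate double w_size = w_fstuwt * nj / sumw if touse

* Refit with pre-scaled weights and no pwscale(); assuming mixed does not
* rescale when pwscale() is omitted, estimates should match the fit above
mixed isei female high_school college one_for both_for test_lang if touse [pw=w_size] || id_school:, pweight(wnrschbw) nolog

* The other two built-in methods are requested through the same option
mixed isei female high_school college one_for both_for test_lang [pw=w_fstuwt] || id_school:, pweight(wnrschbw) pwscale(effective) nolog
mixed isei female high_school college one_for both_for test_lang [pw=w_fstuwt] || id_school:, pweight(wnrschbw) pwscale(gk) nolog
```

Doing the rescaling by hand is mainly useful for checking your understanding or for applying a scaling that mixed does not offer; for routine work, the pwscale() option is simpler and less error-prone.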

Specifying sampling weights implies robust standard errors. With mixed, these are clustered at the highest model level (schools in this example) unless you request a different clustering with the vce(cluster clustvar) option.
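As a minimal sketch of that default, the command below requests vce(cluster id_school) explicitly. Because id_school is already the highest level, it should reproduce the standard errors reported above (for example, 2.121145 for college). To cluster on a different variable, replace id_school; see [ME] mixed for which clustering variables are allowed.

```stata
* Sketch: make the default clustering explicit; results should match the fit above
webuse pisa2000, clear
mixed isei female high_school college one_for both_for test_lang [pw=w_fstuwt] || id_school:, pweight(wnrschbw) pwscale(size) vce(cluster id_school) nolog
```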

Tell me more

See [ME] mixed for more details.

Watch Multilevel models for survey data in Stata.