What should I do when one of the survey estimators returns an error message, "Missing standard error because of stratum with single sampling unit"?

Title   Missing standard error because of stratum with single sampling unit
Author Mia Lv, StataCorp

The meaning of this error message

By default, Stata's survey estimation commands report missing standard errors when they encounter a stratum with a singleton PSU. Here is an example:

. use http://www.stata-press.com/data/r15/nhanes2b, clear

. svyset psuid [pweight=finalwgt], strata(stratid)
(output omitted)

. svy: mean hdresult
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 31             Number of obs   =      8,720
Number of PSUs   = 60             Population size = 98,725,345
                                  Design df       =         29

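--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
    hdresult |   49.67141           .             .           .
--------------------------------------------------------------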
Note: Missing standard error because of stratum with single sampling unit.

When a stratum contains only one PSU, there is insufficient information to estimate that stratum's variance, because the variability between PSUs within the stratum cannot be measured from a single unit. The variance of an estimated parameter in a stratified clustered design is built from the stratum-level variances, so it cannot be computed either. There are two solutions. The first is to reassign each stratum with a singleton PSU to another appropriately chosen stratum. To use this method, we must first identify the strata with singleton PSUs.
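As a sketch of why a single PSU leaves the stratum variance undefined, consider the usual with-replacement linearized variance estimator for an estimated total, written without a finite population correction. The notation is introduced here only for illustration: $L$ is the number of strata, $n_h$ is the number of PSUs in stratum $h$, $y_{hi}$ is the weighted total for PSU $i$ of stratum $h$, and $\bar{y}_h$ is the mean of the $y_{hi}$ within stratum $h$.

```latex
\hat{V}(\hat{Y}) \;=\; \sum_{h=1}^{L} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( y_{hi} - \bar{y}_h \right)^2
```

When $n_h = 1$, the factor $n_h/(n_h - 1)$ has a zero denominator and the inner sum of squares is zero, so that stratum's contribution is undefined and the total variance cannot be evaluated.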

How to identify the strata with singleton PSUs

After setting our survey characteristics with svyset, we can use the svydescribe command to identify the strata with singleton PSUs. Those strata will be marked with an asterisk in the output. Let's look at the following dataset:

clear

input stratid	psuid	 age	hdresult	finalwgt
1	1	68	40	9687
1	1	54	53	36028
2	1	26	35	26896
2	1	24	48	8213
2	2	68	43	3316
2	2	61	65	8475
3	1	25	80	10900
3	1	27	93	7619
3	2	24	38	22584
3	2	64	72	2875
end

svyset psuid [pweight=finalwgt], strata(stratid)

save data1.dta, replace

We run svydescribe and get the following output:

. svydescribe

Survey: Describing stage 1 sampling units

Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: stratid
 Sampling unit 1: psuid
           FPC 1: <zero>

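                                    Number of obs per unit
 Stratum   # units     # obs       Min      Mean       Max
----------------------------------------------------------
       1        1*        2         2       2.0         2
       2        2         4         2       2.0         2
       3        2         4         2       2.0         2
----------------------------------------------------------
       3        5        10         2       2.0         2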

Here we can see that stratum 1 has a singleton PSU.
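When there are many strata, scanning the svydescribe output for asterisks can be tedious. The following is a minimal sketch of one way to flag the same strata with a variable, assuming the data1.dta file saved above. The variable names psutag and npsu are hypothetical and chosen only for this illustration.

```stata
use data1, clear

* tag exactly one observation per (stratum, PSU) pair
egen byte psutag = tag(stratid psuid)

* count the distinct PSUs in each stratum
egen npsu = total(psutag), by(stratid)

* show the strata that contain a single PSU
tabulate stratid if npsu == 1
```

This does not replace svydescribe; it only stores the PSU count so that later commands can refer to it.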

When we perform an estimation with survey data, the problem of a stratum with a singleton PSU can arise even if every stratum in the dataset has multiple PSUs. This happens when some observations are dropped from the estimation sample because of missing values.

Let us look at the following survey data. In this example, when we try to estimate the mean of variable hdresult, the standard errors are missing, and a note on the output tells us that this is caused by a stratum with a single PSU:


clear

input stratid	psuid	age	hdresult	finalwgt
1	1	68	40	9687
1	1	54	53	36028
1	2	28	.	9356
1	2	35	.	10265
2	1	26	35	26896
2	1	24	48	8213
2	2	68	43	3316
2	2	61	65	8475
3	1	25	80	10900
3	1	27	93	7619
3	2	24	38	22584
3	2	64	72	2875
end

svyset psuid [pweight=finalwgt], strata(stratid)

. svy: mean hdresult
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 3                 Number of obs   =      10
Number of PSUs   = 5                 Population size = 136,593
                                     Design df       =       2

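--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
    hdresult |   51.04046           .             .           .
--------------------------------------------------------------
Note: Missing standard error because of stratum with single sampling unit.

. svydescribe

Survey: Describing stage 1 sampling units

Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: stratid
 Sampling unit 1: psuid
           FPC 1: <zero>

                                    Number of obs per unit
 Stratum   # units     # obs       Min      Mean       Max
----------------------------------------------------------
       1        2         4         2       2.0         2
       2        2         4         2       2.0         2
       3        2         4         2       2.0         2
----------------------------------------------------------
       3        6        12         2       2.0         2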

Here svydescribe does not detect any stratum with a singleton PSU because, by default, it checks the entire dataset, including the observations with missing hdresult that svy: mean dropped. The appropriate approach is to add the if e(sample) qualifier so that svydescribe is restricted to the estimation sample used by svy: mean hdresult.

. svydescribe if e(sample)

Survey: Describing stage 1 sampling units

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: stratid
         SU 1: psuid
        FPC 1: <zero>

                                    Number of obs per unit
 Stratum   # units     # obs       Min      Mean       Max
----------------------------------------------------------
       1        1*        2         2       2.0         2
       2        2         4         2       2.0         2
       3        2         4         2       2.0         2
----------------------------------------------------------
       3        5        10         2       2.0         2
                          2 = #Obs with missing values in the
                              survey characteristics
                   --------
                         12

An alternative way to use svydescribe in this scenario is to write:

svydescribe hdresult

This command restricts svydescribe to the observations in which hdresult is not missing.
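Because e(sample) is replaced each time a new estimation command runs, it can be convenient to copy it into a variable right after the estimation of interest. The lines below are a minimal sketch, assuming the 12-observation dataset above is in memory and already svyset. The variable name insample is hypothetical.

```stata
svy: mean hdresult

* keep a permanent copy of the estimation-sample indicator
generate byte insample = e(sample)

* describe the design within the estimation sample only
svydescribe if insample
```

The stored indicator can then be reused after other estimation commands have overwritten e(sample).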

First solution: Reassign each stratum with a singleton PSU

After detecting the strata with singleton PSUs, we reassign each of them to another properly chosen stratum. Let us look at the dataset data1.dta, saved in the previous section. We already know that only stratum 1 has a singleton PSU. Suppose we want to reassign stratum 1 to stratum 2. We first generate a new PSU identifier variable, psu, and a new stratum identifier variable, strata, so that we do not lose any information in the original dataset. Next we assign distinct values of psu to all the sampling units in strata 1 and 2 so that each sampling unit remains distinguishable in the combined stratum. After that, we change the value of strata. Finally, we svyset the data again using the new variables psu and strata.


use data1, clear

egen psu = group(stratid psuid) if inlist(stratid,1,2)

replace psu = psuid if stratid>2

generate strata=stratid

replace strata=2 if strata==1

svyset psu [pweight=finalwgt], strata(strata)

Now, let us check again if there are any strata with singleton PSUs:

. svydescribe

Survey: Describing stage 1 sampling units

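      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

                                    Number of obs per unit
 Stratum   # units     # obs       Min      Mean       Max
----------------------------------------------------------
       2        3         6         2       2.0         2
       3        2         4         2       2.0         2
----------------------------------------------------------
       2        5        10         2       2.0         2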

All the strata have multiple PSUs now. We can go ahead and run our svy estimation commands.
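Before estimating, it may be worth confirming that the renumbering did not merge PSUs from different original strata. The following is a sketch of one possible check, assuming the variables psu and strata created above. The assert line is an illustration, not part of the procedure described in this FAQ.

```stata
* each new (strata, psu) pair should come from exactly one original stratum
bysort strata psu (stratid): assert stratid[1] == stratid[_N]

* rerun the estimation with the revised design
svy: mean hdresult
```

If psu had simply been copied from psuid, PSU 1 of stratum 1 and PSU 1 of stratum 2 would share an identifier in the combined stratum, and the assertion would fail.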

Second solution: specify the singleunit() option with svyset

An alternative way to handle strata with singleton PSUs is to specify the singleunit() option when we svyset the data. The default is singleunit(missing), which produces missing standard errors. Three alternatives are available. The first, singleunit(certainty), treats strata with singleton PSUs as certainty units, so those strata contribute nothing to the standard error. The second, singleunit(scaled), is a scaled version of singleunit(certainty): for each stratum with a singleton PSU, it uses the average of the variances from the strata with multiple sampling units. The third, singleunit(centered), specifies that strata with singleton PSUs be centered at the grand mean instead of the stratum mean.

Here is an example using singleunit(certainty):

. use http://www.stata-press.com/data/r15/nhanes2b, clear

. svyset psuid [pweight=finalwgt], singleunit(certainty) strata(stratid)


Sampling weights: finalwgt
             VCE: linearized
     Single unit: certainty
        Strata 1: stratid
 Sampling unit 1: psuid
           FPC 1: <zero>

. svy: mean hdresult
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 31             Number of obs   =      8,720
Number of PSUs   = 60             Population size = 98,725,345
                                  Design df       =         29

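--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
    hdresult |   49.67141   .3829811      48.88813     50.4547
--------------------------------------------------------------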
Note: Strata with single sampling unit treated as certainty units.
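To see how the choice affects the reported standard error, the three non-default settings can be tried in turn. This is a minimal sketch that reuses the dataset and svyset specification from the example above. The loop and the macro name opt are illustrative only.

```stata
use http://www.stata-press.com/data/r15/nhanes2b, clear

foreach opt in certainty scaled centered {
    * redeclare the design with the current singleunit() setting
    quietly svyset psuid [pweight=finalwgt], strata(stratid) singleunit(`opt')
    display as text _newline "singleunit(`opt'):"
    svy: mean hdresult
}
```

The point estimate should be the same in every run, since singleunit() affects only the variance calculation.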

For more details about the methodology Stata uses to estimate variances for survey data, see [SVY] Variance estimation. How you specify singleunit() should depend on the assumptions of your analysis.