Stata The Stata listserver

st: Fwd: [MEPS-L] Using STATA to adjust for Complex Survey Design - Single PSU issues--Additional Comments

From   Christopher F Baum <>
Subject   st: Fwd: [MEPS-L] Using STATA to adjust for Complex Survey Design - Single PSU issues--Additional Comments
Date   Tue, 11 Oct 2005 14:54:18 -0400

An interchange on the MEPS (Medical Expenditure Panel Survey) list:

Begin forwarded message:

From: "Rhoades, Jeffrey" <JRhoades@AHRQ.GOV>
Date: October 11, 2005 11:15:07 AM EDT
Subject: [MEPS-L] Using STATA to adjust for Complex Survey Design - Single PSU issues--Additional Comments
Reply-To: AHRQ's question & discussion group regarding MEPS <MEPS-L@LIST.AHRQ.GOV>

1. Page 47 in the STATA v9 "Survey Data" manual gives an explanation, including numerical examples from NHANES, of why ignoring strata and/or PSUs is really, really bad. Even though you can get correct point estimates, the tests, confidence intervals, and standard errors will usually be wrong. In the NHANES II numerical example in the documentation, ignoring the strata but using weights and PSUs gives standard errors that are 50% too small.

2. Lonely PSUs are a problem in any software, but STATA has fewer options than SUDAAN or R. Page 241 in the STATA v9 "Survey Data" manual gives an explanation, including a numerical example from NHANES, of how to deal with single PSUs within strata in the STATA survey functions. The recommendation from the STATA documentation is to collapse strata. This is extremely dangerous when you don't know how the strata were formed.

The best advice is to avoid single PSUs if possible. The most common cause of single PSUs in MEPS is subsetting the data instead of using the subpop function (see page 38). Single PSUs do not naturally occur in the (full) Full Year Files but if you subset you can create a single PSU.
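To make the subsetting point concrete, here is a minimal Python sketch of the arithmetic, not STATA syntax. It assumes a simple with-replacement, PSU-level variance estimator for a weighted total; the toy data, the `event` flag, and the function names are all hypothetical. Dropping the records outside the subgroup leaves stratum 2 with one PSU, so the variance is undefined, whereas keeping every record and zeroing out the non-members (the idea behind a subpopulation analysis) keeps both PSUs in every stratum.

```python
from collections import defaultdict

# Hypothetical toy file: 2 strata, 2 PSUs each; nobody in stratum 2, PSU 2 has the event.
rows = [
    {"stratum": 1, "psu": 1, "w": 10.0, "y": 2.0, "event": True},
    {"stratum": 1, "psu": 2, "w": 10.0, "y": 4.0, "event": True},
    {"stratum": 2, "psu": 1, "w": 20.0, "y": 1.0, "event": True},
    {"stratum": 2, "psu": 2, "w": 20.0, "y": 3.0, "event": False},
]

def psu_counts(data):
    """Number of distinct PSUs per stratum among the records supplied."""
    psus = defaultdict(set)
    for r in data:
        psus[r["stratum"]].add(r["psu"])
    return {h: len(s) for h, s in psus.items()}

def total_variance(data, in_domain):
    """Variance of a weighted domain total from PSU totals within strata.

    Records outside the domain contribute zero but still count their PSU,
    so the design structure of the data passed in is preserved.
    """
    z = defaultdict(lambda: defaultdict(float))
    for r in data:
        z[r["stratum"]][r["psu"]] += r["w"] * r["y"] * (1.0 if in_domain(r) else 0.0)
    var = 0.0
    for h, totals in z.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h} has a single PSU; variance undefined")
        mean_h = sum(totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return var

subset = [r for r in rows if r["event"]]           # subsetting drops a whole PSU
print(psu_counts(rows))                            # {1: 2, 2: 2}
print(psu_counts(subset))                          # {1: 2, 2: 1}
print(total_variance(rows, lambda r: r["event"]))  # 800.0
```

Calling `total_variance(subset, lambda r: True)` on the subsetted data raises the single-PSU error, which is the situation the advice above is meant to avoid.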

Note that if you are using a MEPS file, such as an event-level file, that only has records for individuals with the specific event, you should consider linking back to the full-year file to get the correct variance structure. You will need to define individuals not in the specific event file as having zero events, as opposed to dropping them. This will produce correct totals, but means will have a denominator that is the overall population. If you want means conditional on having the event, you will need to use the subpop function; the analysis in the non-event group is then nonsensical and should be ignored.
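The difference between the two denominators can be seen in a small Python sketch (point estimates only, with made-up person IDs, weights, and event counts; none of these are MEPS variable names). Persons missing from the event file are given a zero rather than dropped, which leaves the total unchanged but makes the unconditional mean an average over everyone.

```python
# Hypothetical full-year file: person id -> survey weight
full_year = {1: 100.0, 2: 200.0, 3: 300.0, 4: 400.0}
# Hypothetical event-level file: person id -> number of events (persons 2 and 4 absent)
events = {1: 2, 3: 1}

# Link back to the full-year file, giving non-event persons a zero.
linked = {pid: events.get(pid, 0) for pid in full_year}

# Weighted total of events: unaffected by the zero-filled persons.
total = sum(full_year[pid] * n for pid, n in linked.items())

# Mean over the overall population (denominator = all weights).
overall_mean = total / sum(full_year.values())

# Mean conditional on having the event (denominator = weights of event persons only).
conditional_mean = total / sum(full_year[pid] for pid, n in linked.items() if n > 0)

print(total, overall_mean, conditional_mean)  # 500.0 0.5 1.25
```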

If there are very many single PSUs it is a very serious problem. You might be able to form BRR replicates but you should seek the advice of a statistician in doing so.

If you have a single PSU due to linking to an external file such as NHIS, or for some other reason, you should seriously consider switching to SUDAAN (missunit) or the R survey package ('options("survey.lonely.psu")').

If you use any of STATA, SUDAAN, or R and decide to collapse strata, consider collapsing strata

a) within the same Census region - this will take a little data dredging to determine

b) of like sampling type, i.e., certainties vs. non-certainties - this will definitely take some work

We would appreciate if any of your staff could advise on the issue below. Could you please also post it onto the listserv? Many thanks,


We are conducting analyses using NHIS and MEPS. We use the survey commands in STATA to adjust for complex survey design and have to decide how to deal with the problem of single PSUs. We understand that it is important to adjust for all three elements of the survey design (weights, PSUs, and strata) in order to obtain correct variances and standard errors. However, we are wondering what the statistical implications are (magnitude and direction of standard errors, in particular) if we only adjust for weights and PSUs but not strata. Per the Stata manual, the variance estimates are based only on computations at the primary sampling-unit level and do not require information about the secondary sampling units. We had thus supposed that the loss of efficiency from not adjusting for strata might have little impact on the variance estimates or standard errors.

We ran bivariate analyses with and without adjusting for strata. For some analyses, results of the Pearson chi-square test were similar. However, for other analyses, results were very different, e.g., the analysis without adjusting for strata is not significant (p=0.16) while the analysis adjusting for strata is highly significant (p<0.001).

We are wondering whether it has been examined or determined how the results may or may not vary with or without adjusting for strata. Does it depend on the type of analysis (bivariate or regression), sample size, or sub-group analyses?

We would greatly appreciate your thoughts on this issue.

