An interchange on the MEPS (Medical Expenditure Panel Survey) list---
Begin forwarded message:
From: "Rhoades, Jeffrey" <[email protected]>
Date: October 11, 2005 11:15:07 AM EDT
To: [email protected]
Subject: [MEPS-L] Using STATA to adjust for Complex Survey Design -
Single PSU issues--Additional Comments
Reply-To: AHRQ's question & discussion group regarding MEPS <MEPS- 
[email protected]>
 1.  Page 47 in the STATA v9 "Survey Data" manual explains, with
numerical examples from NHANES, why ignoring strata and/or PSUs is a
serious mistake.  Even though you can get correct point estimates,
the tests, confidence intervals, and standard errors will usually be
wrong.  In the NHANES II numerical example in the documentation,
ignoring the strata but using weights and PSUs gives standard errors
that are 50% too small.
2. Lonely PSUs are a problem in any software, but STATA has fewer
options than SUDAAN or R.  Page 241 in the STATA v9 "Survey Data"
manual explains, with a numerical example from NHANES, how to deal
with single PSUs within strata in the STATA survey functions.  The
recommendation from the STATA documentation is to collapse strata.
This is extremely dangerous when you don't know how the strata are
formed.
The best advice is to avoid single PSUs if possible.  The most
common cause of single PSUs in MEPS is subsetting the data instead
of using the subpop option (see page 38).  Single PSUs do not occur
naturally in the complete Full Year Files, but subsetting can leave
a stratum with only one PSU.
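
To see how subsetting produces a lonely PSU, here is a minimal Python
sketch.  The records, stratum labels, and the in-subpopulation flag
are made up for illustration and are not taken from any MEPS file.

```python
from collections import defaultdict

# Hypothetical records: (stratum, psu, in_subpopulation).
records = [
    ("A", 1, True), ("A", 1, False), ("A", 2, False),
    ("B", 1, True), ("B", 2, True),
]

def lonely_strata(rows):
    """Return the strata that contain exactly one distinct PSU."""
    psus = defaultdict(set)
    for stratum, psu, _ in rows:
        psus[stratum].add(psu)
    return sorted(s for s, p in psus.items() if len(p) == 1)

# Subsetting: records outside the subpopulation are dropped, so PSU 2
# of stratum A disappears from the design altogether.
subset = [r for r in records if r[2]]

print(lonely_strata(records))  # [] : every stratum has two PSUs
print(lonely_strata(subset))   # ['A'] : stratum A is left with one PSU
```

A subpopulation analysis keeps all of `records` in the design and
only restricts which records contribute to the estimate, so the
second situation never arises.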
Note that if you are using a MEPS file, such as an event-level file,
that only has records for individuals with the specific event, you
should consider linking back to the full year file to get the
correct variance structure.  You will need to define individuals
not in the specific event file as having zero events, as opposed to
dropping them.  This will produce correct totals, but the means will
have the overall population as their denominator.  If you want means
conditional on having the event, you will need to use the subpop
function; the analysis in the non-event group is then nonsensical.
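
The zero-fill step can be sketched in Python as follows.  The person
IDs, weights, and event counts are invented, and the sketch covers
point estimates only; standard errors still need survey software
that knows the strata and PSUs.

```python
# Hypothetical full year file: person id -> survey weight.
full_year = {101: 1000.0, 102: 1500.0, 103: 500.0, 104: 2000.0}

# Hypothetical event file: only persons with at least one event appear.
events = {101: 2, 103: 4}

# Link back to the full year file; persons missing from the event
# file are kept and given zero events rather than being dropped.
linked = [(pid, w, events.get(pid, 0)) for pid, w in full_year.items()]

total_events = sum(w * n for _, w, n in linked)
population = sum(w for _, w, _ in linked)

# Mean over the whole population (denominator includes zero-event persons).
mean_overall = total_events / population

# Mean conditional on having the event: same records, but the
# denominator is the weighted count of persons with an event.
population_with_event = sum(w for _, w, n in linked if n > 0)
mean_given_event = total_events / population_with_event

print(total_events, mean_overall, mean_given_event)
```

The weighted total is the same whether or not the zero-event persons
are present; what changes is the denominator of the mean and, in the
real analysis, the set of PSUs available for variance estimation.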
If there are many single PSUs, the problem is very serious.
You might be able to form BRR replicates, but you should seek the
advice of a statistician in doing so.
If you have a single PSU because of linking to an external file such
as NHIS, or for some other reason, you should seriously consider
switching to SUDAAN (missunit) or R survey
( 'options("survey.lonely.psu")' ).
If you use any of STATA, SUDAAN, or R and decide to collapse
strata, consider collapsing strata
a) within the same Census region - this will take a little data
dredging to determine
b) of like sampling type, i.e., certainties vs. non-certainties -
this will definitely take some work
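
A rough Python sketch of rules (a) and (b) follows.  The stratum
names, the region and certainty fields, and the helper collapse_map
are all hypothetical; this information would have to be worked out
for the real design, and the sketch is one possible pairing rule
rather than a recommended procedure.

```python
# Hypothetical design information for four strata.
strata = {
    "S1": {"region": "South", "certainty": False, "psus": 1},
    "S2": {"region": "South", "certainty": False, "psus": 2},
    "S3": {"region": "South", "certainty": True, "psus": 2},
    "S4": {"region": "West", "certainty": False, "psus": 2},
}

def collapse_map(strata):
    """Map each single-PSU stratum to a compatible multi-PSU stratum.

    Compatible means same region and same sampling type.  Strata with
    two or more PSUs map to themselves.  A single-PSU stratum with no
    compatible partner is left unchanged and remains a problem.
    """
    mapping = {}
    for name, info in strata.items():
        mapping[name] = name
        if info["psus"] > 1:
            continue
        partners = [
            other for other, o in strata.items()
            if o["psus"] > 1
            and o["region"] == info["region"]
            and o["certainty"] == info["certainty"]
        ]
        if partners:
            mapping[name] = min(partners)  # deterministic choice
    return mapping

print(collapse_map(strata))
```

Here S1 is folded into S2 (same region, both non-certainty), while
S3 and S4 are never candidates for it because they differ in
sampling type or region.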
We would appreciate it if any of your staff could advise on the issue
below.  Could you please also post it to the listserv?  Many thanks,
Su-Ying
We are conducting analyses using NHIS and MEPS.  We use the survey
commands in STATA to adjust for complex survey design and have to
decide how to deal with the problem of single PSUs.  We understand
that it is important to adjust for all three elements of the survey
design (weights, PSUs, and strata) in order to obtain correct
variances and standard errors.  However, we are wondering what the
statistical implications are (magnitude and direction of standard
errors, in particular) if we adjust only for weights and PSUs but
not strata.  Per the Stata manual, the variance estimates are based
only on computations at the primary sampling-unit level and do not
require information about the secondary sampling units.  We had
therefore assumed that the loss of efficiency from not adjusting for
strata might have little impact on the variance estimates or
standard errors.
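
As a rough illustration of the comparison described above, the
Python sketch below computes the variance of a weighted total with
and without strata.  It assumes the usual with-replacement,
between-PSU variance estimator; the weighted PSU totals are invented
and the function names are hypothetical.

```python
from collections import defaultdict

def between_psu_variance(totals):
    """n/(n-1) times the sum of squared deviations of PSU totals.

    With a single PSU, n - 1 is zero and the estimate is undefined,
    which is the lonely-PSU problem.
    """
    n = len(totals)
    mean = sum(totals) / n
    return n / (n - 1) * sum((t - mean) ** 2 for t in totals)

def variance_with_strata(psu_totals):
    """Sum the between-PSU variance computed separately in each stratum."""
    by_stratum = defaultdict(list)
    for (stratum, _psu), total in psu_totals.items():
        by_stratum[stratum].append(total)
    return sum(between_psu_variance(ts) for ts in by_stratum.values())

def variance_ignoring_strata(psu_totals):
    """Treat all PSUs as if they came from one stratum."""
    return between_psu_variance(list(psu_totals.values()))

# Hypothetical weighted PSU totals, keyed by (stratum, psu).
strata_differ = {("A", 1): 1.0, ("A", 2): 3.0, ("B", 1): 11.0, ("B", 2): 13.0}
strata_alike = {("A", 1): 1.0, ("A", 2): 3.0, ("B", 1): 1.0, ("B", 2): 3.0}

for data in (strata_differ, strata_alike):
    print(variance_with_strata(data), variance_ignoring_strata(data))
```

In the first toy data set, where the strata differ in level, ignoring
strata inflates the variance; in the second, where they do not, it
shrinks it.  The direction and size of the error therefore depend on
the data and the design.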
We ran bivariate analyses with and without adjusting for strata.
For some analyses, the results of the Pearson chi-square test were
similar.  For other analyses, however, the results were very
different, e.g., the analysis without adjusting for strata is not
significant (p=0.16) while the analysis adjusting for strata is
highly significant (p<0.001).
We are wondering whether it has been examined or determined how the
results may or may not vary with or without adjusting for
strata.  Does it depend on the type of analysis (bivariate or
regression), sample size, or sub-group analyses?
We would greatly appreciate your thoughts on this issue.
Su-Ying