
st: Combining multiple survey data sets

From   James Swartz
Subject   st: Combining multiple survey data sets
Date   Sun, 14 Feb 2010 16:02:43 -0600


I searched for information on this topic and found a bit in archived threads, but not as much detail as I need. So at the risk of some redundancy, I would like to ask for help from a sampling statistician out there who also knows Stata well and who has worked with multiple survey data sets:

I am using two data sets. One is the National Comorbidity Survey Replication (NCS-R) and the other is a data set based on data I collected locally using the same instrument as the NCS-R. The N's are very different: the NCS-R part 2 is around 5,000 to 6,000 cases and my data set has only about 450 cases. Each data set has different survey parameters. My local data set has no PSUs but is stratified on gender, and I developed weights to account for non-coverage and non-response. The NCS-R data set includes variables for weights, strata, and PSUs. Here are my questions:

1) In a simple bivariate analysis, I want to compare the prevalences of chronic medical conditions in each data set. But how can I tell Stata to use one set of survey parameters for cases in the NCS-R and another for cases in my local data set? Also, how important is it to control for a finite population correction factor? I have not done this in any analyses previously.

2) In a second step, I used the PSMATCH2 add-on to create a matched sample of 450 cases from the NCS-R data set based on a selected set of demographics and other characteristics. I then want to run logistic regressions on the odds of having a chronic medical condition while controlling for the matching variables (the matches were not perfect) and other unmatched characteristics. I assume that at this point the survey parameters are not applicable, because there is no way (that I can figure) to apply the subpopulation option. Is that correct? Is this analytic model reasonable given the data sets available, or would there be a better way to approach this problem?
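
To make question 1 concrete, here is a rough do-file sketch of one way the two designs might be combined under a single svyset. It is only an illustration under stated assumptions, not a confirmed solution: every file and variable name (ncsr_part2, local_survey, ncsr_weight, ncsr_stratum, ncsr_psu, local_weight, female, chronic) is a hypothetical placeholder, and it assumes the two files share the analysis variables and have no missing design variables. The idea is to stack the files, treat each local respondent as its own PSU (since the local design has no clusters), and recode strata and PSUs so their identifiers cannot collide across the two surveys.

```stata
* All file and variable names below are hypothetical placeholders.
use ncsr_part2, clear
generate byte   source     = 0               // 0 = NCS-R
generate double wt         = ncsr_weight     // NCS-R Part 2 weight
generate long   stratum_in = ncsr_stratum
generate long   psu_in     = ncsr_psu

append using local_survey
replace source     = 1            if missing(source)   // 1 = local sample
replace wt         = local_weight if source == 1
replace stratum_in = female       if source == 1       // local design: stratified on gender
replace psu_in     = _n           if source == 1       // no clusters: each case is its own PSU

* Make stratum and PSU identifiers unique across the two surveys
egen long stratum = group(source stratum_in)
egen long psu     = group(source stratum_in psu_in)

svyset psu [pweight = wt], strata(stratum)

* Column percentages give the prevalence of the condition within each source
svy: tabulate chronic source, column se ci
```

Because the comparison is made within each source (column percentages), the differing scales of the two sets of weights should not affect the estimated prevalences; they would matter for any estimate that pools the two samples.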

Thanks for any help. I have been scratching my head on this one for a while.



James Swartz, Ph.D., Associate Professor
Jane Addams College of Social Work, University of Illinois at Chicago
1040 W. Harrison Street (MC 309)
Chicago IL. 60607

P 312-996-8560
F 312-996-2770
C 312-961-3843


"That which stands in the way of our work, is our work."
        - Marcus Aurelius
