
Re: st: merging datasets

From: Steve Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: merging datasets
Date: Mon, 20 Feb 2012 09:05:43 -0500

```
Kyleigh Schraeder:

As a start, I assume that by "small sample", you mean a small survey sample, with its own survey structure. If not, see below.  The trick is to combine the two data sets so that you can -svyset- the combination and apply Stata's survey commands. This is the standard approach for combining two independent samples.

Suppose the -svyset- statement for the large survey data set is:

************************
svyset psu [pw = myweight], strata(stratumvar)
************************

In what follows, I use the prefix "small_" for variables from the small data set. If the small sample was not a survey sample, then for "small_psu" use "_n" and for "small_weight" use "1".  It is crucial to assign a stratum number for the small sample that is not present in the large sample.

**************************************
use small_data, clear
gen psu = small_psu
gen int stratumvar = small_stratum+10000  // a stratum number not in the large sample
gen myweight = small_weight
gen samptype = 1
append using large_data
replace samptype = 2 if samptype==.  // observations appended from the large data set
label define samptype 1 "small" 2 "national"
label values samptype samptype

svyset psu [pw = myweight], strata(stratumvar)
save combo, replace
**************************************

Then do any survey command for comparing the two samples, e.g.

************************
svy: reg myoutcome i.samptype
*************************

Hypothesis tests are often not appropriate for comparing descriptive statistics of two finite populations, because no specific groups of people can be expected to have _identical_ means or other statistics.  For references, see: http://www.stata.com/statalist/archive/2011-09/msg01121.html. Confidence intervals provide a satisfying alternative.

Steve
sjsamuels@gmail.com

On Feb 19, 2012, at 6:03 PM, Kyleigh Schraeder wrote:

I would like to compare means and percentages from a small sample
dataset to a large population national dataset.  The population
dataset has population weights.  I have stset the population dataset
but I need to merge the two datasets in order to run analyses.  I am
wondering how I can merge the two datasets but still apply the
population weights to the large dataset beforehand (and not apply the
population weights to the small sample).

KS
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
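
For the case where the small sample is not a survey sample, the substitutions described in the reply ("_n" for the PSU, "1" for the weight) might look like the sketch below. This is only an illustration under assumptions: the file names (small_data, large_data) and variable names (psu, stratumvar, myweight, myoutcome) follow the reply's placeholders, the stratum value 10000 is assumed not to occur in the large sample, and psu is assumed to be numeric in the large data set. The final -svy: mean- line is one possible way to obtain the confidence intervals the reply recommends, not necessarily the author's choice.

```stata
* Sketch only: small sample treated as a non-survey sample.
* All file and variable names are placeholders taken from the reply.
use small_data, clear
gen long psu = _n              // each observation is its own PSU
gen int stratumvar = 10000     // assumed absent from the large sample's strata
gen myweight = 1               // every small-sample observation gets weight 1
gen samptype = 1
append using large_data
replace samptype = 2 if samptype == .   // observations appended from the large data set
label define samptype 1 "small" 2 "national"
label values samptype samptype

svyset psu [pw = myweight], strata(stratumvar)

* Means with confidence intervals, separately for each sample
svy: mean myoutcome, over(samptype)
```

Because Stata treats PSUs as nested within strata, the _n-based PSU identifiers do not need to differ from PSU identifiers in the large sample, provided the small sample sits in its own stratum.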