
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement is already up and running.


Re: st: merging datasets

From   Steve Samuels
Subject   Re: st: merging datasets
Date   Mon, 20 Feb 2012 09:05:43 -0500

Kyleigh Schraeder:

As a start, I assume that by "small sample" you mean a small survey sample with its own survey design. If not, see below. The trick is to combine the two data sets in such a way that you can -svyset- the combination and apply Stata's survey commands. This is the standard approach for combining two independent samples.

Suppose the -svyset- statement for the large survey data set is:

svyset psu [pw = myweight], strata(stratumvar)   

In what follows, I use the prefix "small_" for variables from the small data set. If the small sample was not a survey sample, then for "small_psu" use "_n", for "small_weight" use "1", and for "small_stratum" use "0" (a single stratum). It is crucial to give the small sample stratum numbers that are not present in the large sample.
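For a small sample that is not a survey sample, those substitutions make the first lines of the code that follows look roughly like this. It is only a sketch, assuming the same variable names as below, a numeric psu in the large file, and that stratum code 10000 does not occur in the large sample:

```stata
use small_data, clear
gen long psu = _n            // each observation is its own PSU
gen int stratumvar = 10000   // a single stratum, assumed unused in the large sample
gen myweight = 1             // every small-sample observation gets weight 1
gen samptype = 1             // flag the small sample
```

The remaining steps (append, labelling, -svyset-, save) are unchanged.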

use small_data, clear
gen psu = small_psu
gen int stratumvar = small_stratum + 10000  // a stratum number not in the large sample
gen myweight = small_weight
gen samptype = 1                            // flag the small sample
append using large_data
replace samptype = 2 if samptype == .       // appended observations come from the large sample
label define samptype 1 "small" 2 "national"
label values samptype samptype

svyset psu [pw = myweight], strata(stratumvar)
save combo, replace
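Because the small sample's strata must not collide with any stratum in the large sample, it may be worth checking the combined file before analysing it. A minimal sketch, assuming stratumvar is numeric in large_data and that all its codes are below 10000 (the offset used above):

```stata
* Before appending: all large-sample strata should lie below the offset
use large_data, clear
summarize stratumvar, meanonly
assert r(max) < 10000

* After building combo: every observation is flagged, and no stratum mixes samples
use combo, clear
tabulate samptype, missing
assert !missing(samptype)
bysort stratumvar: assert samptype == samptype[1]
svydescribe      // lists strata, PSUs, and observations per stratum
```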

Then run any survey command that compares the two samples, e.g.:

svy: reg myoutcome i.samptype

Hypothesis tests are often not appropriate for comparing descriptive statistics of two finite populations, because no two real groups of people can be expected to have _identical_ means or other statistics. For references, see:

Confidence intervals provide a satisfying alternative.
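With the combined file -svyset-, confidence intervals for the means and percentages can come from the -svy- versions of -mean-, -proportion-, and -regress-. A sketch, where myoutcome is a continuous outcome and mybinary is a hypothetical 0/1 indicator standing in for any percentage of interest:

```stata
use combo, clear

* Mean of myoutcome in each sample, with 95% confidence intervals
svy: mean myoutcome, over(samptype)

* Difference in means (national minus small): the coefficient on 2.samptype
svy: regress myoutcome i.samptype

* Proportions of the hypothetical 0/1 variable in each sample, with CIs
svy: proportion mybinary, over(samptype)

* Difference in proportions: again the coefficient on 2.samptype
svy: regress mybinary i.samptype
```

Reading the interval for the 2.samptype coefficient, rather than its p-value, follows the advice above.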


On Feb 19, 2012, at 6:03 PM, Kyleigh Schraeder wrote:

I would like to compare means and percentages from a small sample
dataset to a large population national dataset.  The population
dataset has population weights.  I have stset the population dataset
but I need to merge the two datasets in order to run analyses.  I am
wondering how I can merge the two datasets but still apply the
population weights to the large dataset beforehand (and not apply the
population weights to the small sample).

Thanks for your help,
*   For searches and help try:

© Copyright 1996–2015 StataCorp LP