Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: merging datasets

From	Kyleigh Schraeder <[email protected]>
To	[email protected]
Subject	Re: st: merging datasets
Date	Mon, 20 Feb 2012 09:20:55 -0500

As you said, the 'small sample' was independently collected from the
national sample.  The small sample has its own survey structure and no
population weights.  Both samples have been matched for the same
variables.

Thank you for your help Steve!

KS
On Mon, Feb 20, 2012 at 9:05 AM, Steve Samuels <[email protected]> wrote:
> Kyleigh Schraeder:
>
> As a start, I assume that by "small sample", you mean a small survey sample, with its own survey structure. If not, see below.  The trick is to combine the two data sets in such a way that you can -svyset- the combination and apply Stata's survey commands. This is the standard approach for combining two independent samples.
>
> Suppose the -svyset- statement for the large survey data set is:
>
> ************************
> svyset psu [pw = myweight], strata(stratumvar)
> ************************
>
> In what follows, I use the prefix "small_" for variables from the small data set. If the small sample was not a survey sample, then for "small_psu" use "_n" and for "small_weight" use "1".  It is crucial to assign a stratum number for the small sample that is not present in the large sample.
>
> **************************************
> use small_data, clear
> gen psu = small_psu
> gen int stratumvar = small_stratum+10000  // a stratum number not in the large sample
> gen myweight = small_weight
> gen samptype = 1
> append using large_data
> replace samptype = 2 if missing(samptype)  // observations appended from large_data
> label define samptype 1 "small" 2 "national"
> label values samptype samptype
>
> svyset psu [pw = myweight], strata(stratumvar)
> save combo, replace
> **************************************
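
For the non-survey case mentioned above (use "_n" for the PSU and "1" for the weight), the same steps might look roughly like this. It is only a sketch under stated assumptions: small_data has no design variables of its own, psu and stratumvar are numeric in large_data, and the single stratum code 10000 is an illustrative value that must not occur in large_data.

```stata
* Sketch only: small_data is assumed NOT to be a survey sample.
* First confirm that the illustrative stratum code 10000 is unused
* in the large sample (assert stops with an error if it is used).
use large_data, clear
assert stratumvar != 10000

use small_data, clear
gen long psu = _n              // each observation is its own PSU
gen int stratumvar = 10000     // one stratum for the whole small sample
gen myweight = 1               // every observation gets weight 1
gen samptype = 1
append using large_data
replace samptype = 2 if missing(samptype)   // rows from large_data

svyset psu [pw = myweight], strata(stratumvar)
```

If the assert fails, pick any other stratum code that does not appear in large_data.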
>
> Then do any survey command for comparing the two samples, e.g.
>
> ************************
> svy: reg myoutcome i.samptype
> *************************
>
> Hypothesis tests are often not appropriate for comparing descriptive statistics of two finite populations, because no specific groups of people can be expected to have _identical_ means or other statistics.  For references, see: http://www.stata.com/statalist/archive/2011-09/msg01121.html. Confidence intervals provide a satisfying alternative.
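
One way to get such confidence intervals from the combined file is sketched below. It assumes combo.dta was saved after -svyset- (the survey settings are stored with the data set) and reuses the names myoutcome and samptype from above; mycatvar is a hypothetical categorical variable, standing in for whatever percentages are of interest.

```stata
use combo, clear

* Mean of the outcome in each sample, with confidence intervals
svy: mean myoutcome, over(samptype)

* Percentages by sample for a categorical variable
* (mycatvar is a hypothetical name, not from the original post)
svy: proportion mycatvar, over(samptype)

* With "small" (samptype 1) as the base level, the coefficient on
* 2.samptype is the national-minus-small difference in means;
* its confidence interval is shown in the regression table
svy: regress myoutcome i.samptype
```

The width of the interval for the difference then shows how precisely the gap between the two samples is estimated, which is usually more informative than a test of exact equality.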
>
>
> Steve
> [email protected]
>
>
> On Feb 19, 2012, at 6:03 PM, Kyleigh Schraeder wrote:
>
> I would like to compare means and percentages from a small sample
> dataset to a large population national dataset.  The population
> dataset has population weights.  I have svyset the population dataset
> but I need to merge the two datasets in order to run analyses.  I am
> wondering how I can merge the two datasets but still apply the
> population weights to the large dataset beforehand (and not apply the
> population weights to the small sample).
>
> Thanks for your help,
> KS
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>



-- 
Kyleigh Schraeder, BSc (Hons)
M.Sc. Candidate
Clinical Psychology Program
Department of Psychology
University of Western Ontario
[email protected]


References:
- st: merging datasets
  - From: Kyleigh Schraeder <[email protected]>
- Re: st: merging datasets
  - From: Steve Samuels <[email protected]>
