Statalist The Stata Listserver



Re: st: Complex survey design


From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Complex survey design
Date: Thu, 19 Oct 2006 13:41:05 -0500

Sergio Salis <ssalis22@yahoo.it> has a question about estimating multiple
subpopulation means simultaneously:

> I have never used the complex survey design in stata
> and I would really appreciate if you could give me an
> advice. I have a firm-level dataset which includes 2
> subsets. The first subset includes data that I don't
> want to analyse, while the second is the one of
> interest.
> My aim is to draw statistics for each region in the
> subset 2.
> I suppose that I should proceed as follows:
> 
> Create one identifier for each reg in subset 2
>  
> gen reg1=1 if subsample==2 & region==1
> replace reg1=0 if reg1!=1
> (do the same for all the regions...)
> 
> Now I would calculate the mean as follows:
> 
> svy: mean productivity, sub(reg1)
> svy: mean productivity, sub(reg2)
> ....
> svy: mean productivity, sub(regN)
> 
> I would do so because I realised that the new commands
> (stata 9) for svy do not include the option 'by'
> previously available. Is the way of proceeding
> described above correct to obtain the regional means?
> (I guess that first dropping the observations for
> which subset==2 and then creating the reg dummies is
> completely wrong.)
> 
> I read on this webpage
> http://www.cpc.unc.edu/services/computer/presentations/statatutorial/example31.html
> 
> that using 'if' to analyse subsets of the dataset
> (instead of the subpop option) is wrong since for the
> variance, standard error, and confidence intervals to
> be calculated properly svy must use all the available
> observations. Hence the only way to proceed seems to
> be as I described above...please let me know whether I
> am following the right procedure.
> Thanks in advance for your help

In Stata 9, the new -mean- command has an -over()- option that plays the same
role as the -by()- option of the old -svymean- command.

Thus Sergio can use the -subpop()- option of -svy- to identify the
subpopulation of interest, and the -over()- option of -mean- to identify the
groups over which to simultaneously estimate subpopulation means and their
variance-covariance matrix.

Given Sergio's Stata code above, this can be accomplished by the following
single command:

	. svy, subpop(if subsample == 2) : mean productivity, over(region)
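
For context, here is a minimal sketch of the whole sequence, assuming the
survey design has not yet been declared. The variable names psu, stratum,
and wgt are hypothetical placeholders for the design variables in Sergio's
data, and insub is an illustrative name for a subpopulation indicator. Only
subsample, region, and productivity come from the original question.

```stata
* Declare the survey design once (psu, stratum, wgt are assumed names)
svyset psu [pweight = wgt], strata(stratum)

* Regional means within subsample 2, using all observations for the variance
svy, subpop(if subsample == 2): mean productivity, over(region)

* Equivalent form with an explicit 0/1 indicator (insub is an assumed name)
generate byte insub = (subsample == 2)
svy, subpop(insub): mean productivity, over(region)

* Point estimates and their joint variance-covariance matrix
matrix list e(b)
matrix list e(V)
```

Because the observations outside subsample 2 stay in the dataset, -svy- can
still count every stratum and PSU when it computes the standard errors. That
is the reason for using -subpop()- rather than dropping observations or
restricting the sample with -if-.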

--Jeff
jpitblado@stata.com


