Sergio Salis <ssalis22@yahoo.it> has a question about estimating multiple
subpopulation means simultaneously:
> I have never used the complex survey design in stata
> and I would really appreciate if you could give me an
> advice. I have a firm-level dataset which includes 2
> subsets. The first subset includes data that I don't
> want to analyse, while the second is the one of
> interest.
> My aim is to draw statistics for each region in the
> subset 2.
> I suppose that I should proceed as follows:
>
> Create one identifier for each reg in subset 2
>
> gen reg1=1 if subsample==2 & region==1
> replace reg1=0 if reg1!=1
> (do the same for all the regions...)
>
> Now I would calculate the mean as follows:
>
> svy: mean productivity, sub(reg1)
> svy: mean productivity, sub(reg2)
> ....
> svy: mean productivity, sub(regN)
>
> I would do so because I realised that the new commands
> (stata 9) for svy do not include the option 'by'
> previously available. Is the way of proceeding
> described above correct to obtain the regional means?
> (I guess that first dropping the observations for
> which subset==2 and then creating the reg dummies is
> completely wrong.)
>
> I read on this webpage
> http://www.cpc.unc.edu/services/computer/presentations/statatutorial/example31.html
>
> that using 'if' to analyse subsets of the dataset
> (instead of the subpop option) is wrong since for the
> variance, standard error, and confidence intervals to
> be calculated properly svy must use all the available
> observations. Hence the only way to proceed seems to
> be as I described above...please let me know whether I
> am following the right procedure.
> Thanks in advance for your help
In Stata 9, the new -mean- command has an -over()- option that plays the same
role as the -by()- option of the old -svymean- command.
Thus Sergio can use the -subpop()- option of -svy- to identify the
subpopulation of interest, and the -over()- option of -mean- to identify the
groups over which to simultaneously estimate subpopulation means and their
variance-covariance matrix.
Given Sergio's Stata code above, this can be accomplished by the following
single command:
. svy, subpop(if subsample == 2) : mean productivity, over(region)
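
For context, a fuller sketch of the workflow might look like the following.
It assumes the survey design has not yet been declared with -svyset-; the
design variable names psu_id, stratum_id, and samp_wt are hypothetical
placeholders and must be replaced with the actual PSU, stratum, and weight
variables in the dataset. The -matrix list- lines simply display the joint
point estimates and the variance-covariance matrix mentioned above.

```stata
* Declare the survey design once for the full dataset.
* psu_id, stratum_id, and samp_wt are hypothetical names (assumptions).
svyset psu_id [pweight = samp_wt], strata(stratum_id)

* Keep all observations in memory; -subpop()- restricts estimation to
* subset 2, and -over(region)- yields one mean per region in a single fit.
svy, subpop(if subsample == 2): mean productivity, over(region)

* Display the estimated regional means and their joint
* variance-covariance matrix from the stored results.
matrix list e(b)
matrix list e(V)
```

Because no observations are dropped and no -if- qualifier is applied to the
estimation sample, the variance estimates still account for the full design,
and there is no need to create a separate indicator variable for each region.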
--Jeff
jpitblado@stata.com