The Stata listserver

Re: st: Setup for Survey Sampling--Example 9.4 from Scheaffer


From   jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Setup for Survey Sampling--Example 9.4 from Scheaffer
Date   Wed, 26 Oct 2005 15:01:55 -0500

Susan Cochran <cochran@nicco.sscnet.ucla.edu> asks why -svy: mean- is not
reproducing a hand-calculated value:

> I am trying to get Stata 9 to reproduce the analysis of Table 9.1 in
> Scheaffer et al., 6th edition, p. 307.
> 
> There are 90 plants, 10 are sampled SRS without replacement at the first 
> stage, and within plants machines (m) are sampled SRS without replacement 
> and measured for the hours of being broken.  The known population size is 
> about 4500 machines.  M=number of machines in the plant sampled.
> 
> The calculations by hand reveal a mean of 4.8 hours.
> 
> I created a raw data file with the following structure
> 
> (removed for brevity)
> 
> When I specified the following set up
> 
> svyset plant [pweight=pwt], fpc(nplant) vce(linearized) || _n, fpc(M)
> 
> The total was calculated correctly, as was the SE, but the mean is incorrect
> (showing the simple mean of the dataset, 4.6, not the correct mean of 4.8).
> Is this because the population size is taken to be 4698 (the sum of the
> weights) rather than 4500, so that total hours / population size comes
> out as 4.6?
> 
> What should the correct design setup in Stata be?

The crux of the issue is that Susan wants to obtain

	21601.49 / 4500 = 4.8

whereas -svy: mean- computes

	21601.49 / 4697.97 = 4.598
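The two divisions can be checked directly. This is only a quick arithmetic sketch in Python using the three figures quoted in this thread (the estimated total, the sum of the weights, and the assumed-known population size of 4500); the variable names are illustrative.

```python
# Figures quoted in this thread
total_hat = 21601.49   # estimated population total of hours broken (-svy: total-)
sum_weights = 4697.97  # sum of the sampling weights = estimated population size
N_known = 4500         # population size treated as known in the textbook example

mean_known_N = total_hat / N_known    # the hand-calculated estimate
mean_ratio = total_hat / sum_weights  # what -svy: mean- reports

print(f"known N:   {mean_known_N:.3f}")
print(f"ratio:     {mean_ratio:.3f}")
```

Both estimates share the same numerator. Only the denominator differs.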

Here 4697.97 is the sum of the sampling weights, the same weights used to
produce 21601.49, the estimate of the population total.

The sum of the sampling weights estimates the population size.

These are two different estimators of the population mean: the one Susan
wants treats the population size as known, whereas -svy: mean- estimates it
from the sampling weights.  -svy: mean- does not implement the known-size
estimator.

*** Some background information:

The population mean estimator is a special case of the population ratio
estimator.  By definition, the population mean Ybar is

	Ybar = Y / N

where Y is the population total and N is the population size.  -svy: mean-
estimates Ybar using

	Ybarhat = Yhat / Nhat

where Yhat estimates the population total, and Nhat estimates the population
size.
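A minimal Python sketch of the two estimators follows. It assumes a single-level weighted sum, with the sampling weights already computed, and uses hypothetical toy data (four observations with made-up weights, not Susan's data). The function names ratio_mean and known_n_mean are illustrative only and are not Stata internals.

```python
def weighted_total(y, w):
    """Yhat: the weighted sum of the observations estimates the population total."""
    return sum(wi * yi for wi, yi in zip(w, y))

def ratio_mean(y, w):
    """Ratio estimator of the mean, Yhat / Nhat, where Nhat = sum of the weights."""
    n_hat = sum(w)  # estimated population size
    return weighted_total(y, w) / n_hat

def known_n_mean(y, w, N):
    """Mean estimator that divides Yhat by a known population size N."""
    return weighted_total(y, w) / N

# Hypothetical toy data: hours broken and sampling weights for four machines
y = [4.0, 6.0, 5.0, 3.0]
w = [10.0, 10.0, 15.0, 15.0]

print(weighted_total(y, w))    # Yhat
print(ratio_mean(y, w))        # Yhat / Nhat, with Nhat = 50
print(known_n_mean(y, w, 55))  # Yhat / N, for a hypothetical known N = 55
```

The two results agree only when the weights sum exactly to the true N.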

If you know the value of N, you can simply compute Yhat using -svy: total- and
divide it by N.  The point here is that -svy: mean- does not compute

	Yhat / N
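Because N is a fixed constant rather than an estimate, Var(Yhat / N) = Var(Yhat) / N^2. The standard error of Yhat / N is therefore the standard error reported by -svy: total- divided by N. By contrast, the linearized variance of Yhat / Nhat used by -svy: mean- also has to account for the variability of Nhat. The following Python helper is a sketch that assumes you copy the total and its standard error from the -svy: total- output. The helper name mean_from_total is hypothetical, and the SE value 450.0 is a placeholder, since the thread does not report it.

```python
def mean_from_total(total_hat, se_total, N):
    """Estimate the population mean as Yhat / N for a known population size N.

    Since N is a constant, Var(Yhat / N) = Var(Yhat) / N**2,
    so the standard error is simply se_total / N.
    """
    if N <= 0:
        raise ValueError("population size N must be positive")
    return total_hat / N, se_total / N

# Total from this thread; the standard error 450.0 is a hypothetical placeholder.
mean, se = mean_from_total(21601.49, 450.0, 4500)
print(f"mean = {mean:.3f}, se = {se:.3f}")
```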

--Jeff
jpitblado@stata.com
© Copyright 1996–2014 StataCorp LP