
From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Setup for Survey Sampling--Example 9.4 from Scheaffer
Date: Wed, 26 Oct 2005 15:01:55 -0500

Susan Cochran <cochran@nicco.sscnet.ucla.edu> asks why -svy: mean- is not
reproducing a hand-calculated value:

> I am trying to get Stata 9 to reproduce the analysis of Table 9.1 in
> Scheaffer et al., 6th edition, p. 307.
>
> There are 90 plants, 10 are sampled SRS without replacement at the first
> stage, and within plants machines (m) are sampled SRS without replacement
> and measured for the hours of being broken. The known population size is
> about 4500 machines. M=number of machines in the plant sampled.
>
> The calculations by hand reveal a mean of 4.8 hours.
>
> I created a raw data file with the following structure
>
> (removed for brevity)
>
> When I specified the following set up
>
> svyset plant [pweight=pwt], fpc(nplant) vce(linearized) || _n, fpc(M)
>
> The total calculated correctly, as did the SE, but the mean is incorrect
> (showing the simple mean of the dataset 4.6 not the mean of 4.8 which is
> correct). This is because (?) the population size is seen as 4698 (the
> sum of the weights) not 4500 and the total hours/population size is then
> 4.6.
>
> What should the correct design setup in STATA be?

The crux of the issue is that Susan wants to get back

        21601.49 / 4500 = 4.8

whereas -svy: mean- is computing

        21601.49 / 4697.97 = 4.598

Here 4697.97 is the sum of the sampling weights used to produce 21601.49,
which is the estimate of the population total. The sum of the sampling
weights estimates the population size.

These are two different methods for estimating the population mean: the
estimate Susan wants assumes the population size is known, while the one
-svy: mean- reports estimates the population size from the weights.
-svy: mean- does not implement the known-size method.

***

Some background information:

The population mean estimator is a special case of the population ratio
estimator. By definition, the population mean Ybar is

        Ybar = Y / N

where Y is the population total and N is the population size.

-svy: mean- estimates Ybar using

        Ybarhat = Yhat / Nhat

where Yhat estimates the population total, and Nhat estimates the
population size.

If you know the value of N, you can simply compute Yhat using -svy: total-
and divide it by N. The point here is that -svy: mean- does not compute

        Yhat / N

--Jeff
jpitblado@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
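
A minimal sketch of the approach described in the message (estimate the total with -svy: total-, then divide by the known N), not a definitive recipe. It assumes the -svyset- command from the original post has already been run, that N = 4500 is the known number of machines, and that the outcome variable is called hours. The name hours is hypothetical, since the data listing was removed from the post. Because N is treated as a fixed constant, the standard error of Yhat / N is the standard error of Yhat divided by N.

```stata
* Assumes the design was declared as in the original post:
*   svyset plant [pweight=pwt], fpc(nplant) vce(linearized) || _n, fpc(M)
* "hours" is a hypothetical variable name for hours of downtime.

* Estimate the population total Yhat and its standard error.
svy: total hours

* Divide the estimated total and its standard error by the known N = 4500.
display "Mean with known N: " _b[hours] / 4500
display "Standard error:    " _se[hours] / 4500
```

With the total reported in the thread (21601.49), the first line displayed would correspond to the 21601.49 / 4500 calculation shown in the message.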

