
From |
jpitblado@stata.com (Jeff Pitblado, StataCorp LP) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: svy:tab why standard error and confidence intervals for count are different than for |

Date |
Thu, 04 Feb 2010 14:01:02 -0600 |

Pramod Adhikari <pramod.adhikari@aihw.gov.au> asks why the standard errors of weighted counts and weighted percentages do not follow the same relationship as the corresponding point estimates:

> I am using svy:tab to generate estimate for a variable; one in terms of
> percentages and another in terms of weighted numbers. The results in the
> first table show that 95.3% said yes to H1, with a standard error of
> 0.51896%. Given the size of the population (weighted population size of
> 9577.258); I should be able to estimated the weighted population with
> the estimated percentage. In terms of weighted counts, 95.3196% of
> 9577.258=9129.0 said yes to H1. This weighted number is available in the
> second table.
> Since the standard error of the estimated prevalence is 0.51896%, I
> would have thought that I can convert this percent to count. In terms of
> weighted count it should be 0.51896%*9577.258=49.70. However, the
> results in the second table show that the standard error of the weighted
> count is whooping 390.4 compared to 49.7.
> Are the variance estimation methods different for counts and
> percentages? I would appreciate any pointer to the literature or any
> explanation to this anomaly.
> Thanks in advance.
>
> (Stata output omitted)

The estimated percentages are really -mean- estimators and Pramod's "weighted numbers" are really -total- estimators. -svy: tabulate- uses -svy: mean- and -svy: total- to perform most of its point and variance estimation. So this boils down to the following discussion.

For simple random sampling (SRS), we have the following relationship between the mean and total estimators:

        mean = total/N

where it is understood that 'N' is the sample size. Furthermore, this relationship carries over to their standard errors:

        SE(mean) = SE(total)/N

This happens because 'N' is fixed before sampling occurs, so

        Var(mean) = Var(total/N) = Var(total)/N^2

So why doesn't this relationship hold for complex survey data?
The answer to the question is in -[SVY] variance estimation-. The mean estimator is

        mean = total/W

where 'W' is the sum of the sampling weights, which is rarely a known or fixed quantity (prior to sampling). Thus 'W' itself is a total estimator and 'mean' is the ratio of two total estimators. The variance of 'mean' is then

        Var(mean) = { Var(total) - 2*mean*Cov(total,W) + mean^2*Var(W) } / W^2

which is at the bottom of page 160 in [SVY] Stata Survey Data Reference Manual Release 11.

Note that when the sampling weights are all constant, then

        Var(W) = 0
        Cov(total,W) = 0

and we are back to the SRS relationship between 'mean' and 'total'.

--Jeff
jpitblado@stata.com
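As a rough numerical check of the ratio-variance formula above, here is a minimal Python sketch. It is not Stata's implementation: it assumes a single-stage, with-replacement design with no strata in which every observation is its own PSU, and the data, the weights, and the function names `linearized_cov` and `mean_and_total` are all made up for illustration. It compares SE(mean) with SE(total)/W for unequal weights and for constant weights.

```python
import math


def linearized_cov(a, b):
    """With-replacement covariance estimate for the totals sum(a) and sum(b).

    Assumption: one stratum, each observation is its own PSU.
    """
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return n / (n - 1) * sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))


def mean_and_total(y, w):
    """Weighted total, weighted mean, and their standard errors."""
    z = [wi * yi for wi, yi in zip(w, y)]  # weighted observations
    total = sum(z)                         # total estimator
    W = sum(w)                             # sum of weights, itself a total estimator
    mean = total / W                       # ratio of two total estimators

    var_total = linearized_cov(z, z)
    var_W = linearized_cov(w, w)
    cov_total_W = linearized_cov(z, w)

    # Var(mean) = { Var(total) - 2*mean*Cov(total,W) + mean^2*Var(W) } / W^2
    var_mean = (var_total - 2 * mean * cov_total_W + mean ** 2 * var_W) / W ** 2

    return {
        "W": W,
        "total": total,
        "mean": mean,
        "se_total": math.sqrt(var_total),
        "se_mean": math.sqrt(var_mean),
    }


# Hypothetical yes/no responses and two hypothetical sets of sampling weights.
y = [1, 1, 1, 0, 1, 1, 0, 1]
w_unequal = [10, 250, 40, 15, 300, 20, 35, 120]
w_constant = [50] * len(y)

for label, w in (("unequal weights", w_unequal), ("constant weights", w_constant)):
    r = mean_and_total(y, w)
    print(label)
    print("  SE(mean)    =", r["se_mean"])
    print("  SE(total)/W =", r["se_total"] / r["W"])
```

With the constant weights the two printed quantities agree, because Var(W) and Cov(total,W) are both zero. With the unequal weights SE(total)/W comes out much larger than SE(mean), which is the same pattern Pramod saw between 390.4 and 49.7.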


© Copyright 1996–2017 StataCorp LLC