
    . svyset psu [pweight=pesodef2007], strata(areasalud) fpc(secperarea)

          pweight: pesodef2007
              VCE: linearized
         Strata 1: areasalud
             SU 1: psu
            FPC 1: secperarea

    . svy:prop p45
    (running proportion on estimation sample)

    Survey: Proportion estimation

    Number of strata =      11        Number of obs    =     12174
    Number of PSUs   =    1266        Population size  =   12172,5
                                      Design df        =      1255

    --------------------------------------------------------------
                 |             Linearized    Binomial Wald
                 |  Proportion   Std. Err.   [95% Conf. Interval]
    -------------+------------------------------------------------
    p45          |
              sí |    ,0994565    ,0023199     ,0949052    ,1040077
              no |    ,9005435    ,0023199     ,8959923    ,9050948
    --------------------------------------------------------------

    . estat effects

    ----------------------------------------------------------
                 |             Linearized
                 |  Proportion   Std. Err.       Deff      Deft
    -------------+--------------------------------------------
    p45          |
              sí |    ,0994565    ,0023199      -5863   ,855246
              no |    ,9005435    ,0023199      -5863   ,855246
    ----------------------------------------------------------
    Note: Weights must represent population totals for deff to be
          correct when using an FPC; however, deft is invariant to
          the scale of weights.

    end of do-file

So the standard error corresponds to an effective sample size of about 16,642 (p(1-p)/se², with p = 0.0994565 and se = 0.0023199). Corrected by deft² (16,642 × 0.855246² ≈ 12,173), this is much closer to the number of observations (12,174) than to the number of clusters (1,266). That's why I said that the sample size is a very important determinant of precision. In fact, there is no disagreement between the two points of view, because the total sample size is determined by the number of clusters and the number of observations per cluster. What surprises me is that, as Austin said, in regression in this context only the number of clusters counts, and not the number of individuals per cluster (or the total number of individuals).
That amounts to saying that having 1,000 observations per cluster would yield the same precision as having 1.

Cheers,
Ángel

2008/7/8, Steven Samuels <sjhsamuels@earthlink.net>:
> Angel, the primary determinant of precision is the number of clusters, and
> degrees of freedom are based on these.
>
> To compute the sample size needed in a cluster sample, you need to estimate
> the number of clusters needed *and* the number of observations per cluster.
> Consider an extreme case: everybody in a cluster has the same value of an
> outcome "Y", but the means differ between clusters. Here one observation
> will completely represent the cluster and only the number of clusters
> matters. At the other extreme, if each cluster is a miniature of the
> original population and the clusters are very similar, then relatively few
> clusters are needed and more observations can be taken per cluster.
>
> In practice, the actual choice of clusters/observations per cluster is made
> on the basis of the budget, the relative costs of adding a cluster and of
> adding an additional observation within a cluster, and the ratio of the SDs
> for the main outcomes between and within clusters. As there are usually
> several outcomes, a compromise sample size is chosen. See: Sharon Lohr,
> Sampling: Design and Analysis, Duxbury, 1999, Chapter 5; W. G. Cochran,
> Sampling Techniques, Wiley, 1977; L. Kish, Survey Sampling, Wiley, 1965.
> There are many internet references.
>
> Key concepts: the intra-class correlation, which measures how similar
> observations in the same cluster are compared to observations in different
> clusters; and the "design effect", which shows how much the standard error
> of a complex cluster sample is inflated compared to a simple random sample
> of the same number of observations. Joanne Garrett's program -sampclus-
> (findit sampclus) requires the investigator to input the correlation, which
> is most easily calculated by a variance components analysis of similar data.
>
> A *theoretical* nested model can make some concepts clearer (Lohr). Suppose
> there are observations Y_ij = c + a_i + e_ij. There are m random effects a_i
> from a distribution with between-cluster SD s_b and, for each a_i, there are
> n e_ij's drawn from a distribution with "within-cluster" SD s_w. The a's and
> e's are independent. The total sample size is nm, and the variance of the
> sample mean is:
>
> var = (s_b)^2/m + (s_w)^2/(nm)
>
> You can see that, holding m fixed, increasing the number of observations
> per cluster decreases only the second term.
>
> The actual formulas for sampling from finite populations are more
> complicated, but the same principles apply.
>
> -Steve
>
> On Jul 8, 2008, at 5:07 AM, Ángel Rodríguez Laso wrote:
>
> > Following the discussion, I don't understand very well how degrees of
> > freedom (number of clusters minus number of strata) and the actual number
> > of observations are used in svy commands (which are related to cluster
> > regression). I say so because when I calculate the sample size needed
> > in a survey to get a proportion with a determined confidence level,
> > the number I get is the number of observations and not the number of
> > degrees of freedom. So I assume that the number of observations is
> > what determines the standard error, and then I don't know what degrees
> > of freedom are used for.
> >
> > Cheers,
> >
> > Ángel Rodríguez
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
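A minimal Python sketch, not part of the original thread, that reproduces the arithmetic discussed above. Part (a) plugs in p, se and deft from Ángel's `svy:prop` / `estat effects` output: n_eff = p(1-p)/se² is the simple-random-sample size that would give the same standard error, and since deft² is the design variance divided by an SRS variance of roughly p(1-p)/n, multiplying n_eff by deft² largely undoes that step and lands near 12,172.5 (the estimated population size, i.e. the sum of the weights). Part (b) evaluates Steve's theoretical nested model with hypothetical values s_b = 1, s_w = 3 and m = 100 clusters. The helper names `effective_n`, `nested_var_mean` and `design_effect` are illustrative only, and the identity deff = 1 + (n-1)ρ, with ρ = s_b²/(s_b² + s_w²), is assumed to hold for this infinite-population model only, not for Stata's finite-population formulas.

```python
# Sketch only: reproduces the arithmetic discussed in the thread.
# s_b, s_w and m in part (b) are hypothetical illustration values, not survey data.


def effective_n(p: float, se: float) -> float:
    """SRS sample size that would give standard error `se` for proportion `p`:
    n_eff = p(1 - p) / se^2."""
    return p * (1.0 - p) / se ** 2


def nested_var_mean(s_b: float, s_w: float, m: int, n: int) -> float:
    """Variance of the overall mean under Y_ij = c + a_i + e_ij,
    with m clusters and n observations per cluster (infinite population)."""
    return s_b ** 2 / m + s_w ** 2 / (n * m)


def design_effect(s_b: float, s_w: float, n: int) -> float:
    """Nested-model variance divided by the SRS variance (s_b^2 + s_w^2)/(nm)
    for the same total sample size; this simplifies to 1 + (n - 1) * rho."""
    rho = s_b ** 2 / (s_b ** 2 + s_w ** 2)  # intra-class correlation
    return 1.0 + (n - 1) * rho


# (a) Figures taken from the svy:prop / estat effects output above.
p, se, deft = 0.0994565, 0.0023199, 0.855246
n_eff = effective_n(p, se)
adjusted = n_eff * deft ** 2
print(f"n_eff = {n_eff:.1f}; n_eff * deft^2 = {adjusted:.1f}")

# (b) Fixed m = 100 clusters, growing cluster size n (hypothetical values).
s_b, s_w, m = 1.0, 3.0, 100
for n in (1, 10, 100, 1000):
    v = nested_var_mean(s_b, s_w, m, n)
    print(f"n = {n:4d}: var = {v:.5f}, deff = {design_effect(s_b, s_w, n):.1f}")
print(f"lower bound s_b^2/m = {s_b ** 2 / m:.5f}")
```

With these made-up values the variance of the mean falls from 0.1 (n = 1) to about 0.0101 (n = 1000), so in this model 1,000 observations per cluster are not equivalent to 1; but it can never drop below s_b²/m = 0.01, because extra observations within a cluster shrink only the within-cluster term. That floor is the sense in which the number of clusters drives precision.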

© Copyright 1996–2015 StataCorp LP