I'm using Stata 9.2. Many thanks for your time and interest.

Angel Rodriguez-Laso

2009/6/4 Jeff Pitblado, StataCorp LP <jpitblado@stata.com>:
> Angel Rodriguez-Laso <angelrlaso@gmail.com>
>
>> I'm confused by the following results:
>>
>> . svyset psu [pweight=weight2007], strata(healtharea) fpc(psusperhealtharea)
>>
>>       pweight: weight2007
>>           VCE: linearized
>>      Strata 1: healtharea
>>          SU 1: psu
>>         FPC 1: psusperhealtharea
>>
>> .
>> end of do-file
>>
>> . svy: tab p29, deff deft
>> (running tabulate on estimation sample)
>>
>> Number of strata   =        11          Number of obs      =     12140
>> Number of PSUs     =      1266          Population size    = 12134,139
>>                                         Design df          =      1255
>>
>> -------------------------------------------------
>> Any permanent
>> disability |  proportions        deff        deft
>> -----------+-------------------------------------
>>     0, no  |        ,8887       -1981       ,9783
>>     1, yes |        ,1113       -1981       ,9783
>>            |
>>     Total  |            1
>> -------------------------------------------------
>> Key: proportions = cell proportions
>>      deff        = deff for variances of cell proportions
>>      deft        = deft for variances of cell proportions
>>
>> Why do I get large negative deff values? Deft is closer to what I was
>> expecting, but it should be the square root of deff, and obviously this
>> is not the case. Do you have any explanation for these results?
>
> Stas Kolenikov <skolenik@gmail.com> already pointed out that the sampling
> weights appear to be normalized by the sample size. In fact, the sum of the
> weights is less than the sample size. When the first stage is sampled without
> replacement (i.e. the 'fpc()' in the above -svyset-), the 'deff' calculation
> is
>
>     deff = V_db / ( (1 - n/W) * V_srswr )
>
> where 'V_db' is the design-based variance estimate, 'V_srswr' is the simple
> random sample with replacement variance estimate, 'n' is the sample size, and
> 'W' is an estimate for the population size. Here 'W' is the sum of the
> sampling weights.
> Since Angel's sampling weights are normalized, they cannot
> be used to estimate the population size, thus the above 'deff' calculation is
> not valid. Without knowing the population size, we can't compute a valid
> 'deff' statistic.
>
> On the other hand, the 'deft' calculation is
>
>     deft = sqrt( V_db / V_srswr )
>
> which does not need an estimate of the population size, and thus will always
> produce a valid value.
>
> We will look into changing -svy: tabulate- and -estat effects- to report
> missing values for 'deff' in the case where the 'W' calculation is less than
> or equal to 'n'.
>
> --Jeff
> jpitblado@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
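As a quick arithmetic check, the numbers in the output above are consistent with Jeff's formula. This is a sketch in plain Python, not Stata's own code. It assumes the formula is read as deff = V_db / ((1 - n/W) * V_srswr), that W is the "Population size" Stata printed (12134,139, i.e. 12134.139 with a decimal comma), and it recovers V_db / V_srswr as deft squared:

```python
# Reproduce the negative deff in the output from numbers printed in the thread.
# Stata displayed decimals with commas (",9783"); they are written with points here.
n = 12140          # "Number of obs"
W = 12134.139      # "Population size" = sum of the sampling weights
deft = 0.9783      # reported deft (rounded to 4 decimals)

# deft = sqrt(V_db / V_srswr), so the variance ratio V_db / V_srswr is deft^2
variance_ratio = deft ** 2

# deff = V_db / ((1 - n/W) * V_srswr): the variance ratio divided by the fpc factor
fpc_factor = 1 - n / W          # slightly negative because W < n
deff = variance_ratio / fpc_factor

print(f"1 - n/W = {fpc_factor:.6f}")
print(f"deff    = {deff:.0f}")
```

Because W is a few units smaller than n, the factor 1 - n/W is a tiny negative number (about -0.000483), and dividing a variance ratio near 0.96 by it gives a value of about -1981, which matches the table.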

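The missing-value rule Jeff proposes can be sketched as below. `design_effects` is a hypothetical helper written for illustration; it is not part of Stata, and StataCorp's actual change may be implemented differently. It returns `None` (standing in for Stata's missing value) for deff whenever W <= n, while deft is always computed because it does not use W:

```python
import math
from typing import Optional, Tuple


def design_effects(v_db: float, v_srswr: float, n: int, W: float) -> Tuple[Optional[float], float]:
    """Return (deff, deft) for a design whose first stage has an FPC.

    Hypothetical helper: deff is None (missing) when W <= n, because a
    sum of weights that does not exceed the sample size cannot serve as
    a population-size estimate. deft never depends on W.
    """
    deft = math.sqrt(v_db / v_srswr)
    if W <= n:
        return None, deft
    deff = v_db / ((1 - n / W) * v_srswr)
    return deff, deft


# Normalized weights (W < n): deff is missing, deft is still reported.
print(design_effects(0.957, 1.0, 12140, 12134.139))

# Hypothetical weights summing to a population of 1,000,000: deff is valid.
print(design_effects(0.957, 1.0, 12140, 1_000_000))
```

With properly scaled weights, 1 - n/W lies between 0 and 1, so deff is slightly larger than deft squared rather than negative.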
