
From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Negative deff values in survey analysis
Date: Wed, 03 Jun 2009 23:15:59 -0500

Angel Rodriguez-Laso <angelrlaso@gmail.com> wrote:

> I'm confused with the following results:
>
> ```
> . svyset psu [pweight=weight2007], strata(healtharea)fpc(psusperhealtharea)
>
>       pweight: weight2007
>           VCE: linearized
>      Strata 1: healtharea
>          SU 1: psu
>         FPC 1: psusperhealtharea
>
> .
> end of do-file
>
> . svy: tab p29, deff deft
> (running tabulate on estimation sample)
>
> Number of strata   =        11              Number of obs      =     12140
> Number of PSUs     =      1266              Population size    = 12134,139
>                                             Design df          =      1255
>
> -------------------------------------------------
> Any permanent |
> disability    |  proportions      deff      deft
> --------------+----------------------------------
>         0, no |        ,8887     -1981     ,9783
>        1, yes |        ,1113     -1981     ,9783
>               |
>         Total |            1
> -------------------------------------------------
>   Key:  proportions  =  cell proportions
>         deff         =  deff for variances of cell proportions
>         deft         =  deft for variances of cell proportions
> ```
>
> Why do I get large negative deff values? Deft resembles more what I
> was expecting, but it should be the square root of deff and obviously
> this is not the case. Do you have any explanation for these results?

Stas Kolenikov <skolenik@gmail.com> has already pointed out that the sampling weights appear to be normalized by the sample size. In fact, the sum of the weights (the "Population size" of 12134,139 in the output, printed with a decimal comma) is less than the sample size of 12140.

When the first stage is sampled without replacement (i.e., the 'fpc()' option in the above -svyset-), the 'deff' calculation is

    deff = V_db / [(1 - n/W) V_srswr]

where 'V_db' is the design-based variance estimate, 'V_srswr' is the simple random sample with replacement variance estimate, 'n' is the sample size, and 'W' is an estimate of the population size. Here 'W' is the sum of the sampling weights. Since Angel's sampling weights are normalized, they cannot be used to estimate the population size, so the above 'deff' calculation is not valid. In particular, when W is slightly smaller than n, the factor (1 - n/W) is negative and close to zero, which produces the large negative 'deff' values. Without knowing the population size, we can't compute a valid 'deff' statistic.
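Since deft is defined as sqrt(V_db / V_srswr), the 'deff' formula can be rewritten as deff = deft^2 / (1 - n/W), which lets us check the explanation against the numbers in Angel's output without the underlying variances. The Python sketch below is only an illustration, not Stata's implementation: the helper names `deff_from_deft` and `guarded_deff` are hypothetical, the inputs are read off the printed output (decimal commas converted to points, deft rounded to four decimals), and the "return None when W <= n" guard is an assumed way of reporting 'deff' as missing.

```python
from typing import Optional


def deff_from_deft(deft: float, n: float, w: float) -> float:
    """Hypothetical helper: deff = V_db / [(1 - n/W) V_srswr] = deft**2 / (1 - n/W).

    Uses deft**2 = V_db / V_srswr, so the variances themselves are not needed.
    """
    return deft ** 2 / (1.0 - n / w)


def guarded_deff(deft: float, n: float, w: float) -> Optional[float]:
    """Hypothetical guard: treat deff as missing (None) when W <= n,
    because the correction factor (1 - n/W) is then zero or negative."""
    if w <= n:
        return None
    return deff_from_deft(deft, n, w)


# Figures read off the -svy: tab- output (decimal commas converted to points).
n = 12140        # Number of obs
w = 12134.139    # Population size, i.e. the sum of the normalized weights
deft = 0.9783    # reported deft, rounded to 4 decimals

fpc_factor = 1.0 - n / w
print(f"1 - n/W = {fpc_factor:.6f}")                 # negative, because W < n
print(f"deff    = {deff_from_deft(deft, n, w):.0f}")  # -1981, as in the output
print(f"guarded = {guarded_deff(deft, n, w)}")        # None, i.e. missing

# With un-normalized weights summing to a hypothetical population of 1,000,000,
# the correction factor is positive and deff stays close to deft**2.
print(f"deff (W = 1,000,000) = {guarded_deff(deft, n, 1_000_000):.4f}")
```

Because deft is printed with only four decimals, the recomputed value matches the reported -1981 only up to that rounding; the point is that a tiny negative (1 - n/W) turns a deft near 1 into a huge negative deff.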
On the other hand, the 'deft' calculation is

    deft = sqrt(V_db / V_srswr)

which does not need an estimate of the population size and thus always produces a valid value. This is also why 'deft' is not the square root of 'deff' here: 'deft' does not include the (1 - n/W) factor.

We will look into changing -svy: tabulate- and -estat effects- to report missing values for 'deff' when 'W' is less than or equal to 'n'.

--Jeff
jpitblado@stata.com
