# Re: st: Negative deff values in survey analysis

 From [email protected] (Jeff Pitblado, StataCorp LP) To [email protected] Subject Re: st: Negative deff values in survey analysis Date Wed, 03 Jun 2009 23:15:59 -0500

```
Angel Rodriguez-Laso <[email protected]>

> I'm confused with the following results:
>
>
>
> . svyset psu [pweight=weight2007], strata(healtharea) fpc(psusperhealtharea)
>
>       pweight: weight2007
>           VCE: linearized
>      Strata 1: healtharea
>          SU 1: psu
>         FPC 1: psusperhealtharea
>
> .
> end of do-file
>
> . svy: tab p29, deff deft
> (running tabulate on estimation sample)
>
> Number of strata   =        11                  Number of obs      =     12140
> Number of PSUs     =      1266                  Population size    = 12134,139
>                                                 Design df          =      1255
>
> -------------------------------------------------
> Any permanent
> disability | proportions         deff         deft
> ----------+--------------------------------------
>     0, no |       ,8887        -1981        ,9783
>    1, yes |       ,1113        -1981        ,9783
>           |
>     Total |           1
> -------------------------------------------------
>   Key:  proportions  =  cell proportions
>         deff         =  deff for variances of cell proportions
>         deft         =  deft for variances of cell proportions
>
>
>
>
> Why do I get large negative deff values? Deft resembles more what I
> was expecting, but it should be the square root of deff and obviously
> this is not the case. Do you have any explanation for these results?

Stas Kolenikov <[email protected]> already pointed out that the sampling
weights appear to be normalized by the sample size.  In fact, the sum of the
weights is less than the sample size.  When the first stage is sampled without
replacement (i.e. the 'fpc()' in the above -svyset-), the 'deff' calculation
is

deff = V_db / ( (1 - n/W) * V_srswr )

where 'V_db' is the design-based variance estimate, 'V_srswr' is the simple
random sample with replacement variance estimate, 'n' is the sample size, and
'W' is an estimate of the population size.  Here 'W' is the sum of the
sampling weights.  Since Angel's sampling weights are normalized, they cannot
be used to estimate the population size, thus the above 'deff' calculation is
not valid: with W = 12134.139 < n = 12140, the factor (1-n/W) is a small
negative number, which produces the large negative 'deff'.  Without knowing
the population size, we can't compute a valid 'deff' statistic.

On the other hand, the 'deft' calculation is

deft = sqrt( V_db / V_srswr )

which does not need an estimate of the population size, and thus will always
produce a valid value.

We will look into changing -svy: tabulate- and -estat effects- to report
missing values for 'deff' in the case where the 'W' calculation is less than
or equal to 'n'.

--Jeff
[email protected]
```
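
As a rough check of the explanation above, the reported numbers can be plugged into the two formulas. This is only a sketch in Python, not Stata's own code: the function names are made up for illustration, `V_db / V_srswr` is recovered from the rounded `deft` shown in the output (so the result is approximate), and the `W <= n` guard mirrors the change Jeff says StataCorp will look into, not any documented behavior.

```python
import math

def deft_from_variances(v_db, v_srswr):
    """deft = sqrt(V_db / V_srswr); needs no population-size estimate."""
    return math.sqrt(v_db / v_srswr)

def deff_unguarded(variance_ratio, n, w):
    """deff = V_db / ((1 - n/W) * V_srswr), with variance_ratio = V_db / V_srswr."""
    return variance_ratio / (1.0 - n / w)

def deff_guarded(variance_ratio, n, w):
    """Hypothetical guard: report missing (NaN) when W <= n."""
    if w <= n:
        return math.nan
    return deff_unguarded(variance_ratio, n, w)

# Values read from Angel's output (decimal commas converted to points).
n = 12140            # Number of obs
w = 12134.139        # "Population size", i.e. the sum of the normalized weights
reported_deft = 0.9783

# Recover V_db / V_srswr from the rounded deft.
variance_ratio = reported_deft ** 2

fpc_factor = 1.0 - n / w                    # slightly below zero because W < n
raw = deff_unguarded(variance_ratio, n, w)  # large and negative
safe = deff_guarded(variance_ratio, n, w)   # NaN under the hypothetical guard

print("1 - n/W        =", fpc_factor)
print("unguarded deff =", raw)
print("guarded deff   =", safe)
```

The unguarded value comes out close to the -1981 in the table, which is consistent with the normalized weights being the cause: dividing a ratio near 1 by a factor that is barely below zero yields a large negative number.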