
From: sjsamuels@gmail.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Survey design degrees of freedom help
Date: Thu, 3 Sep 2009 21:05:53 -0500

For the model F test in -svy: logit- and -svy: reg-, Stata 10 computes
the denominator degrees of freedom as:

(design degrees of freedom) - (numerator degrees of freedom) + 1.

In other words, the sum of the numerator and denominator degrees of
freedom = design d.f. + 1. I too am away from the office, so I don't
know the reference for this computation.

-Steve

On Thu, Sep 3, 2009 at 2:47 PM, Stas Kolenikov<skolenik@gmail.com> wrote:
> Jennifer,
>
> first of all, you don't need to subtract the constant term.
> Stratification, in a sense, implies estimation of the fixed effect of
> a stratum (although there's way more going on).
>
> You are right in thinking about degrees of freedom as independent
> pieces of information provided by each cluster/village. In the extreme
> case when you sample the complete cluster, you only have one number out
> of it (in terms of contributions to the variability of your estimates),
> no matter how many units you have in the cluster. In less extreme
> cases with large sample sizes within cluster, you have that number
> (the cluster mean, say) plus some relatively small amount of variation
> around it, so you still have 1 d.f. contributed by the cluster (or
> 1+epsilon, if you like; although nobody really knows what this epsilon
> might be). If you think about your sample (x1, ..., xn) as a vector in
> n-dimensional space, the standard i.i.d. theory assumes that each
> component of the vector can vary on its own, thus producing n degrees
> of freedom for the sample, and n-1 degrees of freedom for variance
> estimation (minus the overall mean). However, in the complex survey
> sampling case, the components corresponding to the same cluster
> go together, at least to some extent, so your effective dimension is
> much lower than n, and in the aforementioned extreme cases it is
> #PSUs - #strata.
>
> The issue of degrees of freedom has been discussed by Korn & Graubard,
> although I am not sure whether it was their book
> (http://www.citeulike.org/user/ctacmo/article/553280) or a paper
> (http://www.citeulike.org/user/ctacmo/article/933864). If you are
> really short on degrees of freedom, you can cheat and go to the next
> level, and use SSUs instead of PSUs as the baseline for degrees of
> freedom (so d.f. = #SSUs - #strata). That's what you've done, too,
> with your 90 SSUs and 86 "cheated" d.f.s. They've outlined some other
> approaches, but that's probably the one easiest to understand. Still I
> would frown upon that, and if I were to referee a paper that does
> this, I would have the authors write a half-page explanation of what
> they are doing, and recognize that this is basically a wrong thing to
> do.
>
> Now, where would those degrees of freedom matter in estimation
> procedures? First, that's the number of terms added up to form the
> covariance matrix, so the rank of that matrix is bounded by d.f.s. You
> might still be able to run a regression with more terms, but Stata
> will refuse to conduct tests with more than d.f. terms. That is the
> main concern you are voicing. Second, the d.f.s are also used in the
> Student distribution for testing purposes. Nobody has ever justified
> the use of the Student distribution in this context (in the end, it is
> a model-based derivation assuming normality, whereas the survey
> inference is supposed to be fully non-parametric without any
> distributional assumptions), but it seems to be working better as an
> approximation to the realistic distributions.
>
> Amazingly (and ashamingly), I cannot produce any references off the
> top of my head that would deliver a clear explanation of those degrees
> of freedom (I am not in my office where all the books are now). I hope
> Korn & Graubard would give some references when they discuss the
> issue...
>
> I've seen things going either way with those degrees of freedom in my
> analytical work and simulations. Sometimes, when your cluster effects
> are not terribly strong, you are OK with #SSU-#strata (and if
> #PSU-#strata is over a hundred, who cares, anyway). Other times, I've
> seen the effective degrees of freedom around 5 or 10 when the nominal
> degrees of freedom (#PSU-#strata) was close to a hundred -- I had some
> problematic strata with extreme skewness and kurtosis, so whatever I
> happened to sample there was driving the remainder of the sample.
>
> On Thu, Sep 3, 2009 at 2:20 PM, Jennifer Schmitt<jorg0206@umn.edu> wrote:
>> Thank you for your thoughts. The 20 villages are the only independent
>> pieces of information, the rest are related. Is that the reason? It just
>> seems so restrictive. I do have multi-stage sampling and my understanding
>> of Stata is that it uses an "ultimate" cluster method, so unless my fpc are
>> defined (which I don't define because they are all close to one), then Stata
>> doesn't care about subsequent clusters because Stata incorporates all later
>> stages of clustering in the main cluster. Therefore there is no change in
>> my df. I have gone ahead and when necessary (because I need more df) I have
>> defined my PSU as subvillage and get 90 (#subvillages) - 3 (#strata) - 1 (for
>> the constant) = 86 df, but then I am ignoring the correlation of
>> subvillages within a village. I feel confident that I really only have 16
>> df, it is just convincing others who do not know Stata or survey statistics
>> that I have set up the statistical restrictions correctly and given the low
>> df I have yet to convince others. I've told them that the villages are the
>> only independent units, but that just does not seem sufficient. Any more
>> thoughts by you or others are greatly appreciated, but regardless thanks for
>> your thoughts thus far.
>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774
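
The degrees-of-freedom arithmetic discussed in this thread can be sketched as follows. This is only an illustration in Python (the thread itself concerns Stata), not Stata's implementation. It assumes design d.f. = #clusters - #strata, as Stas describes, and the denominator rule Steve reports for Stata 10. The function names and the 5-predictor model are hypothetical; the 20 villages, 90 subvillages and 3 strata are the figures from Jennifer's message.

```python
def design_df(n_clusters: int, n_strata: int) -> int:
    """Design d.f. as described in the thread: #clusters - #strata.

    No extra 1 is subtracted for the constant term.
    """
    if n_clusters <= n_strata:
        raise ValueError("need more sampled clusters than strata")
    return n_clusters - n_strata


def model_f_denominator_df(design: int, numerator: int) -> int:
    """Denominator d.f. of the model F test under the rule quoted in the
    thread: (design d.f.) - (numerator d.f.) + 1.
    """
    if numerator < 1:
        raise ValueError("numerator d.f. must be at least 1")
    denominator = design - numerator + 1
    if denominator < 1:
        raise ValueError("more terms tested than design d.f. allow")
    return denominator


# Figures from the thread: 20 villages (PSUs), 90 subvillages (SSUs), 3 strata.
village_df = design_df(20, 3)      # 17
subvillage_df = design_df(90, 3)   # 87, the "cheated" SSU-based count

# Hypothetical model with 5 predictors, i.e. numerator d.f. = 5.
denominator_df = model_f_denominator_df(village_df, 5)  # 17 - 5 + 1 = 13

print(village_df, subvillage_df, denominator_df)
```

The 17 and 87 here differ from the 16 and 86 quoted in the thread only because those figures also subtract 1 for the constant, which Stas says is not needed. Under the quoted rule, a model F test on the village-level design can have at most 17 numerator d.f. before the denominator d.f. drops below 1.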

