
From: Jennifer Schmitt <jorg0206@umn.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Survey design degrees of freedom help
Date: Fri, 04 Sep 2009 08:09:37 -0500

Stas,

Cheers,
Jennifer

Stas Kolenikov wrote:

Jennifer, first of all, you don't need to subtract the constant term. Stratification, in a sense, implies estimation of the fixed effect of a stratum (although there's way more going on).

You are right in thinking about degrees of freedom as independent pieces of information provided by each cluster/village. In the extreme case when you sample the complete cluster, you only have one number out of it (in terms of contributions to the variability of your estimates), no matter how many units you have in the cluster. In less extreme cases with large sample sizes within cluster, you have that number (the cluster mean, say) plus some relatively small amount of variation around it, so you still have 1 d.f. contributed by the cluster (or 1+epsilon, if you like; although nobody really knows what this epsilon might be).

If you think about your sample (x1, ..., xn) as a vector in n-dimensional space, the standard i.i.d. theory assumes that each component of the vector can vary on its own, thus producing n degrees of freedom for the sample, and n-1 degrees of freedom for variance estimation (minus the overall mean). However, in the complex survey sampling case, the components corresponding to the same cluster go together, at least to some extent, so your effective dimension is much lower than n, and in the aforementioned extreme cases it is #PSUs - #strata.

The issue of degrees of freedom has been discussed by Korn & Graubard, although I am not sure whether it was their book (http://www.citeulike.org/user/ctacmo/article/553280) or a paper (http://www.citeulike.org/user/ctacmo/article/933864).

If you are really short on degrees of freedom, you can cheat and go to the next level, and use SSUs instead of PSUs as the baseline for degrees of freedom (so d.f. = #SSUs - #strata). That's what you've done, too, with your 90 SSUs and 86 "cheated" d.f.s. Korn & Graubard have outlined some other approaches, but that's probably the one easiest to understand.
Still, I would frown upon that, and if I were to referee a paper that does this, I would have the authors write a half-page explanation of what they are doing, and recognize that this is basically a wrong thing to do.

Now, where would those degrees of freedom matter in estimation procedures? First, that's the number of terms added up to form the covariance matrix, so the rank of that matrix is bounded by the d.f.s. You might still be able to run a regression with more terms, but Stata will refuse to conduct tests with more than d.f. terms. That is the main concern you are voicing. Second, the d.f.s are also used in the Student distribution for testing purposes. Nobody has ever justified the use of the Student distribution in this context (in the end, it is a model-based derivation assuming normality, whereas survey inference is supposed to be fully non-parametric, without any distributional assumptions), but it seems to work better as an approximation to the realistic distributions.

Amazingly (and embarrassingly), I cannot produce any references off the top of my head that would deliver a clear explanation of those degrees of freedom (I am not in my office, where all the books are, right now). I hope Korn & Graubard would give some references when they discuss the issue...

I've seen things going either way with those degrees of freedom in my analytical work and simulations. Sometimes, when your cluster effects are not terribly strong, you are OK with #SSU-#strata (and if #PSU-#strata is over a hundred, who cares, anyway). Other times, I've seen the effective degrees of freedom around 5 or 10 when the nominal degrees of freedom (#PSU-#strata) were close to a hundred. I had some problematic strata with extreme skewness and kurtosis, so whatever I happened to sample there was driving the remainder of the sample.

On Thu, Sep 3, 2009 at 2:20 PM, Jennifer Schmitt <jorg0206@umn.edu> wrote:

Thank you for your thoughts.
The 20 villages are the only independent pieces of information; the rest are related. Is that the reason? It just seems so restrictive. I do have multi-stage sampling, and my understanding of Stata is that it uses an "ultimate" cluster method, so unless my fpc are defined (which I don't define because they are all close to one), Stata doesn't care about subsequent clusters because it incorporates all later stages of clustering in the main cluster. Therefore there is no change in my df.

I have gone ahead and, when necessary (because I need more df), defined my PSU as subvillage, which gives 90 (#subvillages) - 3 (#strata) - 1 (for the constant) = 86 df, but then I am ignoring the correlation of subvillages within a village. I feel confident that I really only have 16 df; the difficulty is convincing others who do not know Stata or survey statistics that I have set up the statistical restrictions correctly, and given the low df I have yet to convince them. I've told them that the villages are the only independent units, but that just does not seem sufficient. Any more thoughts from you or others are greatly appreciated, but regardless, thanks for your thoughts thus far.
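The degrees-of-freedom arithmetic in this exchange can be sketched in a few lines of Python (not Stata). This is only an illustration: the function name `design_df` is made up here, and it assumes the rule stated in the reply, d.f. = #sampling units - #strata, with the counts of 20 villages, 90 subvillages and 3 strata taken from the message.

```python
def design_df(n_sampling_units: int, n_strata: int) -> int:
    """Design degrees of freedom: number of sampling units minus number of strata.

    Hypothetical helper for illustration; it is not a Stata function.
    """
    if n_sampling_units <= n_strata:
        raise ValueError("need more sampling units than strata")
    return n_sampling_units - n_strata


# Counts quoted in the thread.
N_STRATA = 3
N_VILLAGES = 20      # PSUs
N_SUBVILLAGES = 90   # SSUs

# Rule from the reply: no extra subtraction for the constant term.
df_villages = design_df(N_VILLAGES, N_STRATA)
df_subvillages = design_df(N_SUBVILLAGES, N_STRATA)

# Figures in the original question additionally subtract 1 for the constant.
df_villages_minus_constant = df_villages - 1
df_subvillages_minus_constant = df_subvillages - 1

print("villages as PSUs:   ", df_villages, "or", df_villages_minus_constant)
print("subvillages as PSUs:", df_subvillages, "or", df_subvillages_minus_constant)
```

Under the rule from the reply the village-level figure is 17 rather than 16, and the subvillage-level figure is 87 rather than 86; the gap between the two levels is what treating subvillages as PSUs "buys" at the cost of ignoring within-village correlation.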

--
Jennifer Schmitt
PhD Candidate - Conservation Biology Program
University of Minnesota
100 Ecology Building
1987 Upper Buford Circle
St. Paul, MN 55108
jorg0206@umn.edu

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
