Statalist



Re: st: RE: Survey design degrees of freedom help


From   sjsamuels@gmail.com
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Survey design degrees of freedom help
Date   Thu, 3 Sep 2009 21:05:53 -0500

--

For the model F test in -svy: logit- and -svy: reg-, Stata 10 computes
the denominator degrees of freedom as:   (design degrees of freedom) -
(numerator degrees of freedom) +1.   In other words, the sum of the
numerator and denominator degrees of freedom = design d.f. + 1.  I too
am away from the office, so I don't know the reference for this
computation.
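
As a quick arithmetic check of the rule above, here is a minimal Python sketch. It is not Stata code: the function name `svy_f_denominator_df` and the example numbers (17 design d.f., 5 tested coefficients) are assumptions chosen only for illustration. The guard against testing more terms than design d.f. follows the remark in the quoted message below that Stata refuses such tests.

```python
def svy_f_denominator_df(design_df: int, numerator_df: int) -> int:
    """Denominator d.f. for the model F test, using the rule stated above:
    (design d.f.) - (numerator d.f.) + 1."""
    if numerator_df < 1:
        raise ValueError("numerator d.f. must be at least 1")
    if numerator_df > design_df:
        # More tested terms than design d.f.: no valid test under this rule.
        raise ValueError("numerator d.f. exceeds design d.f.")
    return design_df - numerator_df + 1


# Hypothetical example: 17 design d.f., 5 coefficients tested jointly.
design_df, k = 17, 5
denom = svy_f_denominator_df(design_df, k)
print(f"F({k}, {denom})")  # F(5, 13)

# The two d.f. always sum to design d.f. + 1.
assert k + denom == design_df + 1
```

Each additional coefficient in the joint test costs one denominator degree of freedom, which is why a small number of PSUs limits how many terms can be tested at once.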

-Steve

On Thu, Sep 3, 2009 at 2:47 PM, Stas Kolenikov<skolenik@gmail.com> wrote:
> Jennifer,
>
> first of all, you don't need to subtract the constant term.
> Stratification, in a sense, implies estimation of the fixed effect of
> a stratum (although there's way more going on).
>
> You are right in thinking about degrees of freedom as independent pieces
> of information provided by each cluster/village. In the extreme case
> when you sample the complete cluster, you only have one number out of
> it (in terms of contributions to the variability of your estimates),
> no matter how many units you have in the cluster. In less extreme
> cases with large sample sizes within cluster, you have that number
> (the cluster mean, say) plus some relatively small amount of variation
> around it, so you still have 1 d.f. contributed by the cluster (or
> 1+epsilon, if you like; although nobody really knows what this epsilon
> might be). If you think about your sample (x1, ..., xn) as a vector in
> n-dimensional space, the standard i.i.d. theory assumes that each
> component of the vector can vary on its own, thus producing n degrees
> of freedom for the sample, and n-1 degrees of freedom for variance
> estimation (minus the overall mean). However, in the complex survey
> sampling case, components corresponding to the same cluster move
> together, at least to some extent, so your effective dimension is
> much lower than n, and in the aforementioned extreme case it is
> #PSUs - #strata.
>
> The issue of degrees of freedom has been discussed by Korn & Graubard,
> although I am not sure whether it was their book
> (http://www.citeulike.org/user/ctacmo/article/553280) or a paper
> (http://www.citeulike.org/user/ctacmo/article/933864). If you are
> really short on degrees of freedom, you can cheat and go to the next
> level, and use SSUs instead of PSUs as the baseline for degrees of
> freedom (so d.f. = #SSUs - #strata). That's what you've done, too,
> with your 90 SSUs and 86 "cheated" d.f.s. Korn & Graubard outline some other
> approaches, but that's probably the one easiest to understand. Still I
> would frown upon that, and if I were to referee a paper that does
> this, I would have the authors write a half-page explanation of what
> they are doing, and recognize that this is basically a wrong thing to
> do.
>
> Now, where would those degrees of freedom matter in estimation
> procedures? First, that's the number of terms added up to form the
> covariance matrix, so the rank of that matrix is bounded by d.f.s. You
> might still be able to run a regression with more terms, but Stata
> will refuse to conduct tests with more than d.f. terms. That is the
> main concern you are voicing. Second, the d.f.s also serve as the
> degrees of freedom of the Student distribution used for testing.
> Nobody has ever justified the use of the Student distribution in this
> context (in the end, it is a
> model-based derivation assuming normality, whereas the survey
> inference is supposed to be fully non-parametric without any
> distributional assumptions), but it seems to be working better as an
> approximation to the realistic distributions.
>
> Amazingly (and embarrassingly), I cannot produce any references off
> the top of my head that would deliver a clear explanation of those
> degrees of freedom (I am not in my office now, where all the books
> are). I hope
> Korn & Graubard would give some references when they discuss the
> issue...
>
> I've seen things going either way with those degrees of freedom in my
> analytical work and simulations. Sometimes, when your cluster effects
> are not terribly strong, you are OK with #SSU-#strata (and if
> #PSU-#strata is over a hundred, who cares, anyway). Other times, I've
> seen the effective degrees of freedom around 5 or 10 when the nominal
> degrees of freedom (#PSU-#strata) were close to a hundred -- I had some
> problematic strata with extreme skewness and kurtosis, so whatever I
> happened to sample there was driving the remainder of the sample.
>
> On Thu, Sep 3, 2009 at 2:20 PM, Jennifer Schmitt<jorg0206@umn.edu> wrote:
>> Thank you for your thoughts.  The 20 villages are the only independent
>> pieces of information, the rest are related.  Is that the reason?  It just
>> seems so restrictive.  I do have multi-stage sampling and my understanding
>> of STATA is that it uses an "ultimate" cluster method, so unless my fpc are
>> defined (which I don't define because they are all close to one), then STATA
>> doesn't care about subsequent clusters because STATA incorporates all later
>> stages of clustering in the main cluster.  Therefore there is no change in
>> my df.  I have gone ahead and when necessary (because I need more df) I have
>> defined my PSU as subvillage and get 90 (#subvillages) - 3 (#strata) - 1 (for
>> the constant) = 86 df, but then I am ignoring the correlation of
>> subvillages within a village.  I feel confident that I really only have 16
>> df, it is just convincing others who do not know STATA or survey statistics
>> that I have set up the statistical restrictions correctly and given the low
>> df I have yet to convince others.  I've told them that the villages are the
>> only independent units, but that just does not seem sufficient.  Any more
>> thoughts from you or others are greatly appreciated, but regardless, thanks for
>> your thoughts thus far.
>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
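
To make the counting in the quoted discussion concrete, here is a rough Python sketch of design d.f. = #clusters - #strata, using the numbers from the thread (20 villages, 90 subvillages, 3 strata). The function name `design_df` is hypothetical and the code only reproduces the arithmetic; it does not call Stata.

```python
def design_df(n_clusters: int, n_strata: int) -> int:
    """Nominal design d.f.: number of sampled clusters minus number of strata."""
    if n_clusters <= n_strata:
        raise ValueError("need more clusters than strata")
    return n_clusters - n_strata


villages, subvillages, strata = 20, 90, 3

# Villages as PSUs: the d.f. that respect the actual design.
print(design_df(villages, strata))         # 17

# Subvillages (SSUs) treated as if they were PSUs: the "cheated" d.f.
print(design_df(subvillages, strata))      # 87

# Subtracting one more for the constant gives the 16 and 86 quoted in the
# thread; the reply above notes that this extra subtraction is not needed.
print(design_df(villages, strata) - 1)     # 16
print(design_df(subvillages, strata) - 1)  # 86
```

The gap between 17 and 87 is the price of ignoring the correlation of subvillages within a village.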



-- 
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774
