Statalist



Re: st: RE: Survey design degrees of freedom help


From   Jennifer Schmitt <jorg0206@umn.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Survey design degrees of freedom help
Date   Thu, 03 Sep 2009 14:20:56 -0500

Thank you for your thoughts. The 20 villages are the only independent pieces of information; the rest are related. Is that the reason? It just seems so restrictive. I do have multi-stage sampling, and my understanding is that Stata uses an "ultimate cluster" method: unless my fpc are defined (I don't define them because they are all close to one), Stata doesn't care about subsequent clusters because it folds all later stages of clustering into the main cluster. Therefore there is no change in my df.

When necessary (because I need more df) I have gone ahead and defined my PSU as subvillage, which gives 90 (#subvillages) - 3 (#strata) - 1 (for the constant) = 86 df, but then I am ignoring the correlation of subvillages within a village. I feel confident that I really only have 16 df; the difficulty is convincing others who do not know Stata or survey statistics that I have set up the statistical restrictions correctly, and given the low df I have yet to convince them. I've told them that the villages are the only independent units, but that just does not seem sufficient. Any more thoughts from you or others would be greatly appreciated, but regardless, thanks for your thoughts thus far.
Cheers,
Jennifer
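
The two df counts quoted in the message above can be reproduced with a few lines of arithmetic. This is only a sketch of the poster's own formula, not Stata output; the function name `design_df` and the `extra` parameter are made up for illustration. Note that the usual definition of design degrees of freedom is simply the number of PSUs minus the number of strata; the further deduction of 1 "for the constant" is the poster's count, so it is kept here as an explicit, optional deduction rather than built into the formula.

```python
def design_df(n_psu: int, n_strata: int, extra: int = 0) -> int:
    """Design degrees of freedom: PSUs minus strata, minus an optional extra deduction.

    `extra` is a hypothetical parameter that mirrors the poster's
    "minus one for the constant"; leave it at 0 for the usual definition.
    """
    df = n_psu - n_strata - extra
    if df <= 0:
        raise ValueError("too few PSUs for this number of strata")
    return df


# Figures quoted in the message: 20 villages or 90 subvillages, 3 strata.
village_df = design_df(n_psu=20, n_strata=3, extra=1)
subvillage_df = design_df(n_psu=90, n_strata=3, extra=1)
print(village_df, subvillage_df)  # 16 86
```

Treating the subvillage as the PSU raises the count only because it treats subvillages in the same village as independent, which is exactly the correlation the message says it would be ignoring.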

Andrew.Clapson@statcan.gc.ca wrote:
Jennifer,

Off the top of my head, it seems to me that, if degrees of freedom can
be considered roughly as the 'independent pieces of information' in the
model, I suppose that given the random choosing of your clusters
(villages), those 20 sub-groups are your only independent pieces of
information in your sample, and anything selected out of that
(sub-villages, then households) would be related, as they are in the
same cluster or PSU.
BUT, given the further sampling of households out of 'sub-villages' -
isn't this multi-stage sampling?  Stata can handle that as well, I
believe, although I haven't had cause to use it (yet).  I am not sure
how that would affect the degrees of freedom as Stata calculates it.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jennifer
Schmitt
Sent: September 3, 2009 1:48 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Survey design degrees of freedom help


My sample size is 489, but I only have 20 PSU and 3 strata, so my design df = 20-3-1 (for the constant) = 16. I have a stratified, clustered sample, so my design df are based on my PSU and are thus really low. I just do not know why this is the case (I have accepted that it is the case), but I can't defend it without knowing why (or maybe it is wrong, but everything in the FAQ and help sections online suggests it is correct). Thanks.

Andrew.Clapson@statcan.gc.ca wrote:
Your degrees of freedom are 16??
What is your sample size?

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jennifer Schmitt
Sent: September 3, 2009 11:26 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Survey design degrees of freedom help


Hello everyone,
I need some important clarification about the design degrees of freedom for stratified clustered survey analysis. I have data that were stratified by three areas (NW, SW, and E). We randomly chose villages from these three areas, then chose 5 subvillages within each village, and within each subvillage we chose households (the unit of interest for my analysis). I am running logistic regressions with PSU = village, strata = area and probability weights. My design degrees of freedom are 16 (PSU - strata - one for the constant term). I get that. What I do not understand is WHY, and how to explain to others unfamiliar with Stata that it is correct. Any answers to this would be greatly appreciated.

The reason this is an issue is that I want to test more than 16 variables at once and obviously I can't with only 16 df. Thank you.
Jennifer
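
The limit on joint tests mentioned above can be made concrete. This is a sketch under one assumption: that the joint test is the adjusted Wald F test commonly used for survey data, whose reference distribution is F(k, d - k + 1) for k tested terms and d design degrees of freedom. The function name `adjusted_wald_df` is invented for illustration, and d = 16 is the figure quoted in the message.

```python
def adjusted_wald_df(design_df: int, n_terms: int) -> tuple:
    """Return (numerator df, denominator df) for an adjusted Wald F test.

    Assumes the F(k, d - k + 1) reference distribution, where k is the
    number of terms tested jointly and d is the design degrees of freedom.
    """
    denominator = design_df - n_terms + 1
    if n_terms < 1 or denominator < 1:
        raise ValueError(
            f"cannot jointly test {n_terms} terms with {design_df} design df"
        )
    return n_terms, denominator


d = 16  # design df quoted in the message
print(adjusted_wald_df(d, 5))   # (5, 12)
print(adjusted_wald_df(d, 16))  # (16, 1): the last feasible case
try:
    adjusted_wald_df(d, 17)
except ValueError as err:
    print(err)  # the denominator df would be 0
```

Under this assumption the denominator df shrinks by one for every term added to the test, so with 16 design df a joint test of 17 or more terms has no denominator df left.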



--
Jennifer Schmitt
PhD Candidate - Conservation Biology Program
jorg0206@umn.edu



