
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



RE: st: How to compare models after svy


From   Orian Brook <ob11@st-andrews.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: How to compare models after svy
Date   Wed, 22 May 2013 12:34:26 +0100

Thanks for your reply, Stas. Yes, I'm aware that what I usually use is a
likelihood ratio test, and I think I understand the explanation given by
Stata for why the usual logit postestimation commands don't work with
complex survey data. What I wasn't sure of was whether the F statistic could
in these circumstances be used to compare logit models, as the references I
found only referred to using it in relation to least squares regression.
However, and I'm sure it's my fault, I'm afraid that I found the links you
gave quite technical and not too illuminating for how I can actually use
the Stata output.

All the references that I find to using the F statistic to compare models
require me to compare the change in RSS for each model to the change in the
degrees of freedom - but Stata doesn't give me this, it gives me the Mean
Square Model divided by the Mean Square Residual.  I could use each model to
create predictions and manually calculate the RSS but I'm unsure of whether
this would work given it's a prediction from a logit. 

If it helps to actually give you my output, the relevant F statistics I'm
getting are 

F(  21,  16828)    =     65.97 and
F(  25,  16824)    =     56.93 

Thanks so much for any further clarification

Orian
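
A note on reading the two statistics quoted above, sketched in Python as plain arithmetic. It assumes they are the overall adjusted Wald F statistics that -svy: logit- prints in its header, with numerator df k (the number of coefficients tested) and denominator df d - k + 1, where d is the design degrees of freedom. The function name is made up for illustration.

```python
def design_df(k, denom_df):
    """Design df d implied by an adjusted Wald F(k, d - k + 1)."""
    return denom_df + k - 1

# The two statistics reported above: F(21, 16828) and F(25, 16824).
smaller = design_df(21, 16828)
larger = design_df(25, 16824)

# Number of coefficients added by the larger model.
extra_terms = 25 - 21

print(smaller, larger, extra_terms)  # 16848 16848 4
```

Under that assumption both headers imply the same design df, and the models differ by 4 coefficients. Each F tests its own model against the constant-only model, so the two values cannot be differenced the way deviances are. If the models are nested, the comparison is instead a joint Wald test of the 4 added terms in the larger model, which is the kind of test -test- produces.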

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Stas Kolenikov
Sent: 21 May 2013 14:17
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to compare models after svy

What you are used to in logistic regression modeling, and what you refer to as
deviance differences, are essentially likelihood ratio tests (compare the
likelihoods of two models, transform them to asymptotic chi-square). This
does not work for complex survey data (or at least not in a way that could
make it directly usable), so what you would want to do is to use Wald tests
available after any estimation command with -test- (as opposed to the
logistic-specific fit test commands or -lrtest-, whichever you were using).
With -svy-, these tests would produce an F-test, which is roughly chi-square
divided by its (estimated) degrees of freedom. In turn, the estimated degrees
of freedom, as a first approximation, is the number of primary sampling units
minus the number of strata; and -svy- uses a more complicated approximation
based on the eigenvalues / generalized design effects of the parameter
estimates covariance matrix.
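
As a rough numerical sketch of that relationship, in Python: the function below converts a Wald chi-square W for k tested coefficients into the plain F = W / k, and into the adjusted form F = (d - k + 1) / (k d) * W, referred to F(k, d - k + 1), where d is the design degrees of freedom (PSUs minus strata). The adjusted form is assumed here to be what -test- reports by default after -svy-. The function name and the example numbers (W = 12, k = 4, d = 16848) are illustrative assumptions, not output from any real model.

```python
def wald_f(W, k, d):
    """Turn a Wald chi-square W on k constraints into survey F statistics.

    d is the design degrees of freedom (number of PSUs minus number of strata).
    Returns (unadjusted F, adjusted F, adjusted F degrees of freedom).
    """
    unadjusted = W / k                    # chi-square divided by its df
    adjusted = (d - k + 1) / (k * d) * W  # small-sample adjustment
    return unadjusted, adjusted, (k, d - k + 1)

unadj, adj, dfs = wald_f(12.0, 4, 16848)
print(round(unadj, 3), round(adj, 3), dfs)  # 3.0 2.999 (4, 16845)
```

With a large d the two versions are nearly identical, so the adjustment matters mainly when the number of PSUs minus the number of strata is small relative to the number of coefficients tested.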

If you have never heard about Wald tests (or, for that matter, of likelihood
ratio tests), take a look at Buse (1982),
http://www.citeulike.org/user/ctacmo/article/890474. Note that it only
applies to the "standard" i.i.d. data. The generalized design effects are
introduced and discussed in (I believe) Rao and Scott (1981),
http://www.citeulike.org/user/ctacmo/article/1036968, reviewed over again in
Rao and Thomas (2003), http://www.citeulike.org/user/ctacmo/article/8922395.

-- Stas Kolenikov, PhD, PStat (SSC)
-- Senior Survey Statistician, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer
-- http://stas.kolenikov.name



On Tue, May 21, 2013 at 7:52 AM, Orian Brook <ob11@st-andrews.ac.uk> wrote:
> Hi all
>
> I'm hoping for advice on comparing models after using -svy: logit-, as 
> the postestimation commands normally used after logit don't work.
>
> I had been working without using the -svy- prefix as I'm controlling
> for most of the criteria that were used in sampling, and in early
> iterations of the model the results were very similar. With my final
> model version, the results given by using or not using -svy- are
> still similar, but the independent variables I'm interested in are
> slightly more significant and with slightly stronger effect sizes, so
> I'm in favour of using the prefix!
> However, in my final version I'm using two interactions (both 
> categorical-categorical), where not all the interaction effects are 
> significant. Without -svy- I can compare the change in deviance to the
> change in degrees of freedom to confirm that the model with 
> interactions is a better fit. Again, using the -svy- prefix, not all 
> of the interaction effects are significant and I want to check whether 
> overall the model with interactions is a better fit, but I'm not given 
> the log likelihood or deviance statistic. I'm given an F-test but I
> thought this was only appropriate for a least squares model?
>
> Thanks all
>
> Orian
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

