



Re: st: Comparing Chi2/L2 in different samples using bootstrap


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Comparing Chi2/L2 in different samples using bootstrap
Date   Mon, 6 Dec 2010 08:20:24 -0600

There are multiple issues with this approach.

1. Only Wald tests are applicable to survey data. The concept of
the likelihood is quite tedious for samples from a finite population.

2. According to your syntax, your design also has clustering, which
needs to be accounted for using -svyset- and subsequent -svy :
whatever- estimation procedures.

3. The bootstrap for complex survey data is considerably more
complicated, and -bootstrap- does not deliver the necessary
flexibility. For one thing, each bootstrap sample will have a
different total target population, as your sum of weights is not
maintained. For another, your sampling unit eligible for resampling is
the PSU as a whole. If your number of PSUs per stratum is small (and it is
typical to have a handful of PSUs/stratum, if not just 2 PSUs/stratum),
you would need to work with -svy bootstrap- (and probably -bsweights-
that I wrote). If you can ignore stratification, and PSUs have roughly
equal probabilities of selection, you may still be able to use
-bootstrap-, but you would need to add -cluster() idcluster()- options
to your syntax.
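
A minimal sketch of points 2 and 3, not a definitive recipe. It reuses
the variable names from the quoted command (vote5, x, y, z, zip,
weight1) and drops the elided part of the variable list. It assumes
that zip identifies the PSUs and that there are no strata to declare.
The names newzip and bw* are hypothetical: bw* stands for bootstrap
replicate weights that would have to be created beforehand.

```stata
* Declare the design once: PSU = zip, sampling weight = weight1
svyset zip [pweight = weight1]

* Design-based (linearized) standard errors, then Wald tests
svy: mlogit vote5 x y z, baseoutcome(1)
test x
test z

* Only if stratification can be ignored and PSUs have roughly equal
* selection probabilities: resample whole clusters with -bootstrap-
bootstrap, reps(2000) cluster(zip) idcluster(newzip) force: ///
    mlogit vote5 x y z [pweight = weight1], baseoutcome(1)

* With few PSUs per stratum: -svy bootstrap- on replicate weights
* (bw* is assumed to exist already; it is not created here)
svyset zip [pweight = weight1], bsrweight(bw*) vce(bootstrap)
svy bootstrap: mlogit vote5 x y z, baseoutcome(1)
```

The -test- calls after -svy: mlogit- give Wald tests that respect the
declared design, which is the kind of test point 1 refers to.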

I still don't quite understand what the problem with merging the
data over time is, though. I would view this as the simplest practical
solution and pursue it first.
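
One way the pooled analysis could look, as a sketch under stated
assumptions. The file names survey1991, survey1999 and survey2007 and
the variables year and psu are hypothetical, each file is assumed to
hold the same variables, and x, y, z are treated as continuous.
Treating the survey year as a stratum is also an assumption, made here
because the yearly samples are drawn independently.

```stata
* Stack the three years and tag each record with its year
use survey1991, clear
generate year = 1991
append using survey1999
replace year = 1999 if missing(year)
append using survey2007
replace year = 2007 if missing(year)

* PSU identifiers must be unique across years
egen psu = group(year zip)

* Each year keeps its own weights; year enters as a stratum
svyset psu [pweight = weight1], strata(year)

* Let every predictor's effect vary by year
svy: mlogit vote5 i.year##c.(x y z), baseoutcome(1)

* Wald test of whether the effect of x differs across years
testparm i.year#c.x
```

This replaces the comparison of three separate chi-squared statistics
with a direct test of the year-by-predictor interactions.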

On Mon, Dec 6, 2010 at 7:44 AM, Dmitriy Poznyak
<Dmitriy.Poznyak@soc.kuleuven.be> wrote:
> Hello all,
>
> I am estimating three identical multinomial models with bootstrap for different years of survey data, for instance 1991, 1999 and 2007. Aside from comparing predicted probabilities, which I assume shouldn't pose any problem, I need to compare Chi2/L2 coefficients for the different variables in the model. The rationale for doing this is that the fit of the individual predictors (e.g. socio-demographic variables) declines over time. Here's where the question arises. Clearly, samples in different years have different sizes, perhaps different design effects, and so on.
>
> In order to possibly address these issues I ran the bootstrapped models with the same number of iterations in each case:
> bootstrap, reps(2000) force: mlogit vote5 x y z ... [pweight=weight1], base(1) cl(zip) rrr
> Next, I test the effect of the predictors:  test x;  test z, etc. Again, the models' specification is identical for all years; what differs is the sample size and design.
>
> Given the bootstrap method being used, is it possible to compare Chi2/L2 and perhaps pseudo-R2 coefficients across the different samples in this case and, if not, what should my strategy be? Note that pooling the datasets is not feasible for several reasons, such as weighting.
>
> Thanks for your suggestions,
> Dmitriy
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.


