Statalist



Re: st: gllamm & stratified sampling design


From   "Stas Kolenikov" <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: gllamm & stratified sampling design
Date   Tue, 22 Apr 2008 17:08:22 -0500

On Mon, Apr 21, 2008 at 9:51 AM, Mabel Andalon <mabel.andalon@gmail.com> wrote:
> Dear All,
>
>  I am estimating a model of community participation (1-0) using
> individual-level data. These data are of immigrants in the US and come from
> a stratified simple random sampling survey. The strata are US states
> (usstate). I've always used the svy option when analyzing these data
> setting:
>
>  svyset  [pweight=wt_natio], strata(usstate)

-gllamm- is not a survey command (one that can simply take the -svy-
prefix), so the -svyset- statement won't be of much use here.

>  I just merged these data with contextual data from people's state of origin
> in a foreign country based on year of arrival to the US. And I also merged
> US state-level data based on current state of residence. That is, any two
> people who arrived in the same year from the same state and country and who
> live in the same US state were merged the same state-level data.
>
>  My questions are two:
>  1. Is this considered multilevel data?

Yes, but of the ugliest cross-classified kind. If an individual is
level 1, what is your level 2? US state? The country they came from?
There is no strict nesting, and instead there is a web of links:
people from all different countries come to all different states at
all different points in time. It is difficult to analyze data of this
kind in any of the existing software packages, because the likelihood
for this kind of data can only be obtained by integration over the
whole data set at once, rather than by contiguous units within the
same cluster. In your shoes, I would probably consider two of the
three to be fixed effects, and model the third one as a random effect.
For instance, treat the states and countries as fixed effects (if
there are really big systematic differences you are expecting between
states), and year as a random effect (provided you have at least a dozen
different values there, and the decision when to move is more
reasonably assumed random than the state they wanted to move to -- I
am thinking this is the case since different states might have quite
different immigration conditions, such as how easy it is to get a
driver's license or SSN).
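
A rough sketch of that specification, assuming the variable names from
the original post (participation, $xvars, usstate, fostate, year). The
-xi:- prefix is used here on the assumption that -gllamm- does not take
factor-variable notation directly, and weights are left out to keep the
sketch short. Treat it as an illustration, not a tested model.

```stata
* Sketch only: names are taken from the post; not run against the actual data.
* US state and foreign state enter as fixed effects (indicator variables);
* year of arrival is the single random-intercept level.
xi: gllamm participation $xvars i.usstate i.fostate, ///
    i(year) family(binom) link(logit) adapt
```

Any state-level contextual variables in $xvars would be collinear with
the state indicators, so they would have to be dropped from this version.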

>  2. If so, how can I conduct a true multilevel analysis using gllamm and
> still include the features of sampling design (i.e. stratification).
>
>  So far, I have estimated:
>
>  gllamm participation $xvars , i(individual fostate year usstate)
> pweight(wt) f(binom) l(logit) adapt
>
>  i = individuals/immigrants
>  fostate = foreign state of residence
>  year= year of arrival to the US
>  usstate= current state of residence
>
>  I'm not even sure that I have correctly defined the hierarchical, nested
> clusters in the i() option. The weights are individual's sampling weights.

As I said above, you don't really have individuals nested in fostate
nested in year nested in usstate. True, individuals are nested in each
of those groupings, but the remaining identifiers are not nested
within one another. -gllamm- should have given you an error saying
that your identifiers are not nested.
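
One way to see the lack of nesting in the data is a quick check like the
one below. It is only a sketch: the variable names come from the
original post, and the two flag variables are hypothetical names made up
for this example.

```stata
* Flag fostate values that appear under more than one year of arrival;
* any flagged observation means fostate is not nested within year.
bysort fostate (year): generate byte fo_in_many_years = year[1] != year[_N]

* Same idea for year within usstate.
bysort year (usstate): generate byte yr_in_many_states = usstate[1] != usstate[_N]

tabulate fo_in_many_years
tabulate yr_in_many_states
```

If either table shows any 1s, the identifiers are crossed rather than
nested at that level.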

You would need to specify weights for all levels; the level 1 weights
that would be in the variable -wt1- will be your sampling weights, and
the higher level weights -wt2-, -wt3-, ... will probably be 1, since
you did not have any sampling on those levels. You would have to bid
farewell to your stratification information: there is no way to
accommodate that.
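
A minimal sketch of setting up the weight variables, assuming the
individual sampling weights are in wt_natio (the variable from the
-svyset- line) and that the stub passed to pweight() is wt:

```stata
* Assumed: wt_natio holds the individual sampling weights.
generate double wt1 = wt_natio   // level 1: individual sampling weight
generate double wt2 = 1          // level 2: no sampling at this level
* Add wt3, wt4, ... = 1 in the same way if the model has more levels.
* -gllamm- is then called with pweight(wt), which looks for wt1, wt2, ...
```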

If -xtmelogit- allowed for weights, that would be a notably faster
alternative to -gllamm-, but it does not appear to support them.

That's an impressive list to sample from, BTW. I did not know such
lists of addresses existed, let alone spanning relatively elusive
Hispanic households.

Looks like you have some 6000 observations at least. I wouldn't have
high expectations of -gllamm- for models like that in terms of
computational time, unless you have Stata/MP8 on an appropriate
computational cluster. And if you want four levels of random
effects, you should probably prepare yourself for a few hours
per likelihood evaluation, which likely means about a week per
iteration.

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check
it regularly.