Re: st: gllamm & stratified sampling design

From   Mabel Andalon <>
Subject   Re: st: gllamm & stratified sampling design
Date   Tue, 22 Apr 2008 18:40:34 -0400

Many thanks to Steven for your helpful comments. I have one person per household, so I think I'm OK.

Stas, thanks to you as well. I agree with you that there is no strict nesting. In fact, that is why I was so confused
as to how to define the hierarchical nested clusters. FYI, Stata actually reported an error. When I sent the initial
email the program had been running for some hours. I got an error message this morning...

I guess there is no clear way to proceed, but I will try to use a combination of fixed-random effects, as you


Stas Kolenikov wrote:

On Mon, Apr 21, 2008 at 9:51 AM, Mabel Andalon <> wrote:

Dear All,

I am estimating a model of community participation (1-0) using
individual-level data. These data are on immigrants in the US and come from
a stratified simple random sampling survey. The strata are US states
(usstate). I've always used the svy option when analyzing these data:

svyset [pweight=wt_natio], strata(usstate)

-gllamm- is not a survey command (that can easily go with -svy-
prefix), so there won't be much use for this statement.
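
For contrast, here is a minimal sketch of where -svyset- does take effect: a single-level logit fitted with the -svy- prefix, which uses the weights and strata but has no random effects. The variable names are the ones from the post; $xvars is assumed to be a global macro holding the covariate list.

```stata
* Declare the design: sampling weights plus stratification by US state
svyset [pweight=wt_natio], strata(usstate)

* Single-level, design-based logit; this is NOT a multilevel model
svy: logit participation $xvars
```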

I just merged these data with contextual data from people's state of origin
in a foreign country based on year of arrival to the US. And I also merged
US state-level data based on current state of residence. That is, any two
people who arrived in the same year from the same state and country and who
live in the same US state were assigned the same state-level data.

I have two questions:
1. Is this considered multilevel data?

Yes, but of the ugliest cross-classified kind. If an individual is
level 1, what is your level 2? US state? The country they came from?
There is no strict nesting, and instead there is a web of links:
people from all different countries come to all different states at
all different points in time. It is difficult to analyze data of this
kind in any of the existing software packages, because the likelihood
for this kind of data can only be obtained by integration over the
whole data set at once, rather than by contiguous units within the
same cluster. In your shoes, I would probably consider two of the
three to be fixed effects, and model the third one as a random effect.
For instance, treat the states and countries as fixed effects (if
there are really big systematic differences you are expecting between
states), and year as random effect (provided you have at least a dozen
different values there, and the decision when to move is more
reasonably assumed random than the state they wanted to move to -- I
am thinking this is the case since different states might have quite
different immigration conditions, such as how easy it is to get a
driver's license or SSN).
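
A rough sketch of that specification, assuming -gllamm- is installed, that fostate identifies the foreign state uniquely across countries, and that $xvars holds the covariates. The -xi- prefix builds the state indicators; with many states this adds a large number of dummy variables, so treat it as an illustration rather than a recommended final model.

```stata
* Run from a do-file (/// is a line continuation).
* US state and foreign state enter as fixed effects (indicators via xi);
* year of arrival is the only random intercept.
xi: gllamm participation $xvars i.usstate i.fostate, ///
    i(year) family(binom) link(logit) adapt
```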

2. If so, how can I conduct a true multilevel analysis using gllamm and
still include the features of the sampling design (i.e., stratification)?

So far, I have estimated:

gllamm participation $xvars , i(individual fostate year usstate)
pweight(wt) f(binom) l(logit) adapt

i = individuals/immigrants
fostate = foreign state of residence
year= year of arrival to the US
usstate= current state of residence

I'm not even sure that I have correctly defined the hierarchical, nested
clusters in the i() option. The weights are individual's sampling weights.

As I said above, you don't really have individuals nested in fostate
nested in year nested in usstate. True, individuals are nested in any
of those conglomerates, but there is no nesting structure of the
remaining identifiers. -gllamm- should've given you an error saying
that your identifiers are not nested.
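
One way to see the lack of nesting directly is to check, for each adjacent pair of identifiers in i(), whether the lower one ever appears under more than one value of the higher one. A minimal sketch using the variable names from the post (the flag variable names are made up for illustration):

```stata
* Flag = 1 when all observations of a foreign state share one arrival year,
* which is what "fostate nested in year" would require
bysort fostate (year): gen byte fostate_in_year = year[1] == year[_N]

* Same check for year of arrival within current US state
bysort year (usstate): gen byte year_in_usstate = usstate[1] == usstate[_N]

* Any zeros here mean the identifiers are crossed, not nested
tabulate fostate_in_year
tabulate year_in_usstate
```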

You would need to specify weights for all levels; the level 1 weights
that would be in the variable -wt1- will be your sampling weights, and
the higher level weights -wt2-, -wt3-, ... will probably be 1, since
you did not have any sampling on those levels. You would have to bid
farewell to your stratification information: there is no way to
accommodate that.
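
A sketch of the weight setup for a two-level model with year as the only random effect. The pweight() option takes a stub, and -gllamm- looks for variables named as the stub followed by the level number. The stub wt and the use of wt_natio as the sampling weight are assumptions based on the post.

```stata
* Level-1 weights: the individual sampling weights
gen double wt1 = wt_natio

* Level-2 weights: 1, since the clusters (years) were not sampled
gen double wt2 = 1

* pweight(wt) picks up wt1 and wt2; run from a do-file because of ///
gllamm participation $xvars, i(year) pweight(wt) ///
    family(binom) link(logit) adapt
```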

If -xtmelogit- allowed for weights, that would be a notably faster
alternative to -gllamm-, but it does not appear to support them.

That's an impressive list to sample from, BTW. I did not know such
lists of addresses existed, let alone spanning relatively elusive
Hispanic households.

Looks like you have some 6000 observations at least. I wouldn't have
high expectations with -gllamm- for models like that in terms of
computational time unless you have Stata/MP8 on an appropriate
computational cluster. And if you want to do four levels of random
effects, you will probably need to prepare yourself for a few hours
per likelihood evaluation, meaning likely about a
week per iteration.
