
From: "Stas Kolenikov" <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: gllamm & stratified sampling design
Date: Tue, 22 Apr 2008 17:08:22 -0500

On Mon, Apr 21, 2008 at 9:51 AM, Mabel Andalon <mabel.andalon@gmail.com> wrote:
> Dear All,
>
> I am estimating a model of community participation (1-0) using
> individual-level data. These data are of immigrants in the US and comes from
> a stratified simple random sampling survey. The strata are US states
> (usstate). I've always used the svy option when analyzing these data
> setting:
>
> svyset [pweight=wt_natio], strata(usstate)

-gllamm- is not a survey command (that can easily go with -svy-
prefix), so there won't be much use for this statement.

> I just merged these data with contextual data from people's state of origin
> in a foreign country based on year of arrival to the US. And I also merged
> US state-level data based on current state of residence. That is, any two
> people who arrived in the same year from the same state and country and who
> live in the same US state were merged the same state-level data.
>
> My questions are two:
> 1. Is this considered multilevel data?

Yes, but of the ugliest cross-classified kind. If an individual is
level 1, what is your level 2? US state? The country they came from?
There is no strict nesting, and instead there is a web of links:
people from all different countries come to all different states at
all different points in time. It is difficult to analyze data of this
kind in any of the existing software packages, because the likelihood
for this kind of data can only be obtained by integration over the
whole data set at once, rather than by contiguous units within the
same cluster.

In your shoes, I would probably consider two of the three to be fixed
effects, and model the third one as a random effect. For instance,
treat the states and countries as fixed effects (if there are really
big systematic differences you are expecting between states), and year
as random effect (provided you have at least a dozen of different
values there, and the decision when to move is more reasonably assumed
random than the state they wanted to move to -- I am thinking this is
the case since different states might have quite different immigration
conditions, such as how easy it is to get a driver's license or SSN).

> 2. If so, how can I conduct a true multilevel analysis using glamm and
> still include the features of sampling design (i.e. stratification).
>
> So far, I have estimated:
>
> gllamm participation $xvars , i(individual fostate year usstate)
> pweight(wt) f(binom) l(logit) adapt
>
> i = individuals/inmigrants
> fostate = foreign state of residence
> year= year of arrival to the US
> usstate= current state of residence
>
> I'm not even sure that I have correctly defined the hierarchical, nested
> clusters in the i() option. The weights are individual's sampling weights.

As I said above, you don't really have individuals nested in fostate
nested in year nested in usstate. True, individuals are nested in any
of those conglomerates, but there is no nesting structure of the
remaining identifiers. -gllamm- should've given you an error saying
that your identifiers are not nested.

You would need to specify weights for all levels; the level 1 weights
that would be in the variable -wt1- will be your sampling weights, and
the higher level weights -wt2-, -wt3-, ... will probably be 1, since
you did not have any sampling on those levels. You would have to bid
farewell to your stratification information: there is no way to
accommodate that. If -xtmelogit- allowed for weights, that would be a
notably faster alternative to -gllamm-, but it does not appear to
support them.

That's an impressive list to sample from, BTW. I did not know such
lists of addresses existed, let alone spanning relatively elusive
Hispanic households.

Looks like you have some 6000 observations at least. I wouldn't have
high expectations with -gllamm- for models like that in terms of
computational time unless you have Stata/MP8 on an appropriate
computational cluster. And if you want to do four levels of random
effects, you will probably need to prepare yourself for a few hours
per one instance of likelihood calculation, meaning likely about a
week per iteration.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check
it regularly.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
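
Editorial sketch (not part of the original message): the setup suggested in the reply might look roughly like the following in a do-file. It is a minimal illustration, not a tested recipe. It assumes that -wt- holds the individual sampling weights, that -year- is numeric, and that -usstate- and -fostate- have few enough categories for dummy variables to be practical. The variable names come from the thread; the choice of which factors to treat as fixed is only the example given in the reply.

```stata
* Sketch only: states as fixed effects, year of arrival as the single
* random effect, as suggested in the reply. Assumes -gllamm- is installed
* (ssc install gllamm) and that $xvars is already defined.

* -gllamm- expects one weight variable per level, named from a common stub:
generate double wt1 = wt    // level 1: individual sampling weights (assumed to be in -wt-)
generate double wt2 = 1     // level 2 (year): no sampling at this level

* Sanity check: each individual belongs to exactly one year of arrival
bysort individual (year): assert year[1] == year[_N]

* Two-level random-intercept logit; -xi- expands the state dummies
xi: gllamm participation $xvars i.usstate i.fostate, ///
    i(year) pweight(wt) family(binomial) link(logit) adapt
```

As the reply notes, the stratification information is not used here, and run time with adaptive quadrature may be substantial.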

**Follow-Ups**:
- **Re: st: gllamm & stratified sampling design**, *From:* Mabel Andalon <mabel.andalon@gmail.com>

**References**:
- **Re: st: Re: Question Tobit model**, *From:* Johannes Geyer <JGeyer@diw.de>
- **st: gllamm & stratified sampling design**, *From:* Mabel Andalon <mabel.andalon@gmail.com>
