
Re: st: gllamm & stratified sampling design

From   Steven Samuels
Subject   Re: st: gllamm & stratified sampling design
Date   Tue, 22 Apr 2008 09:13:40 -0400


Thank you, Mabel.

So, households ARE your primary sampling units within strata. Your -svyset- is okay if you have only one respondent per household. If you have multiple respondents per household, it should look something like:

svyset hhid [pweight=wt_natio], strata(usstate) || _n

For descriptive (not analytic) analyses, you can include finite population correction terms.
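
A minimal sketch of what the finite population correction might look like, showing only the first stage for simplicity. The variable nhh_state is hypothetical (it is not in the posted data) and is assumed to hold the number of households on the sampling frame in each stratum:

```stata
* Sketch only: nhh_state is a hypothetical variable giving the number of
* households on the sampling frame in each stratum (not in the posted data).
svyset hhid [pweight=wt_natio], strata(usstate) fpc(nhh_state)

* Descriptive estimate: weighted proportion participating, with the FPC applied
svy: proportion participation
```

Whether the correction makes a noticeable difference depends on the sampling fraction; with a few hundred households drawn from a large state frame it is likely to be negligible.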

I think you do have multilevel data--people are arranged in higher-level units, areas and states. If you are not sure, then you should consult a good reference before proceeding. A decent online lecture (with some typos) is: index.htm. See: multilevel-models/index.shtml. Also: thegeocodingproject/webpage/monograph/glossary.htm or http://

Although I am not an expert in GLLAMM, I believe (but may be wrong) that:

1. GLLAMM is not a Stata survey program, so it will ignore -svyset-.
2. GLLAMM will accept your probability weights.
3. You will treat your strata as higher-level random effects in your analysis.
4. If you have sufficient sample sizes in communities other than your original strata, you might choose to make those communities higher-level units as well.
5. If your outcome is "community participation", then local area of residence is probably more important than state.
6. If your weights are 'national weights', I would not necessarily use them as is. For example, if your target population is the entire country, then the weights for a single state or area may be so distorted that they will not represent that state.
7. If your outcome is 'community participation', I would think that local community would be more important than state of residence.
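
To make points 2, 3 and 6 concrete, here is a rough sketch, not a tested recommendation. It assumes that states are the only higher level, that the level-2 weight can be set to 1 because every state in the survey is a stratum, and that rescaling the national weights to sum to each state's sample size is an acceptable scaling (it is only one of several possibilities). The names state_wt_total, wt1 and wt2 are made up for the illustration; gllamm's pweight(wt) option looks for variables named wt1, wt2, and so on, one per level:

```stata
* Level-1 weights: rescale wt_natio within each state so that the weights
* sum to that state's sample size (one of several possible scalings).
bysort usstate: egen double state_wt_total = total(wt_natio)
by usstate: generate double wt1 = wt_natio * _N / state_wt_total

* Level-2 weights: assumed to be 1 because every state in the survey is a
* stratum and so was included with certainty.
generate byte wt2 = 1

* gllamm is user-written (ssc install gllamm). pweight(wt) picks up wt1 and wt2.
gllamm participation $xvars, i(usstate) pweight(wt) family(binomial) link(logit) adapt
```

If local communities are added as a further level (point 4), they would go before usstate in i(), and a third weight variable would be needed.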

Good luck!


On Apr 21, 2008, at 10:11 PM, Mabel Andalon wrote:

Many thanks to Jay and Sebastian for the references. I just finished reading the paper, but I'm not sure I have fully understood what is going on.


I appreciate your help and interest. I only have one cross-section. The main features of the survey design are the following:
*1.* The sample was drawn from a household database of approximately 11 million households in the United States that are identified as Latino or Hispanic. The universe of analysis contains approximately 87.5% of the US Hispanic population.

*2.* The survey covers 15 states and the District of Columbia metropolitan area (including counties and municipalities in Virginia and Maryland). States were selected based on the overall size of the Latino/Hispanic population.

*3.* The sample is stratified by geographic designation, meaning that each state sample is a valid, stand-alone representation of that state's Latino population.

*4.* Respondents were selected randomly from the Latino households in the jurisdictions covered (states) without replacement.

*5.* State sample sizes vary as a result of specific funders' requests. The smallest sample size for any unit was 400, yielding a margin of error of less than ± 5% for each state.

*6.* A number of states were stratified internally. In each case but California, internal strata were represented proportionately in the final sample. In California, additional strata were imposed in a non-proportional fashion, owing in part to the larger sample size, to allow greater between-region comparisons.

*7.* I don't have the formula for how weights were computed. The survey's documentation says that national weights were constructed so that the numbers are accurately representative of the universe covered by the study.

Please let me know if you think my svyset statement is inaccurate:

svyset [pweight=wt_natio], strata(usstate)

wt_natio is the national weight described in *#7* above. usstate is the var that identifies the within-state strata.

I think I should care about conducting a multilevel analysis because I have merged two types of state-level characteristics to each individual in my sample. One reflects state-level characteristics of the person's country of origin before he or she arrived in the US. The other reflects the characteristics of the state in which the person lives currently.

Thanks very much,


Steven Samuels wrote:

I have not read the article Sebastian referred to, so I will ask only about your design. This is a multistage design, so, for a start, your -svyset- statement is incomplete. Please give more details. Exactly what was the sampling protocol? What was the frame? What were the target populations at each stage of sampling? How did the surveyors get from states to communities to individuals? Was there intermediate sampling of households or areas smaller than communities, or both? Was sampling with or without replacement, and at what stages? How were the weights computed? Was there post-stratification weighting? Have you multiple years of data?



On Apr 21, 2008, at 10:51 AM, Mabel Andalon wrote:

Dear All,

I am estimating a model of community participation (1-0) using individual-level data. These data are of immigrants in the US and come from a stratified simple random sampling survey. The strata are US states (usstate). I've always used the svy option when analyzing these data, setting:

svyset [pweight=wt_natio], strata(usstate)

I just merged these data with contextual data from people's state of origin in a foreign country based on year of arrival to the US. And I also merged US state-level data based on current state of residence. That is, any two people who arrived in the same year from the same state and country and who live in the same US state were assigned the same state-level data.

My questions are two:
1. Is this considered multilevel data?
2. If so, how can I conduct a true multilevel analysis using gllamm and still include the features of the sampling design (i.e. stratification)?

So far, I have estimated:

gllamm participation $xvars , i(individual fostate year usstate) pweight(wt) f(binom) l(logit) adapt

individual = individuals/immigrants
fostate = foreign state of residence
year= year of arrival to the US
usstate= current state of residence

I'm not even sure that I have correctly defined the hierarchical, nested clusters in the i() option. The weights are individuals' sampling weights.

Any suggestions will be highly appreciated.



* For searches and help try:
