Re: st: ANOVAs and Probability Weights

From   Steve Samuels
Subject   Re: st: ANOVAs and Probability Weights
Date   Tue, 14 Feb 2012 17:50:08 -0500

Sorry, I see part of my answer was garbled when I pasted from my text editor. Here's a cleaner version.


To sum up what you've said so far: all schools in sets 1 and 2 nationally were invited to participate. There was no sampling of schools (since all were invited), but teachers were sampled from the combined rosters of the participating schools.  Set 3 is the result of a multistage sample.

To partly answer  your questions: regression coefficients are indeed legit as descriptions of finite populations: they are estimates of the coefficients that would be obtained if you did regression on every teacher in the population. If you wish to use the estimates to test hypotheses or form conclusions about causal impact, then they are estimates for a hypothetical set of repetitions of the finite population--a "super" population. 

You are interested in results for a subset of schools. In the survey literature, such subsets are called "domains" (as in Lohr, 2009, ref below) or "subpopulations" (as in Stata). Stata adds the subpop() option to survey commands to handle subpopulations. This requires that the data set contain information about schools _not_ in the subpopulation. See:

The requirement to use the -subpop- option will apply only to your set 3 analysis, as it is the only set in which schools were sampled.
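
A minimal sketch of subpop() on toy data, not on your survey: it reuses the made-up design from the auto-data example quoted further down, and the indicator inpop with its price cutoff is an assumption for illustration only.

```stata
sysuse auto, clear
rename turn psu                          // stand-in PSU identifier (toy design)
svyset psu [pw = weight], strata(foreign)

// hypothetical subpopulation indicator: 1 = in the subpopulation, 0 = not
generate byte inpop = (price > 5000)

// every observation stays in memory; subpop() restricts the estimate only
svy, subpop(inpop): mean mpg
```

Dropping the inpop==0 observations first (or using -if-) would hide part of the design from the variance calculation, which is why the full data set has to be present.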

If the weights in any of the three sets were post-stratified to reflect national totals (e.g. teacher gender and age distributions), then those weights might be wrong for the subpopulation. In that case, you would need to re-weight.  

This is a difficult analysis, and before you go further, I suggest that you become familiar with fundamental survey concepts.  I recommend that you read  Sharon Lohr's Sampling: Design & Analysis (2009). Section 11.4, for example, might help to answer your very good question about "which population". The Stata Survey Manual has good Stata examples. So does the book by Heeringa, S., West, B. T., & Berglund, P. A. (2010): Applied survey data analysis. 


On Feb 13, 2012, at 6:34 PM, wrote:

Thank you for your response, Steve! I walked through the syntax you sent me. Very helpful.

Now I have a few more questions, but first a few details...In my study there are three distinct sets of schools from which teacher rosters are obtained and, subsequently, teachers are randomly selected for participation.

The first two sets of schools are from the same sector/ master file and differ only by school type; there is no real "strata" per se as all member schools in this group are invited to participate in the survey, although there is some non-response. This group (in its entirety) is representative at the national level only. I have manually coded these schools by type utilizing external data.

The 3rd type of school comes from a much larger file which is representative at the US national, regional and state levels. There are strata identified for this sector.

When I run svy: regress, I do get a population estimate but I am unsure of the actual population which is being represented. Is the number generated legit since there are differences in the stratification? Another constraint I brought into the analysis was a school-level criterion of 75% free/reduced lunch eligibility for inclusion in the Public group...Again, I am unsure of how this is affected by (or affects) the weighting procedures and the resulting target population.

Any insights would be appreciated.

ANOVA is equivalent to multiple regression on group indicators.
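
A minimal unweighted sketch of that equivalence, using the auto data purely as a stand-in (the choice of mpg and rep78 is an assumption for illustration): the F test that -anova- reports for the factor should match the joint test of the indicator coefficients after -regress-.

```stata
sysuse auto, clear
recode rep78 1/2=3          // pool the sparse categories 1 and 2 into 3

anova mpg rep78             // one-way ANOVA: F test for rep78
xi: regress mpg i.rep78     // the same model, fit as a regression on indicators
testparm _Irep78*           // joint F test of the indicators; same F as the ANOVA
```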

So -svyset- the data with the design information (clusters, strata, weights), and use -svy: reg-.

Note that hypothesis tests are not appropriate if you are interested in describing the particular finite population where the survey was done. For references, see:
Instead, you can assess differences in means with confidence intervals. In the descriptive setting, get  minimal standard errors by specifying the finite population correction, if non-negligible, in the -svyset- statement.
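sysuse auto, clear
recode rep78 1/2=3          // pool the sparse categories 1 and 2 into 3
rename turn psu             // use -turn- as a stand-in PSU identifier
svyset psu [pw = weight], strata(foreign)
svy: mean mpg, over(rep78)  // weighted group means

// no covariates:
xi: svy: reg mpg i.rep78    // overall F test of the group indicators
testparm _Irep78*           // same test
test _Irep78_4 _Irep78_5, mtest(sidak)  // one test per indicator, Sidak-adjusted

// with covariates:
xi: svy: reg mpg i.rep78 length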
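
A minimal sketch of the finite population correction mentioned above, again on the toy auto design. The stratum sampling rates in the variable rate are invented for illustration; in a real analysis they would come from the survey documentation. Values of the fpc() variable no greater than 1 are read as sampling rates.

```stata
sysuse auto, clear
recode rep78 1/2=3
rename turn psu

// hypothetical sampling rates, constant within each stratum
generate double rate = cond(foreign, 0.25, 0.10)

svyset psu [pw = weight], strata(foreign) fpc(rate)
svy: mean mpg, over(rep78)  // standard errors now include the correction
```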


On Feb 12, 2012, at 8:58 PM, wrote:

Greetings Stata Aficionados!

I would like to do some group means tests with a survey data set which includes final and replicate weights.

Could someone please direct me as to how I might perform "weighted" ANOVAs with such a data set using Stata 10.

Kind regards,

Don Stryker
