Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Using sampling/probability weights for mixed design ANOVA in STATA

 From Meg Dennison To statalist@hsphsun2.harvard.edu Subject Re: st: Using sampling/probability weights for mixed design ANOVA in STATA Date Wed, 15 Jun 2011 13:25:15 +1000

```
Hi Steven,

I am still familiarizing myself with STATA and have looked at this
syntax below using anova, regress and pweights commands - can this
handle repeated measures (within-subjects) variables?

Thanks again,

Meg

On Wed, Jun 1, 2011 at 9:58 AM, Steven Samuels <sjsamuels@gmail.com> wrote:
> --
>
> Meg, here's an example of how to do an -anova- analysis with a probability weight.  It exploits the fact that -anova- is equivalent to an ordinary multiple regression command (Stata -regress-, for example) and Stata's -regress- _will_ take a probability weight. The trick is to run -regress- without options after -anova-  to show the implied predictors that you need to put on the right hand side of the equation. You will need to prefix the categorical factors with "i.". The example is from the -anova- help illustration of a split-plot design:
>
> *******START******************
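> capture log close   // close any open log; no error if none is open
> log using aovtest, replace text
>
> webuse reading, clear   // assumed: the dataset behind the -anova- split-plot example
> anova score prog / class|prog skill prog#skill / class#skill|prog / group|class#skill|prog /, dropemptycells
> regress   // replay the -anova- fit as a regression to list the implied predictors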
>
> //regression equivalent from -regress- statement after -anova-: add the "i." in front of the categorical variables
>
> regress score i.prog i.class#i.prog i.skill i.program#i.skill ///
> i.class#i.skill#i.program  i.group#i.class#i.skill#i.program
>
> // regression with probability weight
> set seed 4816033
> gen finalwt =20*(uniform() +.05)  // weights>1
>
> regress score i.prog i.class#i.prog i.skill i.program#i.skill ///
> i.class#i.skill#i.program  i.group#i.class#i.skill#i.program ///
> [pweight=finalwt]   // <--pweight goes in square brackets, not after the comma as an option
> *********END**************
>
> Meg,
> I'm not sure that you should or can do a full-fledged survey analysis.
> From your description, you selected post-codes at random, then schools within post-codes at random. (The term "randomly selected" actually is not informative, as this can describe many different procedures.) In survey parlance, the post-codes would be primary sampling units (PSUs) and the schools would be secondary sampling units, with no sampling thereafter.  You made a further selection on the basis of test questions.
>
> An idle question: Did you round some numbers or leave out some details? It would be odd to get exactly 2500 and 400 as the result of a sampling/testing process.
>
> What you are calling "sampling bias" is, in fact, selection bias. You say that the factors on which the selection took place have nothing to do with gender. But the issue is: would it have any relation to the development of brain volume? If so, the 400 children would not be representative of the larger population.
>
> Your more critical concern should be the large non-response bias at the final stage.  The only way to (partly) alleviate that is to do response weighting: predict the probability of participation from variables you know for all 400 children who were invited; use the inverse as a response weight and multiply this by the selection weight.  See: Sharon Lohr. 2009. Sampling: Design and Analysis. 2nd ed. Boston, MA: Cengage Brooks/Cole.
>
> Here's what I suggest
>
> 1. Ignoring the  weights, run the -anova- command. I'm not that familiar with the -anova- command, but I would guess that the standard error for hemisphere should be the subject x hemisphere interaction term. You can add terms for postal code, nesting of schools within postal code, and of subjects within schools.
>
> 2. Run the -regress- post-anova command to see how to set up the problem using Stata's -regress-.
>
> 3. Now compute the final weight with the non-response correction.
>
> 4. Run the -regress- equivalent of  the anova  but incorporating your final weight with a [pweight = ] option.
>
> 5. Do the same, but omit the within-subject terms and add as a final option (after the comma) "vce(cluster subject)"
>
>
>
> These analyses are defensible because they allow for the possibility of postal code and school differences but do not use the design to determine standard errors.
>
> 6. Try a full survey analysis:  -svyset- your data:
>
> ************************************************
> svyset post_code  [pweight=final_wt] || school
> ************************************************
>
> Try variations of the following model
>
> ************************************************************************************
> svy: regress brainvol  i.time i.sex i.hemisphere  //main effects. Add interactions as desired
> ************************************************************************************
> (See the -help- for "factor variables", assuming you have Stata 11)
>
> Steve
>
>
>
> On May 26, 2011, at 8:36 PM, Meg Dennison wrote:
>
> Hi All,
>
>
>
> But the description of your data is unclear.  You refer to one
> between-subject and two within-subject "variables", but to "the"
> (single?) repeated measures variable with two levels. Isn't this a
> within-subject variable?  By two levels do you mean two occasions (if
> longitudinal)?  Which, if any, variables (besides subject) do you
> consider to be "random effects"?
>
> - I am looking at brain development over time. I have collected data
> on brain measures at two time points for each subject (the repeated
> involve collecting from both the left and the right hemisphere within
> a single person - and are not independent, so they are being treated
> as another within subjects variable (hemisphere - left and right).
>
> The between subjects variable is sex (obviously, male and female).
>
> So please clarify what the variables are and list the data for some
> subjects, so that we can see where you are starting from.
>
> So, the data would look like this:
>
> Subject BrainVol Time Hemisphere
> 1               1345      1        left
> 1               2345      2        left
> 1               3546      1        right
> 1               3457      2        right
> etc
>
>
> In any case, for complex survey data, the standard errors for
> estimates are governed by variation of primary sampling units (PSUs,
> first-stage clusters) within strata, so the usual ANOVA formulas would
> not ordinarily apply. Stata can analyze some mixed model designs with
> survey data.
>
> Some other questions that will help us suggest analyses:
> 1. What is the sampling design? If there were strata, do they
> correspond to the "between-subject" variable?
> a metropolitan city. Within these postcodes (strata?), schools were
> randomly selected to participate (clusters?). All Grade 5 classes
> within these schools were asked to complete a survey (obviously not
> all consented or were present at school that day etc). The survey they
> completed consisted of four factors. Two of these factors were used to
> select subjects for further participation - the probability of being
> selected is the probability weights that I have based on this sampling
> bias. From this initial sample of about 2500, 400 were invited to
> participate in the research, and from those who were invited, I have
> 101 who participated in my study. The variable on which they were
> initially sampled does not correspond to sex - the BS variable in my
> study. I am not interested in the variable on which the sampling bias
> was introduced - my data is derived from a larger research project for
> which this initial sampling bias was desirable.
>
> 2. Are replicate (bootstrap, jackknife, BRR) weights available? Did
> the survey distributor provide SAS or SPSS macros to compute them?
> No, the selection was not done using these programs.
>
> 3. What questions are you trying to answer?  What parameters do you
> hope to estimate or test in your analysis?
> I am interested in describing typical brain development - how it
> changes over time by sex and hemisphere, and their interaction. I
> believe that the initial sample of 2500 was reasonably representative
> of normally developing children (obviously with the caveats of
> living in a certain country, being at school, living in a city, etc.).
> I would like to correct for the sampling bias that was introduced.
>
>
> Meg
>
>
> 4. What version of Stata do you have? Version 11.
>
>
> On Tue, May 24, 2011 at 11:54 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>>
>> Hi, Meg.
>>
>> Welcome to Stata!  You will find that Stata's regression and survey capabilities are both far superior to those of SPSS.
>>
>> But the description of your data is unclear.  You refer to one between-subject and two within-subject "variables", but to "the" (single?) repeated measures variable with two levels. Isn't this a within-subject variable?  By two levels do you mean two occasions (if longitudinal)?  Which, if any, variables (besides subject) do you consider to be "random effects"?
>>
>> So please clarify what the variables are and list the data for some subjects, so that we can see where you are starting from.
>>
>> In any case, for complex survey data, the standard errors for estimates are governed by variation of primary sampling units (PSUs, first-stage clusters) within strata, so the usual ANOVA formulas would not ordinarily apply. Stata can analyze some mixed model designs with survey data.
>>
>> Some other questions that will help us suggest analyses:
>> 1. What is the sampling design? If there were strata, do they correspond to the "between-subject" variable?
>> 2. Are replicate (bootstrap, jackknife, BRR) weights available? Did the survey distributor provide SAS or SPSS macros to compute them?
>> 3. What questions are you trying to answer?  What parameters do you hope to estimate or test in your analysis?
>> 4. What version of Stata do you have?
>>
>> Steve
>> sjsamuels@gmail.com
>>
>>
>> On May 23, 2011, at 9:20 AM, Meg Dennison wrote:
>>
>> Hi,
>>
>> I have a complex sample, for which I need to use sampling weights
>> (probability weights). I already have these values derived from the
>> initial sampling selection. I wanted to then perform a mixed design
>> ANOVA (with 2 within subjects variables and one between subjects
>> variable).The repeated measures variable only has 2 levels.
>>
>> I have only used SPSS before and the Complex Sampling Add-on module
>> only allows for univariate ANOVA. Can STATA perform this type of
>> analysis? From what I could see from looking at the GUI and reading
>> the manual, probability weights (pweights) could not be used for mixed
>> ANOVA?
>>
>>
>>
>>
>> Kind regards,
>>
>> Meg
>>
>> --
>>
>> Meg Dennison BA(Hons) MPsych(Clin)/PhD Candidate
>> School of Psychological Sciences, University of Melbourne
>> megd@student.unimelb.edu.au
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/

--

Meg Dennison BA(Hons) MPsych(Clin)/PhD Candidate
School of Psychological Sciences, University of Melbourne
megd@student.unimelb.edu.au

```
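
Steps 3 to 5 of Steve's suggestions (build a non-response weight, then run the weighted regression with subject-clustered standard errors) might be sketched as below. This is only an illustration under stated assumptions, not the analysis used in the thread: the file names `invited` and `brain_long`, and the variables `participated`, `factor1`, `factor2`, `sel_wt`, `time` and `hemi`, are hypothetical placeholders. It also assumes hemisphere has been coded as a numeric variable (for example with -encode-) and that Stata 11 or later is available for factor-variable notation.

```stata
* Hypothetical file and variable names throughout; adapt to the real data.

* Step 3: model participation among all invited children
* (assumed file: one row per invited child, participated = 0/1)
use invited, clear
logit participated i.sex c.factor1 c.factor2
predict double p_resp, pr                   // predicted probability of participating

* Final weight = selection weight x inverse response probability
generate double final_wt = sel_wt / p_resp

* Keep participants only and attach the weight to the long-format brain data
* (assumed file: one row per subject x time x hemisphere)
keep if participated == 1
keep subject sex final_wt
merge 1:m subject using brain_long
keep if _merge == 3
drop _merge

* Step 5: weighted regression without within-subject terms,
* with standard errors clustered on subject
regress brainvol i.sex##i.time##i.hemi [pweight = final_wt], vce(cluster subject)
```

The `##` operator requests main effects plus all interactions of sex, time and hemisphere. Clustering on subject is what accounts for the repeated measurements within a child in this formulation, in place of the within-subject error terms of the -anova- setup.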