Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
What you are calling "sampling bias" is, in fact, selection bias. You say that the factors on which the selection took place have nothing to do with gender. But the issue is: would they have any relation to the development of brain volume? If so, the 400 children would not be representative of the larger population.

Your more critical concern should be the large non-response bias at the final stage. The only way to (partly) alleviate that is to do response weighting: predict the probability of participation from variables you know for all 400 children who were invited; use the inverse as a response weight and multiply this by the selection weight. See:

Sharon Lohr. 2009. Sampling: Design and Analysis. 2nd ed. Boston, MA: Cengage Brooks/Cole.

Here's what I suggest:

1. Ignoring the weights, run the -anova- command. I'm not that familiar with the -anova- command, but I would guess that the standard error for hemisphere should be based on the subject x hemisphere interaction term. You can add terms for postal code, nesting of schools within postal code, and of subjects within schools.

2. Run the -regress- post-anova command to see how to set up the problem using Stata's -regress-.

3. Now compute the final weight with the non-response correction.

4. Run the -regress- equivalent of the anova, but incorporating your final weight with [pweight = ].

5. Do the same, but omit the within-subject terms and add as a final option (after the comma) "vce(cluster subject)".

These analyses are defensible because they allow for the possibility of postal code and school differences but do not use the design to determine standard errors.

6. Try a full survey analysis: -svyset- your data:

************************************************
svyset post_code [pweight=final_wt] || school
************************************************

Try variations of the following model:

************************************************************************************
svy: regress brainvol i.time i.sex i.hemisphere //main effects. Add interactions as desired
************************************************************************************

(See the -help- for "factor variables", assuming you have Stata 11)

Steve
samplerx@earthlink.net

On May 26, 2011, at 8:36 PM, Meg Dennison wrote:

Hi All,

Steven, thanks for your reply. I have inserted my answers below.

> But the description of your data is unclear. You refer to one between-subject and two within-subject "variables", but to "the" (single?) repeated measures variable with two levels. Isn't this a within-subject variable? By two levels do you mean two occasions (if longitudinal)? Which, if any, variables (besides subject) do you consider to be "random effects"?

- I am looking at brain development over time. I have collected data on brain measures at two time points for each subject (the repeated measure - baseline and follow-up). Additionally, these brain measures are collected from both the left and the right hemisphere within a single person and are not independent, so they are being treated as another within-subjects variable (hemisphere - left and right). The between-subjects variable is sex (male and female).

> So please clarify what the variables are and list the data for some subjects, so that we can see where you are starting from.

So, the data would look like this:

Subject  BrainVol  Time  Hemisphere
1        1345      1     left
1        2345      2     left
1        3546      1     right
1        3457      2     right
etc.

> In any case, for complex survey data, the standard errors for estimates are governed by variation of primary sampling units (PSUs, first-stage clusters) within strata, so the usual ANOVA formulas would not ordinarily apply. Stata can analyze some mixed model designs with survey data.
>
> Some other questions that will help us suggest analyses:
> 1. What is the sampling design? If there were strata, do they correspond to the "between-subject" variable?

The sampling design involved postcodes being randomly selected across a metropolitan city. Within these postcodes (strata?), schools were randomly selected to participate (clusters?). All Grade 5 classes within these schools were asked to complete a survey (obviously not all consented or were present at school that day, etc.). The survey they completed consisted of four factors. Two of these factors were used to select subjects for further participation - the probability weights that I have reflect the probability of being selected under this sampling bias. From this initial sample of about 2500, 400 were invited to participate in the research, and of those who were invited, 101 participated in my study. The variable on which they were initially sampled does not correspond to sex - the between-subjects variable in my study. I am not interested in the variable on which the sampling bias was introduced - my data is derived from a larger research project for which this initial sampling bias was desirable.

> 2. Are replicate (bootstrap, jackknife, BRR) weights available? Did the survey distributor provide SAS or SPSS macros to compute them?

No, the selection was not done using these programs.

> 3. What questions are you trying to answer? What parameters do you hope to estimate or test in your analysis?

I am interested in describing typical brain development - how it changes over time by sex and hemisphere, and their interaction. I believe that the initial sample of 2500 was reasonably representative of normally developing children (obviously with the caveats of living in a certain country, being at school, living in a city, etc.). I would like to correct for the sampling bias that was introduced.

Thanks in advance
Meg

> 4. What version of Stata do you have?

Version 11.

On Tue, May 24, 2011 at 11:54 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>
> Hi, Meg.
>
> Welcome to Stata! You will find that Stata's regression and survey capabilities are both far superior to those of SPSS.
>
> But the description of your data is unclear. You refer to one between-subject and two within-subject "variables", but to "the" (single?) repeated measures variable with two levels. Isn't this a within-subject variable? By two levels do you mean two occasions (if longitudinal)? Which, if any, variables (besides subject) do you consider to be "random effects"?
>
> So please clarify what the variables are and list the data for some subjects, so that we can see where you are starting from.
>
> In any case, for complex survey data, the standard errors for estimates are governed by variation of primary sampling units (PSUs, first-stage clusters) within strata, so the usual ANOVA formulas would not ordinarily apply. Stata can analyze some mixed model designs with survey data.
>
> Some other questions that will help us suggest analyses:
> 1. What is the sampling design? If there were strata, do they correspond to the "between-subject" variable?
> 2. Are replicate (bootstrap, jackknife, BRR) weights available? Did the survey distributor provide SAS or SPSS macros to compute them?
> 3. What questions are you trying to answer? What parameters do you hope to estimate or test in your analysis?
> 4. What version of Stata do you have?
>
> Steve
> sjsamuels@gmail.com
>
>
> On May 23, 2011, at 9:20 AM, Meg Dennison wrote:
>
> Hi,
>
> I have a complex sample, for which I need to use sampling weights (probability weights). I already have these values derived from the initial sampling selection. I wanted to then perform a mixed design ANOVA (with 2 within-subjects variables and one between-subjects variable). The repeated measures variable only has 2 levels.
>
> I have only used SPSS before and the Complex Sampling Add-on module only allows for univariate ANOVA. Can Stata perform this type of analysis? From what I could see from looking at the GUI and reading the manual, probability weights (pweights) could not be used for mixed ANOVA?
>
> Is there another way I should be thinking about this?
>
> Thanks in advance for your help,
>
> Kind regards,
>
> Meg
>
> --
>
> Meg Dennison BA(Hons) MPsych(Clin)/PhD Candidate
> School of Psychological Sciences, University of Melbourne
> megd@student.unimelb.edu.au
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

--
Meg Dennison BA(Hons) MPsych(Clin)/PhD Candidate
School of Psychological Sciences, University of Melbourne
megd@student.unimelb.edu.au
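
Follow-up note on Steve's suggestions: the non-response correction and the weighted regression with subject-clustered standard errors (his steps 3 to 5) might be sketched in Stata roughly as below. This is only an illustration under assumptions, not something posted in the thread. The names participated, survey_f1, survey_f2, sel_wt, p_resp and resp_wt are hypothetical. The response model assumes one record per invited child (400 records) with predictors known for all of them. The regression assumes the long-format brain data with a numeric hemisphere variable and final_wt already merged on by subject.

************************************************************************************
* Sketch only: participated, survey_f1, survey_f2, sel_wt, p_resp, resp_wt are
* hypothetical names. Run on the file of 400 invited children, one row each.

* Predict the probability of participation from variables known for all invitees
logit participated i.sex c.survey_f1 c.survey_f2
predict p_resp, pr

* Response weight = inverse of the predicted participation probability
generate resp_wt = 1/p_resp if participated == 1

* Final weight = selection weight x response weight (missing for non-participants)
generate final_wt = sel_wt*resp_wt

* Then, in the long-format brain data (final_wt merged on by subject):
* weighted regression without within-subject terms, SEs clustered on subject
regress brainvol i.time i.sex i.hemisphere [pweight = final_wt], vce(cluster subject)
************************************************************************************

If hemisphere is stored as a string ("left"/"right"), it would first need converting to a numeric variable (for example with -encode-) before it can be used as i.hemisphere. Which predictors belong in the participation model depends on what was actually recorded for all 400 invited children.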