Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[no subject]

What you are calling "sampling bias" is, in fact, selection bias. You say that the factors on which the selection took place have nothing to do with gender. But the issue is: would it have any relation to the development of brain volume? If so, the 400 children would not be representative of the larger population.

Your more critical concern should be the large non-response bias at the final stage. The only way to (partly) alleviate that is response weighting: predict the probability of participation from variables you know for all 400 children who were invited, use the inverse of that probability as a response weight, and multiply it by the selection weight. See: Sharon Lohr. 2009. Sampling: Design and Analysis. 2nd ed. Boston, MA: Cengage Brooks/Cole.
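
A minimal sketch of that response-weight calculation, assuming a file with one row per invited child (400 rows). All variable names here (participated, x1, x2, sel_wt, resp_wt, final_wt) are hypothetical placeholders, and a logit model is only one reasonable choice for the participation model:

************************************************
* Hypothetical names, not from the original data:
*   participated : 1 if the invited child took part, 0 otherwise
*   x1 x2        : variables known for all 400 invited children
*   sel_wt       : the existing selection (probability) weight
logit participated x1 x2
predict p_resp, pr                  // predicted probability of participation
generate resp_wt  = 1/p_resp        // response weight = inverse of that probability
generate final_wt = sel_wt*resp_wt if participated == 1
************************************************

The resulting final_wt would then be merged onto the long-format brain data by subject before any weighted analysis.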

Here's what I suggest:

1. Ignoring the weights, run the -anova- command. I'm not that familiar with -anova-, but I would guess that the error term for hemisphere should be the subject x hemisphere interaction. You can add terms for postal code, for schools nested within postal code, and for subjects nested within schools.

2. Run -regress- right after -anova- to see how to set up the same problem using Stata's -regress-.

3. Now compute the final weight with the non-response correction.

4. Run the -regress- equivalent of the anova, but incorporate your final weight with a [pweight = ] weight specification.

5. Do the same, but omit the within-subject terms and add as a final option (after the comma) "vce(cluster subject)"



These analyses are defensible because they allow for the possibility of postal code and school differences but do not use the design to determine standard errors.
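
As a rough illustration of the last weighted step (the version without within-subject terms and with subject-clustered standard errors), something along these lines should work in Stata 11. It assumes the variables are named subject, brainvol, time, sex and hemisphere, that they are all numeric, and that the final weight is stored in a variable called final_wt (a hypothetical name):

************************************************
* Assumes sex, time and hemisphere are numeric. If hemisphere is stored
* as text ("left"/"right"), convert it first, for example with
*   encode hemisphere, generate(hemi)
* and use i.hemi in place of i.hemisphere.
regress brainvol i.sex##i.time##i.hemisphere [pweight = final_wt], vce(cluster subject)
************************************************

The ## operator includes the main effects and all interactions; drop terms as needed.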

6. Try a full survey analysis. First -svyset- your data:

************************************************
svyset post_code [pweight=final_wt] || school
************************************************

Try variations of the following model:

************************************************************************************
svy: regress brainvol  i.time i.sex i.hemisphere  //main effects. Add interactions as desired
************************************************************************************
(See the -help- for "factor variables", assuming you have Stata 11)
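
One possible variation, sketched under the same assumptions (numeric sex, time and hemisphere, and a weight variable named final_wt): declare the design, check how many schools fall within each postal code, fit the model with all interactions, and test the sex-by-time term.

************************************************
svyset post_code [pweight = final_wt] || school
svydescribe                         // sampling units per stage
svy: regress brainvol i.sex##i.time##i.hemisphere
testparm i.sex#i.time               // adjusted Wald test of the sex x time interaction
************************************************

Which interactions to keep is a modelling decision; this only shows the mechanics.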

Steve
[email protected]



On May 26, 2011, at 8:36 PM, Meg Dennison wrote:

Hi All,

Steven, thanks for your reply. I have inserted my answers below.


But the description of your data is unclear. You refer to one
between-subject and two within-subject "variables", but to "the"
(single?) repeated measures variable with two levels. Isn't this a
within-subject variable? By two levels do you mean two occasions (if
longitudinal)? Which variables, if any (besides subject), do you
consider to be "random effects"?

- I am looking at brain development over time. I have collected data
on brain measures at two time points for each subject (the repeated
measure - baseline and follow up). Additionally, these brain measures
involve collecting from both the left and the right hemisphere within
a single person - and are not independent, so they are being treated
as another within subjects variable (hemisphere - left and right).

The between subjects variable is sex (obviously, male and female).

So please clarify what the variables are and list the data for some
subjects, so that we can see where you are starting from.

So, the data would look like this:

Subject  BrainVol  Time  Hemisphere
1        1345      1     left
1        2345      2     left
1        3546      1     right
1        3457      2     right
etc


In any case, for complex survey data, the standard errors for
estimates are governed by variation of primary sampling units (PSUs,
first-stage clusters) within strata, so the usual ANOVA formulas would
not ordinarily apply. Stata can analyze some mixed model designs with
survey data.

Some other questions that will help us suggest analyses:
1. What is the sampling design? If there were strata, do they
correspond to the "between-subject" variable?
The sampling design involved postcodes being randomly selected across
a metropolitan city. Within these postcodes (strata?), schools were
randomly selected to participate (clusters?). All Grade 5 classes
within these schools were asked to complete a survey (obviously not
all consented or were present at school that day etc). The survey they
completed consisted of four factors. Two of these factors were used to
select subjects for further participation - the probability of being
selected is the probability weights that I have based on this sampling
bias. From this initial sample of about 2500, 400 were invited to
participate in the research, and from those who were invited, I have
101 who participated in my study. The variable on which they were
initially sampled does not correspond to sex - the BS variable in my
study. I am not interested in the variable on which the sampling bias
was introduced - my data is derived from a larger research project for
which this initial sampling bias was desirable.

2. Are replicate (bootstrap, jackknife, BRR) weights available? Did
the survey distributor provide SAS or SPSS macros to compute them?
No, the selection was not done using these programs.

3. What questions are you trying to answer? What parameters do you
hope to estimate or test in your analysis?
I am interested in describing typical brain development - how it
changes over time by sex and hemisphere, and their interaction. I
believe that the initial sample of 2500 was reasonably representative
of normally developing children (obviously with the caveats of
living in a certain country, being at school, living in a city, etc.).
I would like to correct for the sampling bias that was introduced.

Thanks in advance

Meg


4. What version of Stata do you have? Version 11.


On Tue, May 24, 2011 at 11:54 PM, Steven Samuels <[email protected]> wrote:
> 
> Hi, Meg.
> 
> Welcome to Stata!  You will find that Stata's regression and survey capabilities are both far superior to those of SPSS.
> 
> But the description of your data is unclear. You refer to one between-subject and two within-subject "variables", but to "the" (single?) repeated measures variable with two levels. Isn't this a within-subject variable? By two levels do you mean two occasions (if longitudinal)? Which variables, if any (besides subject), do you consider to be "random effects"?
> 
> So please clarify what the variables are and list the data for some subjects, so that we can see where you are starting from.
> 
> In any case, for complex survey data, the standard errors for estimates are governed by variation of primary sampling units (PSUs, first-stage clusters) within strata, so the usual ANOVA formulas would not ordinarily apply. Stata can analyze some mixed model designs with survey data.
> 
> Some other questions that will help us suggest analyses:
> 1. What is the sampling design? If there were strata, do they correspond to the "between-subject" variable?
> 2. Are replicate (bootstrap, jackknife, BRR) weights available? Did the survey distributor provide SAS or SPSS macros to compute them?
> 3. What questions are you trying to answer? What parameters do you hope to estimate or test in your analysis?
> 4. What version of Stata do you have?
> 
> Steve
> [email protected]
> 
> 
> On May 23, 2011, at 9:20 AM, Meg Dennison wrote:
> 
> Hi,
> 
> I have a complex sample, for which I need to use sampling weights
> (probability weights). I already have these values derived from the
> initial sampling selection. I wanted to then perform a mixed design
> ANOVA (with 2 within subjects variables and one between subjects
> variable). The repeated measures variable only has 2 levels.
> 
> I have only used SPSS before and the Complex Sampling Add-on module
> only allows for univariate ANOVA. Can STATA perform this type of
> analysis? From what I could see from looking at the GUI and reading
> the manual, probability weights (pweights) could not be used for mixed
> ANOVA?
> 
> Is there another way I should be thinking about this?
> 
> Thanks in advance for your help,
> 
> 
> Kind regards,
> 
> Meg
> 
> --
> 
> Meg Dennison BA(Hons) MPsych(Clin)/PhD Candidate
> School of Psychological Sciences, University of Melbourne
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
> 
> 



--

Meg Dennison BA(Hons) MPsych(Clin)/PhD Candidate
School of Psychological Sciences, University of Melbourne
[email protected]




