
# Re: st: Optimal ratio of sample sizes in a two sample t test

From: Douglas McKee
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: Optimal ratio of sample sizes in a two sample t test
Date: Tue, 8 Jan 2013 11:20:55 -0500

```Hi Jay,

This question came up while I was teaching basic power analysis yesterday, so there isn't a specific application yet.  But if I'm understanding you correctly, the standard methods (like Satterthwaite's or Welch's) for comparing the means of two populations with unequal variances don't work well when the variances are very unequal.  That is, if you can't trust the p-values in these cases, you probably wouldn't trust the power calculations either.  Good to know!

All best,

Doug
>
> On Mon, Jan 7, 2013 at 2:50 PM, Douglas McKee <douglas.mckee@yale.edu> wrote:
>
> <<<Suppose I'm trying to choose the sample sizes for a study where I
> have two populations that are equally costly to sample.  If I know the
> standard deviations of each group, I can use sampsi to compute the n's
> when I specify effect size, alpha and desired power.  But suppose one
> sample has a low variance and the other has a high variance.  Doesn't
> this mean I should sample more from the high variance group?  Is there
> a way to make Stata tell me the optimal n1/n2 ratio?  Or should I
> write a wrapper around sampsi that tries a variety of ratios and tells
> me which one yields the smallest n1+n2?>>>
>
> I don't see this as being an easy problem to solve as the decision
> rule with clearly unequal variances is one of the whack-a-mole
> problems in statistics:
>
> http://en.wikipedia.org/wiki/Behrens_Fisher_problem
>
> How different are the standard deviations? If the ratio isn't too far
> from 2:1, then you are probably OK with near-equal sample sizes, but
> of course you could optimize the MSE, which would lean towards the
> population with larger variance.
>
> Differences in SD may be the sign of a larger issue, though. Are the
> variables better analyzed on a different scale? So for instance, maybe
> it would be better to use a generalized linear model with a log link,
> so that the effects are linear on the log scale.

```
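
As a rough check on the allocation question raised in the thread, the wrapper idea (try a range of ratios and keep the one with the smallest n1+n2) can be sketched in a few lines. This is only a sketch under stated assumptions: it uses the textbook normal approximation with known standard deviations rather than calling sampsi, it does not address the Behrens-Fisher concerns discussed above, and the function names, the effect size of 1, and the standard deviations of 1 and 3 are made up for illustration. The ratio is defined here as n2/n1.

```python
from statistics import NormalDist


def required_n(delta, sd1, sd2, ratio, alpha=0.05, power=0.80):
    """Unrounded (n1, n2) for a two-sided two-sample z test.

    Assumption: normal approximation with known SDs; ratio = n2 / n1.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Var(mean1 - mean2) = sd1^2/n1 + sd2^2/(ratio*n1); solve for n1.
    n1 = (z / delta) ** 2 * (sd1 ** 2 + sd2 ** 2 / ratio)
    return n1, ratio * n1


def best_ratio(delta, sd1, sd2, ratios, alpha=0.05, power=0.80):
    """Return the candidate ratio that gives the smallest n1 + n2."""
    return min(
        ratios,
        key=lambda r: sum(required_n(delta, sd1, sd2, r, alpha, power)),
    )


# Hypothetical inputs: effect size 1, SDs of 1 and 3.
ratios = [k / 10 for k in range(5, 61)]  # candidate n2/n1 from 0.5 to 6.0
r_best = best_ratio(1.0, 1.0, 3.0, ratios)
n1_best, n2_best = required_n(1.0, 1.0, 3.0, r_best)
n1_equal, n2_equal = required_n(1.0, 1.0, 3.0, 1.0)

print("best n2/n1 ratio:", r_best)
print("total n at best ratio:", n1_best + n2_best)
print("total n at equal allocation:", n1_equal + n2_equal)
```

Under this approximation the total is proportional to (sd1^2 + sd2^2/r)(1 + r), which is minimized at r = sd2/sd1, so the grid search should land on the ratio of the standard deviations, with more observations drawn from the higher-variance group. In practice the resulting sample sizes would be rounded up.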