 Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: Optimal ratio of sample sizes in two sample t test

From: "Feiveson, Alan H. (JSC-SK311)"
To: "statalist@hsphsun2.harvard.edu"
Subject: RE: st: Optimal ratio of sample sizes in two sample t test
Date: Tue, 8 Jan 2013 08:26:46 -0600

```
In general, with equal costs, optimal allocation for estimating an overall mean should be proportional to the standard deviations. This is not exactly what you are doing, but you could use existing software (for example Russ Lenth's) to compute power as a function of sample sizes in the two-sample Satterthwaite approximate t-test, where you constrain your relative sample sizes to be proportional to your standard deviation estimates. Then use trial and error to achieve a desired power.

Al Feiveson

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of JVerkuilen (Gmail)
Sent: Monday, January 07, 2013 3:09 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Optimal ratio of sample sizes in two sample t test

On Mon, Jan 7, 2013 at 2:50 PM, Douglas McKee <douglas.mckee@yale.edu> wrote:

<<<Suppose I'm trying to choose the sample sizes for a study where I have two populations that are equally costly to sample.  If I know the standard deviations of each group, I can use sampsi to compute the n's when I specify effect size, alpha and desired power.  But suppose one sample has a low variance and the other has a high variance.  Doesn't this mean I should sample more from the high variance group?  Is there a way to make Stata tell me the optimal n1/n2 ratio?  Or should I write a wrapper around sampsi that tries a variety of ratios and tells me which one yields the smallest n1+n2?>>>

I don't see this as being an easy problem to solve as the decision rule with clearly unequal variances is one of the whack-a-mole problems in statistics:

http://en.wikipedia.org/wiki/Behrens_Fisher_problem

How different are the standard deviations? If the ratio isn't too far from 2:1, then you probably are OK doing near-equal sample size, but of course you could optimize the MSE, which would lean towards the population with larger variance.

Differences in SD may be the sign of a larger issue, though. Are the variables better analyzed on a different scale? So for instance, maybe it would be better to use a generalized linear model to accommodate linearity inside a log link.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

```
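
The wrapper idea raised in the original question (try a range of n1/n2 ratios and keep the one with the smallest n1+n2) can be sketched as below. This is a minimal illustration, not sampsi and not the Satterthwaite t-test: it assumes a two-sided normal-approximation (z) test with known standard deviations, and the function names `n_for_ratio` and `best_ratio` are hypothetical. Under that approximation the total sample size is proportional to (1 + r)(sd1^2/r + sd2^2), which is minimized at r = n1/n2 = sd1/sd2, so the grid search should land on that ratio.

```python
from math import ceil
from statistics import NormalDist


def n_for_ratio(delta, sd1, sd2, ratio, alpha=0.05, power=0.90):
    """Unrounded (n1, n2) with n1 = ratio * n2 for a two-sided z-test.

    Assumption: normal approximation with known SDs (not a t-test).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Require sd1^2/n1 + sd2^2/n2 = (delta / z)^2 with n1 = ratio * n2.
    n2 = (sd1 ** 2 / ratio + sd2 ** 2) * (z / delta) ** 2
    return ratio * n2, n2


def best_ratio(delta, sd1, sd2, alpha=0.05, power=0.90):
    """Grid-search n1/n2 from 0.10 to 10.00 for the smallest n1 + n2."""
    ratios = [k / 100 for k in range(10, 1001)]
    return min(
        ratios,
        key=lambda r: sum(n_for_ratio(delta, sd1, sd2, r, alpha, power)),
    )


if __name__ == "__main__":
    # Illustrative inputs only: effect size 1, group 1 twice as variable.
    r = best_ratio(delta=1.0, sd1=2.0, sd2=1.0)
    n1, n2 = n_for_ratio(1.0, 2.0, 1.0, r)
    print("ratio n1/n2:", r, "n1:", ceil(n1), "n2:", ceil(n2))
```

For a t-test with estimated variances, the same search could wrap whatever power routine is actually in use; the normal approximation here only serves to show the shape of the search.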