


Re: st: Optimal ratio of sample sizes in a two sample t test

From   Douglas McKee <>
To   "" <>
Subject   Re: st: Optimal ratio of sample sizes in a two sample t test
Date   Tue, 8 Jan 2013 11:20:55 -0500

Hi Jay,

This question came up while I was teaching basic power analysis yesterday, so there isn't a specific application yet. But if I'm understanding you correctly, the standard methods for comparing the means of two populations with unequal variances (like Satterthwaite's or Welch's) don't work well when the variances are very unequal. That is, if you can't trust the p-values in these cases, you probably wouldn't trust the power calculations either. Good to know!

All best,

> On Mon, Jan 7, 2013 at 2:50 PM, Douglas McKee <> wrote:
> <<<Suppose I'm trying to choose the sample sizes for a study where I
> have two populations that are equally costly to sample.  If I know the
> standard deviations of each group, I can use sampsi to compute the n's
> when I specify effect size, alpha and desired power.  But suppose one
> sample has a low variance and the other has a high variance.  Doesn't
> this mean I should sample more from the high variance group?  Is there
> a way to make Stata tell me the optimal n1/n2 ratio?  Or should I
> write a wrapper around sampsi that tries a variety of ratios and tells
> me which one yields the smallest n1+n2?>>>
> I don't see this as being an easy problem to solve as the decision
> rule with clearly unequal variances is one of the whack-a-mole
> problems in statistics:
> How different are the standard deviations? If the ratio isn't too far
> from 2:1, then you probably are OK doing near-equal sample size, but
> of course you could optimize the MSE, which would lean towards the
> population with larger variance.
> Differences in SD may be the sign of a larger issue, though. Are the
> variables better analyzed on a different scale? So for instance, maybe
> it would be better to use a generalized linear model to accommodate
> linearity inside a log link.
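
For anyone who wants to experiment with the "try a variety of ratios" idea from the original question, here is a minimal sketch in Python rather than Stata. It assumes a two-sided test, known standard deviations, equal sampling costs, and the usual normal-approximation sample size formula; the function names `sample_sizes` and `best_ratio` are made up for illustration and are not part of sampsi or any Stata command. Under those assumptions, the total n1 + n2 is smallest (ignoring rounding) when n2/n1 equals sd2/sd1, so the grid search should settle near that value.

```python
from math import ceil
from statistics import NormalDist


def sample_sizes(delta, sd1, sd2, ratio, alpha=0.05, power=0.80):
    """Return (n1, n2) for a two-sided two-sample z-test with n2 = ratio * n1.

    Assumption: normal approximation with known standard deviations.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Var(mean1 - mean2) = sd1^2 / n1 + sd2^2 / (ratio * n1); solve for n1.
    n1 = z_total ** 2 * (sd1 ** 2 + sd2 ** 2 / ratio) / delta ** 2
    return ceil(n1), ceil(ratio * n1)


def best_ratio(delta, sd1, sd2, ratios, alpha=0.05, power=0.80):
    """Try each candidate ratio; return (ratio, n1, n2) with the smallest n1 + n2."""
    results = [(r, *sample_sizes(delta, sd1, sd2, r, alpha, power)) for r in ratios]
    return min(results, key=lambda t: t[1] + t[2])


if __name__ == "__main__":
    # Hypothetical inputs: effect size 1, sd of 1 in group 1 and 4 in group 2.
    candidates = [0.5, 1, 2, 3, 4, 5, 6, 8]
    ratio, n1, n2 = best_ratio(1.0, 1.0, 4.0, candidates)
    print(f"best n2/n1 ratio: {ratio}, n1 = {n1}, n2 = {n2}, total = {n1 + n2}")
```

This does not address the concern raised above about how far the unequal-variance tests can be trusted when the variances are very different; it only shows how the required total changes with the allocation under the stated approximation.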
