Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Optimal ratio of sample sizes in two sample t test

From	"Feiveson, Alan H. (JSC-SK311)" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: Optimal ratio of sample sizes in two sample t test
Date	Tue, 8 Jan 2013 08:26:46 -0600

In general, with equal costs, the optimal allocation for estimating an overall mean makes the sample sizes proportional to the standard deviations (Neyman allocation, for equally sized strata). This is not exactly what you are doing, but you could use existing software (for example, Russ Lenth's) to compute power as a function of the sample sizes in the two-sample Satterthwaite approximate t-test, constraining the relative sample sizes to be proportional to your standard deviation estimates. Then use trial and error to achieve the desired power.
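
A minimal sketch of that trial-and-error search, written in Python rather than Stata. It is not Russ Lenth's software and it does not compute exact Satterthwaite power: it assumes a normal approximation to the test statistic, which slightly overstates power in small samples. The function names and the `share1` parameter (the fraction of the total assigned to group 1) are hypothetical, and the example numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sd1, sd2, n1, n2, alpha=0.05):
    """Approximate two-sided power for a difference in means (normal approximation)."""
    z = NormalDist()
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)   # SE of the difference with unequal variances
    crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def sizes_for_power(delta, sd1, sd2, share1, target=0.80, alpha=0.05,
                    max_total=100_000):
    """Raise the total sample size until the approximate power reaches target.

    share1 fixes the relative allocation, so only the total is searched.
    """
    for total in range(4, max_total + 1):
        # Keep at least two observations in each group.
        n1 = min(max(2, round(share1 * total)), total - 2)
        n2 = total - n1
        power = approx_power(delta, sd1, sd2, n1, n2, alpha)
        if power >= target:
            return n1, n2, power
    raise ValueError("target power not reached within max_total")


# Illustrative inputs (assumptions): SDs of 1 and 3, true difference of 1.
sd1, sd2 = 1.0, 3.0
share1 = sd1 / (sd1 + sd2)   # allocation proportional to the standard deviations
n1, n2, power = sizes_for_power(delta=1.0, sd1=sd1, sd2=sd2, share1=share1)
print(n1, n2, round(power, 3))
```

Passing `share1=0.5` instead gives the equal-allocation answer for comparison.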

Al Feiveson

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of JVerkuilen (Gmail)
Sent: Monday, January 07, 2013 3:09 PM
To: [email protected]
Subject: Re: st: Optimal ratio of sample sizes in two sample t test

On Mon, Jan 7, 2013 at 2:50 PM, Douglas McKee <[email protected]> wrote:

<<<Suppose I'm trying to choose the sample sizes for a study where I have two populations that are equally costly to sample.  If I know the standard deviations of each group, I can use sampsi to compute the n's when I specify effect size, alpha and desired power.  But suppose one sample has a low variance and the other has a high variance.  Doesn't this mean I should sample more from the high variance group?  Is there a way to make Stata tell me the optimal n1/n2 ratio?  Or should I write a wrapper around sampsi that tries a variety of ratios and tells me which one yields the smallest n1+n2?>>>
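
The wrapper described in the question can be sketched as follows. This is Python, not Stata, and it does not call sampsi: as a stand-in it assumes a normal approximation to the power of the two-sample test with known standard deviations. All function names, the ratio grid, and the example numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def approx_power(delta, sd1, sd2, n1, n2, alpha=0.05):
    """Approximate two-sided power for a difference in means (normal approximation)."""
    z = NormalDist()
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def smallest_sizes(delta, sd1, sd2, ratio, target=0.80, alpha=0.05, max_n2=100_000):
    """Smallest (n1, n2) with n1 = ceil(ratio * n2) that reaches the target power."""
    for n2 in range(2, max_n2 + 1):
        n1 = max(2, ceil(ratio * n2))
        if approx_power(delta, sd1, sd2, n1, n2, alpha) >= target:
            return n1, n2
    raise ValueError("target power not reached within max_n2")


def best_ratio(delta, sd1, sd2, ratios, target=0.80, alpha=0.05):
    """Try each n1/n2 ratio and return (ratio, n1, n2) with the smallest n1 + n2."""
    best = None
    for ratio in ratios:
        n1, n2 = smallest_sizes(delta, sd1, sd2, ratio, target, alpha)
        if best is None or n1 + n2 < best[1] + best[2]:
            best = (ratio, n1, n2)
    return best


# Illustrative inputs (assumptions): SDs of 1 and 3, true difference of 1,
# and a grid of n1/n2 ratios from 0.05 to 2.00 in steps of 0.05.
grid = [i / 20 for i in range(1, 41)]
ratio, n1, n2 = best_ratio(delta=1.0, sd1=1.0, sd2=3.0, ratios=grid)
print(ratio, n1, n2, n1 + n2)
```

Because sample sizes are integers, several neighbouring ratios can give nearly the same total, so the reported ratio is only accurate to about the grid spacing.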

I don't see this as an easy problem to solve, since the decision rule with clearly unequal variances is one of the whack-a-mole problems in statistics:

http://en.wikipedia.org/wiki/Behrens_Fisher_problem

How different are the standard deviations? If the ratio is no more than about 2:1, then you are probably OK with near-equal sample sizes, but of course you could optimize the MSE, which would lean towards the population with the larger variance.
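
As a worked version of the MSE argument: the difference in sample means is unbiased, so its MSE equals its variance. Assuming known standard deviations and a fixed total N (symbols introduced here for illustration), minimizing that variance gives the allocation and the cost of splitting equally instead:

```latex
\begin{align*}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},
  \qquad n_1 + n_2 = N \\
\frac{\partial}{\partial n_1}
  \left[ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{N - n_1} \right]
  &= -\frac{\sigma_1^2}{n_1^2} + \frac{\sigma_2^2}{(N - n_1)^2} = 0
  \;\Longrightarrow\; \frac{n_1}{n_2} = \frac{\sigma_1}{\sigma_2} \\
V_{\text{opt}} &= \frac{(\sigma_1 + \sigma_2)^2}{N},
  \qquad
  V_{\text{equal}} = \frac{2\,(\sigma_1^2 + \sigma_2^2)}{N} \\
\frac{V_{\text{opt}}}{V_{\text{equal}}}
  &= \frac{(\sigma_1 + \sigma_2)^2}{2\,(\sigma_1^2 + \sigma_2^2)}
  = \frac{9}{10}
  \quad \text{when } \sigma_1 : \sigma_2 = 2 : 1
\end{align*}
```

So the group with the larger standard deviation gets proportionally more observations, and at a 2:1 ratio of standard deviations an equal split needs roughly 11% more observations in total to match the variance of the optimal split.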

Differences in SD may be a sign of a larger issue, though. Are the variables better analyzed on a different scale? For instance, it might be better to use a generalized linear model with a log link, so that the model is linear on the log scale.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


