# st: RE: Handling big samples

 From "Liao, Junlin" To "statalist@hsphsun2.harvard.edu" Subject st: RE: Handling big samples Date Tue, 22 Feb 2011 16:46:50 +0000

Correct me if I'm wrong, but I think this reflects a general misunderstanding about sample size. I was once asked whether a study could be over-powered (as in a regression analysis). The unspoken assumption is that over-powering will pick up noise and make it significant because of the large sample size. In other words, large samples make it easier to find significant results that are not meaningful.

My take on this is that you cannot over-power your study statistically. Of course you can economically, by collecting more data than needed to prove or disprove a hypothesis. The larger the N, the smaller the standard errors and, at a fixed significance level, the smaller the type II error rate (the type I error rate is set by the chosen level), and the more reliable your statistical conclusions.
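To put rough numbers on how power grows with N, here is a minimal sketch. It assumes a two-sided two-sample z-test with unit variance, equal group sizes, a significance level of 0.05, and a hypothetical true difference of 0.1 standard deviations; the function names and all numbers are illustrative assumptions, not anything from the original question.

```python
import math

Z_CRIT = 1.959963984540054  # two-sided critical value for alpha = 0.05 (assumed level)


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_two_sample_z(effect_sd, n_per_group):
    """Power of a two-sided two-sample z-test (unit variance, equal group sizes)."""
    se = math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    shift = effect_sd / se             # expected value of the z statistic
    # Probability that the z statistic falls in either rejection region.
    return (1.0 - normal_cdf(Z_CRIT - shift)) + normal_cdf(-Z_CRIT - shift)


# Hypothetical true difference of 0.1 SD at increasing sample sizes.
for n in (100, 1_000, 10_000):
    power = power_two_sample_z(0.1, n)
    print(f"n per group = {n:>6}: power = {power:.3f}, type II error = {1.0 - power:.3f}")
```

Setting `effect_sd` to 0 returns 0.05 at every n, since the rejection rate under the null is set by the chosen alpha rather than by the sample size.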

Statistical conclusions aside, you have clinical significance to deal with as well. With a large sample, you have the power to detect small differences. These differences may be statistically significant, yet too small to be considered clinically significant. That is what "over-powering" does: it picks up small differences that may not be clinically significant. The issue is generally easy to assess by comparing the size of the estimated difference with what would matter clinically. Therefore I see no harm in large samples, only benefits. The only concern goes back to the cost of acquiring large samples.
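The distinction between statistical and clinical significance can be sketched as follows. This assumes the same simple two-sample z-test setup with unit variance, a hypothetical observed difference of 0.02 standard deviations, and a hypothetical smallest clinically relevant difference (`MCID`) of 0.2 standard deviations; both thresholds and the function name are made up for illustration.

```python
import math

MCID = 0.2            # hypothetical smallest clinically relevant difference (SD units)
OBSERVED_DIFF = 0.02  # hypothetical observed difference in means (SD units)


def two_sample_z_summary(diff_sd, n_per_group, z_crit=1.959963984540054):
    """Return the two-sided p-value and 95% CI for a difference in means (unit variance)."""
    se = math.sqrt(2.0 / n_per_group)
    z = diff_sd / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    ci = (diff_sd - z_crit * se, diff_sd + z_crit * se)
    return p_value, ci


for n in (100, 10_000, 1_000_000):
    p_value, (lo, hi) = two_sample_z_summary(OBSERVED_DIFF, n)
    stat_sig = p_value < 0.05
    clin_sig = lo >= MCID  # the whole interval sits at or above the clinical threshold
    print(f"n per group = {n:>9}: p = {p_value:.3g}, 95% CI = ({lo:.4f}, {hi:.4f}), "
          f"statistically significant = {stat_sig}, clinically relevant = {clin_sig}")
```

The same tiny difference moves from non-significant to highly significant as n grows, while the confidence interval narrows around a value far below the assumed clinical threshold. Reading the interval against that threshold is one simple way to do the assessment described above.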

Junlin

Hi. I'm running a regression in Stata with a big sample. All the estimates turn out to be significant, but how can I be sure this is due to the test and not to the sample size? Does anyone know where I can find papers that discuss this issue? Thanks!

