Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Very high t- statistics and very small standard errors

- From: Joerg Luedicke
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Very high t- statistics and very small standard errors
- Date: Tue, 1 May 2012 07:41:39 -0700

```
Laurie,

Let's have a look at a simple difference in means, by regressing mpg
on foreign in the auto dataset:

sysuse auto, clear
reg mpg foreign

We can see that the difference in means is 4.95. If we are interested
in significance testing, we can calculate the t-value, which simply
measures how many standard errors the difference between the two
groups is away from zero:

di 4.945804/1.362162
3.6308486

and then attach a p-value by assuming some probability distribution.
However, the point is that whatever test you use, the result will
depend on your t-value which in turn depends on your standard error.
Now, how is the standard error calculated? Say we were only
interested in the standard error of a single mean (to build a confidence
interval, for example); then the standard error is simply the sample
standard deviation divided by the square root of the sample size.
For example, if we look at the mpg variable again:

sum  mpg

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        74     21.2973    5.785503         12         41

we can calculate the SE:

di 5.785503/sqrt(74)
.67255106

which is what you would get by invoking Stata's -mean-:

mean mpg

Mean estimation                     Number of obs    =      74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         mpg |    21.2973   .6725511       19.9569    22.63769
--------------------------------------------------------------

So, what you can clearly see now is how the SE depends on your sample
size. Imagine the auto dataset had 1 million cases, but mpg had the
same sample standard deviation:

di 5.785503/sqrt(1000000)
.0057855

and note how small the standard error would be in that case.

Now, this is all very basic, introductory material. If you lack these
basics, I would strongly advise doing yourself a favor and attending
some introductory courses and/or reading some introductory textbooks
before doing any (serious) data analysis.

Joerg

On Mon, Apr 30, 2012 at 6:18 PM, Laurie Molina <molinalaurie@gmail.com> wrote:
> It is not the first time I have heard people say that when you have a
> lot of observations everything is significant... Is it because the length of
> the confidence intervals is inversely related to the number of
> observations considered? Or could you tell me what is the logic behind
> saying that with a lot of observations everything is statistically
> significant?
> Thank you very much again!
>
> On Mon, Apr 30, 2012 at 9:10 PM, Richard Williams
> <richardwilliams.ndu@gmail.com> wrote:
>> At 07:54 PM 4/30/2012, Laurie Molina wrote:
>>>
>>> Hi everybody,
>>> I'm running some OLS with around 4 million observations and 6
>>> explanatory variables. My coefficients are always significant, with
>>> very high t statistics and very low standard errors, for example t
>>> statistic = 20.6 and standard error = .000023. This is a cross-sectional
>>> data set.
>>> I have run the VIF test and for all the variables the variance
>>> inflation factor is less than 3.
>>> I have also run the Durbin test, creating an index variable (_n), to see
>>> whether there is some sort of correlation in the error terms of my
>>> regression, but there is not.
>>> Should I be concerned about the significance of my coefficients? Is
>>> there any problem with getting such large t statistics and small
>>> standard errors?
>>> Thank you all in advance and best regards!!
>>
>>
>> With 4 million cases it is hard not to get statistically significant
>> results. Whether they are worth caring about is another matter. For example,
>> a $2 difference in the incomes of men and women may be statistically
>> significant. $2 is not the same as $0. But how much you should care is
>> another matter. So, if everything is highly significant, I would ask myself
>> what the substantive significance of the findings is. (Actually I would do
>> that even if the results were not so significant - I think many people do
>> not pay enough attention to "So What?" sorts of questions.)
>>
>>
>> -------------------------------------------
>> Richard Williams, Notre Dame Dept of Sociology
>> OFFICE: (574)631-6668, (574)631-6463
>> HOME:   (574)289-5227
>> EMAIL:  Richard.A.Williams.5@ND.Edu
>> WWW:    http://www.nd.edu/~rwilliam
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/

```
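
The calculations in Joerg's reply can also be reproduced from Stata's stored results rather than by retyping numbers from the output. What follows is a minimal sketch and not part of the original thread: the scalar names `sd_mpg` and `n_mpg` and the list of sample sizes are illustrative assumptions, and every sample size other than 74 is hypothetical, since the auto dataset has only 74 observations.

```stata
* Sketch only: reproduce the numbers from the reply using stored results.
sysuse auto, clear

* t-value for the difference in means: coefficient divided by its standard error
quietly regress mpg foreign
display _b[foreign]/_se[foreign]

* Standard error of the mean of mpg: sample SD divided by sqrt(n)
* (sd_mpg and n_mpg are illustrative scalar names)
quietly summarize mpg
scalar sd_mpg = r(sd)
scalar n_mpg  = r(N)
display sd_mpg/sqrt(n_mpg)

* Hypothetical: hold the SD fixed and let the sample size grow
foreach n of numlist 74 10000 1000000 4000000 {
    display %9.0f `n' "   " %10.7f sd_mpg/sqrt(`n')
}
```

Because the standard error shrinks with the square root of the sample size, quadrupling the number of observations halves it. This is why, with about 4 million observations, even very small coefficients can come with very large t statistics.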