Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Slow STATA


From   rgates@stata.com (Richard Gates)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Slow STATA
Date   Tue, 12 Oct 2004 09:50:46 -0500

In response to Clive's attempt to run -tabulate, exact- on a table
with astronomical frequencies:

> Fair enough, but running a larger table on the -union- data still poses no
> problem on my machine (save the (minor?) error registered at the end):
> 
> . webuse union
> 
> . tab year union, exact all
> 
>  interview |      1 if union
>       year |         0          1 |     Total
> -----------+----------------------+----------
>         70 |     1,310        348 |     1,658
>         71 |     1,412        381 |     1,793
>         72 |     1,576        382 |     1,958
>         73 |     1,633        424 |     2,057
>         77 |     2,180        490 |     2,670
>         78 |     1,530        463 |     1,993
>         80 |     1,432        596 |     2,028
>         82 |     1,846        562 |     2,408
>         83 |     1,733        461 |     2,194
>         85 |     1,888        600 |     2,488
>         87 |     1,981        555 |     2,536
>         88 |     1,868        549 |     2,417
> -----------+----------------------+----------
>      Total |    20,389      5,811 |    26,200
> 
>          Pearson chi2(11) = 107.8144   Pr = 0.000
> likelihood-ratio chi2(11) = 105.0420   Pr = 0.000
>                Cramér's V =   0.0641
>                     gamma =   0.0342  ASE = 0.009
>           Kendall's tau-b =   0.0192  ASE = 0.005
> integer overflow due to large row margin frequencies
> r(1401);
> 

This is an example of a table not to request Fisher's exact probablities for.
The fexact algorithm uses a network for computing the probabilities.  This
network is comprised of levels of nodes, one level for each column in the
table.  A table with the same row totals as the input table is represented by
a path through these nodes.  The netork requires a key to identify each node
which, in essence, is a function of row totals of subtables. The function
computing the key for your example overflows the integer key. We actually use
two integers to store the key to allow larger tables, but it is likely infeasible
to compute Fisher's exact probability using the fexact algorithm for these
larger tables anyway.

For those that know some counting rules, be assured we do not generate the entire
network at once. Only two levels of nodes are in memory at a time and these are 
`thinned out' using the longest path and shortest path theorems contributed by Harry
Joe. 

See the updated help for -tabulate- for references.

As a note, the Fisher's exact computations for 2 x 2 tables are not affected by
this update.

-Rich
rgates@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index