[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
rgates@stata.com (Richard Gates) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Slow STATA |

Date |
Tue, 12 Oct 2004 09:50:46 -0500 |

In response to Clive's attempt to run -tabulate, exact- on a table with astronomical frequencies: > Fair enough, but running a larger table on the -union- data still poses no > problem on my machine (save the (minor?) error registered at the end): > > . webuse union > > . tab year union, exact all > > interview | 1 if union > year | 0 1 | Total > -----------+----------------------+---------- > 70 | 1,310 348 | 1,658 > 71 | 1,412 381 | 1,793 > 72 | 1,576 382 | 1,958 > 73 | 1,633 424 | 2,057 > 77 | 2,180 490 | 2,670 > 78 | 1,530 463 | 1,993 > 80 | 1,432 596 | 2,028 > 82 | 1,846 562 | 2,408 > 83 | 1,733 461 | 2,194 > 85 | 1,888 600 | 2,488 > 87 | 1,981 555 | 2,536 > 88 | 1,868 549 | 2,417 > -----------+----------------------+---------- > Total | 20,389 5,811 | 26,200 > > Pearson chi2(11) = 107.8144 Pr = 0.000 > likelihood-ratio chi2(11) = 105.0420 Pr = 0.000 > Cramér's V = 0.0641 > gamma = 0.0342 ASE = 0.009 > Kendall's tau-b = 0.0192 ASE = 0.005 > integer overflow due to large row margin frequencies > r(1401); > This is an example of a table not to request Fisher's exact probablities for. The fexact algorithm uses a network for computing the probabilities. This network is comprised of levels of nodes, one level for each column in the table. A table with the same row totals as the input table is represented by a path through these nodes. The netork requires a key to identify each node which, in essence, is a function of row totals of subtables. The function computing the key for your example overflows the integer key. We actually use two integers to store the key to allow larger tables, but it is likely infeasible to compute Fisher's exact probability using the fexact algorithm for these larger tables anyway. For those that know some counting rules, be assured we do not generate the entire network at once. Only two levels of nodes are in memory at a time and these are `thinned out' using the longest path and shortest path theorems contributed by Harry Joe. See the updated help for -tabulate- for references. As a note, the Fisher's exact computations for 2 x 2 tables are not affected by this update. -Rich rgates@stata.com * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**Re: st: Can Stata estimate Mixed Fixed and Random (MFR) model?** - Next by Date:
**st: RE: using Stata's micombine command on a SAS-imputed dataset** - Previous by thread:
**Re: st: Slow STATA** - Next by thread:
**st: Re: Slow STATA** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |