Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: somersd resampling question
Date: Fri, 29 Oct 2010 14:41:50 -0500

Hi Roger,

Thanks for the idea of setting up artificial clusters, but I don't see how this can be done with two multinomial lists.

Anyway, for anyone who might be interested, I've done a small simulation with 23 categories (because that's what I have) and various combinations of sample sizes in the two lists. It turns out that the ratio of the empirical SE to the SE calculated by -somersd- depends almost entirely on the smaller of the two sample sizes, and is closer to 1 when that minimum sample size is small. Each row in the data below summarizes 1000 simulated multinomial data sets with randomly generated independent cell probabilities; the probabilities are fixed over all 1000 data sets within a row but vary from row to row. Try plotting rat (= se_emp/se_calc) against nmin [= min(n1,n2)].

By the way, the purpose of all this is to come up with a quantifiable measure of how similar the two distributions are with respect to their general patterns, as opposed to their actual values (which is what a chi-squared statistic would reflect).
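The simulation design described above can be sketched in Python. This is a hypothetical reconstruction, not Al's actual code, and every function name in it is illustrative. It assumes that tau-a is computed between the two lists' category counts (23 pairs per data set), that each list gets its own independently drawn uniform random cell weights, and that se_calc is the average of the per-data-set SEs. It also uses a plain leave-one-category-out jackknife as a stand-in for the SE that -somersd- reports; -somersd-'s own calculation may differ in detail.

```python
import random
import statistics


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall's tau-a between paired sequences x and y.

    tau_a = (concordant pairs - discordant pairs) / (k*(k-1)/2);
    tied pairs count in the denominator but add nothing to the numerator.
    """
    k = len(x)
    total = 0
    for i in range(k):
        for j in range(i + 1, k):
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return total / (k * (k - 1) / 2)


def jackknife_se(x, y):
    """Leave-one-category-out jackknife SE of tau-a.

    Assumption: used here only as a stand-in for the SE from -somersd-.
    It treats the categories as independent units, which is exactly the
    assumption questioned in this thread.
    """
    k = len(x)
    leave_out = [kendall_tau_a(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(k)]
    mean = sum(leave_out) / k
    return ((k - 1) / k * sum((t - mean) ** 2 for t in leave_out)) ** 0.5


def multinomial_counts(n, weights, rng):
    """Draw n individuals into len(weights) categories; return per-category counts."""
    counts = [0] * len(weights)
    for c in rng.choices(range(len(weights)), weights=weights, k=n):
        counts[c] += 1
    return counts


def simulate_row(n1, n2, n_categories=23, n_reps=1000, seed=None):
    """One row of the table: fixed random cell weights, n_reps simulated pairs of lists.

    Returns (se_calc, se_emp, nmin, rat) in the same order as the table columns.
    """
    rng = random.Random(seed)
    # Assumption: independent uniform weights for each list, held fixed within the row.
    w1 = [rng.random() for _ in range(n_categories)]
    w2 = [rng.random() for _ in range(n_categories)]
    taus, ses = [], []
    for _ in range(n_reps):
        c1 = multinomial_counts(n1, w1, rng)
        c2 = multinomial_counts(n2, w2, rng)
        taus.append(kendall_tau_a(c1, c2))
        ses.append(jackknife_se(c1, c2))
    se_emp = statistics.stdev(taus)   # spread of tau-a across simulated data sets
    se_calc = statistics.mean(ses)    # average "calculated" SE
    return se_calc, se_emp, min(n1, n2), se_emp / se_calc
```

For example, `simulate_row(960, 480, seed=1)` produces one row analogous to the last row of each set in the table below. Because the SE calculation here is a substitute, its ratios are not expected to reproduce the posted values exactly.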
Al Feiveson

| n1 | n2 | se_calc | se_emp | nmin | rat | set |
|---:|---:|---:|---:|---:|---:|---:|
| 60 | 30 | .1439008 | .1321683 | 30 | .9184684 | 1 |
| 120 | 30 | .1519339 | .1160367 | 30 | .7637313 | 1 |
| 120 | 60 | .1367752 | .1034096 | 60 | .7560548 | 1 |
| 240 | 30 | .1501265 | .120686 | 30 | .8038954 | 1 |
| 240 | 60 | .1672979 | .0987834 | 60 | .5904641 | 1 |
| 240 | 120 | .1612942 | .1094221 | 120 | .6784011 | 1 |
| 480 | 30 | .1448482 | .121629 | 30 | .8396998 | 1 |
| 480 | 60 | .1544797 | .1151996 | 60 | .7457264 | 1 |
| 480 | 120 | .157679 | .1038079 | 120 | .6583494 | 1 |
| 480 | 240 | .1655068 | .0882562 | 240 | .5332483 | 1 |
| 960 | 30 | .1471903 | .1238696 | 30 | .8415608 | 1 |
| 960 | 60 | .1492855 | .1071405 | 60 | .7176883 | 1 |
| 960 | 120 | .1490777 | .1053668 | 120 | .7067916 | 1 |
| 960 | 240 | .144429 | .0809639 | 240 | .5605789 | 1 |
| 960 | 480 | .1958908 | .0645837 | 480 | .3296922 | 1 |
| 60 | 30 | .1457042 | .1229061 | 30 | .8435318 | 2 |
| 120 | 30 | .1521924 | .1159594 | 30 | .7619262 | 2 |
| 120 | 60 | .1486831 | .1267989 | 60 | .8528129 | 2 |
| 240 | 30 | .1444352 | .1168832 | 30 | .8092432 | 2 |
| 240 | 60 | .1460266 | .1109937 | 60 | .7600925 | 2 |
| 240 | 120 | .1626369 | .0910218 | 120 | .5596629 | 2 |
| 480 | 30 | .1431084 | .127222 | 30 | .8889909 | 2 |
| 480 | 60 | .1533591 | .10581 | 60 | .6899495 | 2 |
| 480 | 120 | .1673665 | .0932405 | 120 | .5571038 | 2 |
| 480 | 240 | .1370986 | .0833428 | 240 | .6079037 | 2 |
| 960 | 30 | .1434537 | .1124708 | 30 | .7840216 | 2 |
| 960 | 60 | .1532602 | .1213565 | 60 | .7918329 | 2 |
| 960 | 120 | .1626063 | .0967448 | 120 | .5949637 | 2 |
| 960 | 240 | .1578968 | .0861469 | 240 | .5455902 | 2 |
| 960 | 480 | .1544878 | .0632528 | 480 | .4094355 | 2 |

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roger Newson
Sent: Friday, October 29, 2010 12:01 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: somersd resampling question

Resampling is valid with -somersd-, as long as the units resampled are clusters rather than non-independent observations within clusters. In your case, if you start with frequency counts and want to use a resampling method, then you will presumably have to expand the dataset (using -expgen-, -reshape- or some similar command) to get the units to be resampled.

I hope this helps.
Best wishes

Roger

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 29/10/2010 17:04, Feiveson, Alan H. (JSC-SK311) wrote:
> Hi - I want to use Kendall's Tau-a to characterize similarity between two multinomial samples. My question is whether the resampling in -somersd- to get standard errors is valid when comparing two multinomial samples, since technically the "observations" (i.e. frequency counts) are not mutually independent. Does anyone have an opinion on this?
>
> Thanks
>
> Al Feiveson
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
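Roger's point is that the units to resample are the independent individuals, not the 23 category counts. The following is a minimal Python sketch of that idea, not of what -somersd- itself does. It assumes each list is an independent simple random sample of individuals. Each list's counts are expanded into one record per individual (the job -expgen- or -reshape- would do in Stata), individuals are resampled with replacement separately within each list, the counts are rebuilt, and tau-a is recomputed. The function names are hypothetical, and the method shown is an ordinary stratified bootstrap.

```python
import random
import statistics


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall's tau-a between paired sequences x and y."""
    k = len(x)
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
                for i in range(k) for j in range(i + 1, k))
    return total / (k * (k - 1) / 2)


def expand_counts(counts):
    """Turn per-category frequency counts into one record (category index) per individual."""
    return [cat for cat, n in enumerate(counts) for _ in range(n)]


def recount(records, n_categories):
    """Collapse individual records back into per-category counts."""
    counts = [0] * n_categories
    for cat in records:
        counts[cat] += 1
    return counts


def bootstrap_se_tau_a(counts1, counts2, n_boot=1000, seed=None):
    """Bootstrap SE of tau-a, resampling individuals (not categories) within each list.

    Assumption: the two lists are independent samples, so each is resampled
    separately with its own sample size kept fixed.
    """
    rng = random.Random(seed)
    k = len(counts1)
    rec1, rec2 = expand_counts(counts1), expand_counts(counts2)
    taus = []
    for _ in range(n_boot):
        b1 = recount(rng.choices(rec1, k=len(rec1)), k)
        b2 = recount(rng.choices(rec2, k=len(rec2)), k)
        taus.append(kendall_tau_a(b1, b2))
    return statistics.stdev(taus)
```

Usage would look like `bootstrap_se_tau_a(counts1, counts2, n_boot=1000, seed=1)`, where `counts1` and `counts2` are the two lists' 23 category counts. Resampling individuals is what keeps the multinomial dependence among the counts (their fixed total) intact, which resampling the 23 categories directly would not.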

**References**:
- **st: somersd resampling question**, *From:* "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
- **Re: st: somersd resampling question**, *From:* Roger Newson <r.newson@imperial.ac.uk>
