Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: RE: st: somersd resampling question

Date: Mon, 1 Nov 2010 08:31:26 -0500

Hi Roger - I do have the two lists as you have outlined below, but my objective is not to do statistical inference on whether group A is different from group B. Instead, I want to use Kendall's Tau-a, with confidence limits, as a measure of similarity based on rank ordering (instead of the actual frequency values), treating the frequencies as observations. So in the example below, A and B are my two multinomial lists:

| y | A | B |
|---:|---:|---:|
| 1 | 0 | 0 |
| 2 | 1 | 1 |
| 3 | 17 | 20 |
| 4 | 2 | 8 |
| 5 | 1 | 3 |
| 6 | 1 | 10 |
| 7 | 2 | 1 |
| 8 | 1 | 14 |
| 9 | 3 | 7 |
| 10 | 2 | 3 |
| 11 | 4 | 4 |
| 12 | 4 | 6 |
| 13 | 0 | 4 |
| 14 | 1 | 1 |
| 15 | 0 | 4 |
| 16 | 1 | 5 |
| 17 | 3 | 1 |
| 18 | 2 | 2 |
| 19 | 3 | 2 |
| 20 | 14 | 7 |
| 21 | 3 | 2 |
| 22 | 1 | 4 |
| 23 | 0 | 1 |

I want to use Tau-a as an index of similarity between A and B, with appropriate confidence limits, taking into consideration that, because these are multinomial lists, the "observations" in A or B (i.e. the frequencies) are not independent: they sum to a fixed total. So if I do something like

somersd Z B, transf(z) taua

the SE, and hence the confidence limits, that I get will not be correct.

Al

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roger Newson
Sent: Monday, November 01, 2010 6:33 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: somersd resampling question

Hi Al.

As I understand it (correct me if I'm wrong), you have 2 multinomial lists of frequencies of an ordinal multinomial variable for 2 groups of independent observations, and aim to measure ordinal correlation between membership of Group A (instead of Group B) and the ordinal variable.
I will call the Group A membership indicator -groupa-, the ordinal variable -y-, and the cell frequency variable -cfreq-, and assume that you start with a dataset with 1 observation per table cell, sorted (and keyed uniquely) by -groupa- and -y-. Normally, I would estimate Somers' D of -y- with respect to -groupa- by typing

somersd groupa y [fwei=cfreq], tdist transf(z)

which calculates a standard delta-jackknife asymmetric confidence interval, using the t-distribution and the Fisher z-transform.

However, if you want to use the bootstrap or some other resampling method, then the -expgen- package, downloadable from SSC, can expand your dataset to have 1 observation per unit (whatever kind of unit -groupa- and -y- were measured on). As in:

expgen =cfreq, sortedby(group) copyseq(unit)

where -unit- is the sequence number of the unit within its cell. After -expgen- has run, the dataset in memory will have 1 observation per unit, and will be sorted (and keyed uniquely) by -groupa-, -y- and -unit-. You can then use the bootstrap, or any other resampling method. As in:

bootstrap, reps(1000): somersd groupa y

I hope this helps.

Best wishes

Roger

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 29/10/2010 20:41, Feiveson, Alan H. (JSC-SK311) wrote:
> Hi Roger,
>
> Thanks for the idea of setting up artificial clusters, but I don't see how this can be done with two multinomial lists.
>
> Anyway, for anyone who might be interested, I've done a small simulation with 23 categories (because that's what I have) and various combinations of sample sizes in each list. It turns out that the ratio of the empirical SE to the somersd-calculated SE depends almost completely on the minimum of the two sample sizes, and is closer to 1 when the minimum sample size is small.
>
> Each row in the data below corresponds to 1000 simulated multinomial data sets with randomly generated independent cell probabilities, fixed over all 1000 data sets within a row but varying from row to row.
>
> Try plotting rat (= se_emp/se_calc) against nmin [= min(n1,n2)].
>
> By the way, the purpose of all this is to come up with a quantifiable measure of how similar the distributions are with respect to their general patterns, as opposed to their actual values, such as might be reflected by a chi-squared statistic.
>
> Al Feiveson
>
> | n1 | n2 | se_calc | se_emp | nmin | rat | set |
> |---:|---:|---:|---:|---:|---:|---:|
> | 60 | 30 | .1439008 | .1321683 | 30 | .9184684 | 1 |
> | 120 | 30 | .1519339 | .1160367 | 30 | .7637313 | 1 |
> | 120 | 60 | .1367752 | .1034096 | 60 | .7560548 | 1 |
> | 240 | 30 | .1501265 | .120686 | 30 | .8038954 | 1 |
> | 240 | 60 | .1672979 | .0987834 | 60 | .5904641 | 1 |
> | 240 | 120 | .1612942 | .1094221 | 120 | .6784011 | 1 |
> | 480 | 30 | .1448482 | .121629 | 30 | .8396998 | 1 |
> | 480 | 60 | .1544797 | .1151996 | 60 | .7457264 | 1 |
> | 480 | 120 | .157679 | .1038079 | 120 | .6583494 | 1 |
> | 480 | 240 | .1655068 | .0882562 | 240 | .5332483 | 1 |
> | 960 | 30 | .1471903 | .1238696 | 30 | .8415608 | 1 |
> | 960 | 60 | .1492855 | .1071405 | 60 | .7176883 | 1 |
> | 960 | 120 | .1490777 | .1053668 | 120 | .7067916 | 1 |
> | 960 | 240 | .144429 | .0809639 | 240 | .5605789 | 1 |
> | 960 | 480 | .1958908 | .0645837 | 480 | .3296922 | 1 |
> | 60 | 30 | .1457042 | .1229061 | 30 | .8435318 | 2 |
> | 120 | 30 | .1521924 | .1159594 | 30 | .7619262 | 2 |
> | 120 | 60 | .1486831 | .1267989 | 60 | .8528129 | 2 |
> | 240 | 30 | .1444352 | .1168832 | 30 | .8092432 | 2 |
> | 240 | 60 | .1460266 | .1109937 | 60 | .7600925 | 2 |
> | 240 | 120 | .1626369 | .0910218 | 120 | .5596629 | 2 |
> | 480 | 30 | .1431084 | .127222 | 30 | .8889909 | 2 |
> | 480 | 60 | .1533591 | .10581 | 60 | .6899495 | 2 |
> | 480 | 120 | .1673665 | .0932405 | 120 | .5571038 | 2 |
> | 480 | 240 | .1370986 | .0833428 | 240 | .6079037 | 2 |
> | 960 | 30 | .1434537 | .1124708 | 30 | .7840216 | 2 |
> | 960 | 60 | .1532602 | .1213565 | 60 | .7918329 | 2 |
> | 960 | 120 | .1626063 | .0967448 | 120 | .5949637 | 2 |
> | 960 | 240 | .1578968 | .0861469 | 240 | .5455902 | 2 |
> | 960 | 480 | .1544878 | .0632528 | 480 | .4094355 | 2 |
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roger Newson
> Sent: Friday, October 29, 2010 12:01 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: somersd resampling question
>
> Resampling is valid with -somersd-, as long as the units resampled are clusters rather than non-independent observations within clusters. In your case, if you start with frequency counts and want to use a resampling method, then you will presumably have to expand the dataset (using -expgen-, -reshape- or some similar command) to get the units to be resampled.
>
> I hope this helps.
>
> Best wishes
>
> Roger
>
> On 29/10/2010 17:04, Feiveson, Alan H. (JSC-SK311) wrote:
>> Hi - I want to use Kendall's Tau-a to characterize similarity between two multinomial samples. My question is whether the resampling in -somersd- to get standard errors is valid when comparing two multinomial samples, since technically the "observations" (i.e. frequency counts) are not mutually independent. Anyone have an opinion on this?
>> >> Thanks >> >> Al Feiveson >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:
- **Re: st: somersd resampling question**, *From:* Roger Newson <r.newson@imperial.ac.uk>
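
Roger's 1 November suggestion (expand the cell frequencies to one record per unit, estimate Somers' D of -y- with respect to -groupa-, then bootstrap) can also be sketched in Python. For a binary -groupa-, Somers' D of y with respect to groupa equals the mean of sgn(y_a − y_b) over all pairs made of one group-A unit and one group-B unit, so it can be computed from the cell counts of the expanded data. Here `expand()` only plays the role of -expgen-; the names `expand`, `somers_d` and `bootstrap_se` are hypothetical, and the sketch does not reproduce -somersd-'s delta-jackknife SE, t-distribution or Fisher z-transform.

```python
import random
from statistics import stdev

# Al's frequency lists again (categories y = 1..23), so this block stands alone.
A = [0, 1, 17, 2, 1, 1, 2, 1, 3, 2, 4, 4, 0, 1, 0, 1, 3, 2, 3, 14, 3, 1, 0]
B = [0, 1, 20, 8, 3, 10, 1, 14, 7, 3, 4, 6, 4, 1, 4, 5, 1, 2, 2, 7, 2, 4, 1]


def sgn(v):
    return (v > 0) - (v < 0)


def expand(freq_a, freq_b):
    """One (groupa, y) record per unit, analogous to expanding by -cfreq-:
    groupa = 1 for list A, 0 for list B; empty cells produce no records."""
    units = []
    for groupa, freqs in ((1, freq_a), (0, freq_b)):
        for y, count in enumerate(freqs, start=1):
            units.extend([(groupa, y)] * count)
    return units


def somers_d(units):
    """Somers' D of y with respect to a binary groupa, i.e. the mean of
    sgn(y_a - y_b) over all pairs of one group-A unit and one group-B unit."""
    counts = {0: {}, 1: {}}
    for groupa, y in units:
        counts[groupa][y] = counts[groupa].get(y, 0) + 1
    n1 = sum(counts[1].values())
    n0 = sum(counts[0].values())
    total = sum(c1 * c0 * sgn(y1 - y0)
                for y1, c1 in counts[1].items()
                for y0, c0 in counts[0].items())
    return total / (n1 * n0)


def bootstrap_se(units, reps=1000, seed=12345):
    """Bootstrap SE of Somers' D, resampling units with replacement from the
    pooled data. A draw that contains only one group is discarded and redrawn."""
    rng = random.Random(seed)
    stats = []
    while len(stats) < reps:
        draw = rng.choices(units, k=len(units))
        if {g for g, _ in draw} == {0, 1}:
            stats.append(somers_d(draw))
    return stdev(stats)


units = expand(A, B)
print("units:", len(units))
print("Somers' D of y w.r.t. groupa:", somers_d(units))
print("bootstrap SE:", bootstrap_se(units))
```

Because units are resampled from the pooled data without stratifying by group, the group sizes vary between draws, which matches an unstratified bootstrap. This treats every unit as an independent observation, which is Roger's framing; Al's reply notes that his own goal, a similarity index between the two lists, is a different question.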
