Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: somersd resampling question

From: Roger Newson
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: somersd resampling question
Date: Mon, 1 Nov 2010 11:33:03 +0000

Hi Al. As I understand it (correct me if I'm wrong), you have 2 multinomial lists of frequencies of an ordinal multinomial variable for 2 groups of independent observations, and aim to measure ordinal correlation between membership of Group A (instead of Group B) and the ordinal variable. I will call the Group A membership indicator -groupa-, the ordinal variable -y-, and the cell frequency variable -cfreq-, and assume that you start with a dataset with 1 observation per table cell, sorted (and keyed uniquely) by -groupa- and -y-.
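To make that layout concrete, here is a minimal sketch of such a dataset. The counts are made up purely for illustration (3 ordinal categories rather than the 23 in Al's data), and only built-in Stata commands are used.

```stata
* Hypothetical toy data: 1 observation per table cell (counts are invented)
clear
input byte groupa byte y int cfreq
0 1 3
0 2 2
0 3 1
1 1 1
1 2 2
1 3 3
end
* Sort by group and ordinal category, as assumed in the text
sort groupa y
* Check that groupa and y together identify each observation uniquely
isid groupa y
list, sepby(groupa)
```

With these hypothetical counts there are 6 x 6 = 36 pairs of one Group A unit and one Group B unit. In 21 of them the Group A unit has the larger -y-, and in 5 the Group B unit does, so Somers' D of -y- with respect to -groupa- is (21 - 5)/36, or about 0.44.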
Normally, I would estimate Somers' D of -y- with respect to -groupa- by typing
```
* Somers' D of y with respect to groupa; each cell is weighted by its frequency
somersd groupa y [fwei=cfreq], tdist transf(z)
```
which calculates a standard delta-jackknife asymmetric confidence interval, using the t-distribution and the Fisher z-transform. However, if you want to use the bootstrap or some other resampling method, then the -expgen- package, downloadable from SSC, can expand your dataset to have 1 observation per unit (whatever kind of unit -groupa- and -y- were measured on). As in:
```
* Expand to 1 observation per unit; unit numbers the copies within each cell
expgen =cfreq, sortedby(group) copyseq(unit)
```
where -unit- is the sequence number of the unit within its cell. After -expgen- has run, the dataset in memory will have 1 observation per unit, and will be sorted (and keyed uniquely) by -groupa-, -y- and -unit-. You can then use the bootstrap, or any other resampling method. As in:
```
* Bootstrap Somers' D by resampling the expanded units, 1000 replications
bootstrap, reps(1000): somersd groupa y
```

I hope this helps.

Best wishes

Roger

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:

Opinions expressed are those of the author, not of the institution.

On 29/10/2010 20:41, Feiveson, Alan H. (JSC-SK311) wrote:

Hi Roger, Thanks for the idea of setting up artificial clusters, but I don't see how this can be done with two multinomial lists. Anyway, for anyone who might be interested, I've done a small simulation with 23 categories (because that's what I have) and various combinations of sample sizes in each list. It turns out that the ratio of the empirical SE to the somersd-calculated SE depends almost completely on the minimum of the two sample sizes and is closer to 1 when the minimum sample size is small.

Each row in the data below corresponds to 1000 simulated multinomial data sets with randomly generated independent cell probabilities - fixed over all 1000 data sets within a row, but varying from row to row.

Try plotting rat (= se_emp/se_calc) against nmin [= min(n1,n2)].

By the way, the purpose of all this is to come up with a quantifiable measure of how similar the distributions are with respect to their general patterns as opposed to actual values, such as might be reflected by a chi-squared statistic.

Al Feiveson

```
n1    n2    se_calc     se_emp   nmin        rat   set
60    30   .1439008   .1321683     30   .9184684     1
120    30   .1519339   .1160367     30   .7637313     1
120    60   .1367752   .1034096     60   .7560548     1
240    30   .1501265    .120686     30   .8038954     1
240    60   .1672979   .0987834     60   .5904641     1
240   120   .1612942   .1094221    120   .6784011     1
480    30   .1448482    .121629     30   .8396998     1
480    60   .1544797   .1151996     60   .7457264     1
480   120    .157679   .1038079    120   .6583494     1
480   240   .1655068   .0882562    240   .5332483     1
960    30   .1471903   .1238696     30   .8415608     1
960    60   .1492855   .1071405     60   .7176883     1
960   120   .1490777   .1053668    120   .7067916     1
960   240    .144429   .0809639    240   .5605789     1
960   480   .1958908   .0645837    480   .3296922     1
60    30   .1457042   .1229061     30   .8435318     2
120    30   .1521924   .1159594     30   .7619262     2
120    60   .1486831   .1267989     60   .8528129     2
240    30   .1444352   .1168832     30   .8092432     2
240    60   .1460266   .1109937     60   .7600925     2
240   120   .1626369   .0910218    120   .5596629     2
480    30   .1431084    .127222     30   .8889909     2
480    60   .1533591     .10581     60   .6899495     2
480   120   .1673665   .0932405    120   .5571038     2
480   240   .1370986   .0833428    240   .6079037     2
960    30   .1434537   .1124708     30   .7840216     2
960    60   .1532602   .1213565     60   .7918329     2
960   120   .1626063   .0967448    120   .5949637     2
960   240   .1578968   .0861469    240   .5455902     2
960   480   .1544878   .0632528    480   .4094355     2
```

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roger Newson
Sent: Friday, October 29, 2010 12:01 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: somersd resampling question

Resampling is valid with -somersd-, as long as the units resampled are clusters rather than non-independent observations within clusters. In [...] resampling method, then you will presumably have to expand the dataset (using -expgen-, -reshape- or some similar command) to get the units to be resampled.

I hope this helps.

Best wishes

Roger

On 29/10/2010 17:04, Feiveson, Alan H. (JSC-SK311) wrote:

Hi - I want to use Kendall's Tau-a to characterize similarity between two multinomial samples. My question is whether the resampling in -somersd- to get standard errors is valid when comparing two multinomial samples, since technically the "observations" (i.e. frequency counts) are not mutually independent. Anyone have an opinion on this?

Thanks

Al Feiveson

```
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```