Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: somersd resampling question

From	Roger Newson <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: somersd resampling question
Date	Mon, 1 Nov 2010 11:33:03 +0000

Hi Al. As I understand it (correct me if I'm wrong), you have 2 multinomial lists of frequencies of an ordinal multinomial variable for 2 groups of independent observations, and aim to measure ordinal correlation between membership of Group A (as opposed to Group B) and the ordinal variable. I will call the Group A membership indicator -groupa-, the ordinal variable -y-, and the cell frequency variable -cfreq-, and assume that you start with a dataset with 1 observation per table cell, sorted (and keyed uniquely) by -groupa- and -y-.
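A minimal sketch of that starting layout follows. It is not part of the original reply: the 4-category -y- and every frequency in -cfreq- are made up purely for illustration, and coding -groupa- as 1 for Group A and 0 for Group B is an assumption.

```stata
* Hypothetical example data (made-up frequencies, for illustration only):
* one observation per table cell, keyed by groupa (1 = Group A, 0 = Group B)
* and a 4-category ordinal y, with the cell frequencies in cfreq
clear
input byte groupa byte y int cfreq
0 1 10
0 2 20
0 3 15
0 4  5
1 1  4
1 2 12
1 3 18
1 4 16
end

* Sort by the cell key and confirm that groupa and y identify cells uniquely
sort groupa y
isid groupa y
```

With data in this form, the frequency-weighted -somersd- command in the reply can be applied directly.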

Normally, I would estimate Somers' D of -y- with respect to -groupa- by typing


somersd groupa y [fwei=cfreq], tdist transf(z)

which calculates a standard delta-jackknife asymmetric confidence interval, using the t-distribution and the Fisher z-transform. However, if you want to use the bootstrap or some other resampling method, then the -expgen- package, downloadable from SSC, can expand your dataset to have 1 observation per unit (whatever kind of unit -groupa- and -y- were measured on). As in:


expgen =cfreq, sortedby(group) copyseq(unit)

where -unit- is the sequence number of the unit within its cell. After -expgen- has run, the dataset in memory will have 1 observation per unit, and will be sorted (and keyed uniquely) by -groupa-, -y- and -unit-. You can then use the bootstrap, or any other resampling method. As in:


bootstrap, reps(1000): somersd groupa y
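For reference, the asymmetric interval requested by -tdist transf(z)- in the first command can be sketched as below, assuming the usual delta-method treatment of the Fisher z-transform. Here $\hat D$ is the estimated Somers' D, $\widehat{\mathrm{SE}}(\hat D)$ its delta-jackknife standard error, $1-\alpha$ the confidence level, and $\nu$ the degrees of freedom used for the t-distribution (the thread does not state the value of $\nu$).

```latex
\hat z = \operatorname{arctanh}(\hat D) = \frac{1}{2}\ln\frac{1+\hat D}{1-\hat D},
\qquad
\widehat{\mathrm{SE}}(\hat z) = \frac{\widehat{\mathrm{SE}}(\hat D)}{1-\hat D^{2}}

\text{CI for } D:\quad
\Bigl[\tanh\bigl(\hat z - t_{\nu,\,1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat z)\bigr),\;
      \tanh\bigl(\hat z + t_{\nu,\,1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat z)\bigr)\Bigr]
```

Because tanh is nonlinear, the limits are symmetric about $\hat z$ but not about $\hat D$, which is why the resulting interval is asymmetric.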

I hope this helps.

Best wishes

Roger


Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 29/10/2010 20:41, Feiveson, Alan H. (JSC-SK311) wrote:

Hi Roger, Thanks for the idea of setting up artificial clusters, but I don't see how this can be done with two multinomial lists. Anyway, for anyone who might be interested, I've done a small simulation with 23 categories (because that's what I have) and various combinations of sample sizes in each list. It turns out that the ratio of the empirical se to the somersd-calculated SE depends almost completely on the minimum of the two sample sizes and is closer to 1 when the minimum sample size is small.

Each row in the data below corresponds to 1000 simulated multinomial data sets with randomly generated independent cell probabilities - fixed over all 1000 data sets within a row, but varying from row to row.

Try plotting rat (= se_emp/se_calc) against nmin [= min(n1,n2)].

By the way, the purpose of all this is to come up with a quantifiable measure of how similar the distributions are with respect to their general patterns as opposed to actual values, such as might be reflected by a chi-squared statistic.



Al Feiveson


      n1    n2    se_calc     se_emp   nmin        rat   set
      60    30   .1439008   .1321683     30   .9184684     1
     120    30   .1519339   .1160367     30   .7637313     1
     120    60   .1367752   .1034096     60   .7560548     1
     240    30   .1501265    .120686     30   .8038954     1
     240    60   .1672979   .0987834     60   .5904641     1
     240   120   .1612942   .1094221    120   .6784011     1
     480    30   .1448482    .121629     30   .8396998     1
     480    60   .1544797   .1151996     60   .7457264     1
     480   120    .157679   .1038079    120   .6583494     1
     480   240   .1655068   .0882562    240   .5332483     1
     960    30   .1471903   .1238696     30   .8415608     1
     960    60   .1492855   .1071405     60   .7176883     1
     960   120   .1490777   .1053668    120   .7067916     1
     960   240    .144429   .0809639    240   .5605789     1
     960   480   .1958908   .0645837    480   .3296922     1
      60    30   .1457042   .1229061     30   .8435318     2
     120    30   .1521924   .1159594     30   .7619262     2
     120    60   .1486831   .1267989     60   .8528129     2
     240    30   .1444352   .1168832     30   .8092432     2
     240    60   .1460266   .1109937     60   .7600925     2
     240   120   .1626369   .0910218    120   .5596629     2
     480    30   .1431084    .127222     30   .8889909     2
     480    60   .1533591     .10581     60   .6899495     2
     480   120   .1673665   .0932405    120   .5571038     2
     480   240   .1370986   .0833428    240   .6079037     2
     960    30   .1434537   .1124708     30   .7840216     2
     960    60   .1532602   .1213565     60   .7918329     2
     960   120   .1626063   .0967448    120   .5949637     2
     960   240   .1578968   .0861469    240   .5455902     2
     960   480   .1544878   .0632528    480   .4094355     2
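To follow the plotting suggestion above, here is a minimal Stata sketch (not from the thread) that types in the n1, n2, se_calc, se_emp and set columns of the table, rebuilds nmin and rat from their stated definitions, then tabulates and plots rat against nmin. It only re-expresses the reported numbers; it does not rerun the simulation.

```stata
* Simulation summary typed in from the table (rat and nmin are rebuilt below)
clear
input int n1 int n2 double se_calc double se_emp byte set
 60  30 .1439008 .1321683 1
120  30 .1519339 .1160367 1
120  60 .1367752 .1034096 1
240  30 .1501265 .120686  1
240  60 .1672979 .0987834 1
240 120 .1612942 .1094221 1
480  30 .1448482 .121629  1
480  60 .1544797 .1151996 1
480 120 .157679  .1038079 1
480 240 .1655068 .0882562 1
960  30 .1471903 .1238696 1
960  60 .1492855 .1071405 1
960 120 .1490777 .1053668 1
960 240 .144429  .0809639 1
960 480 .1958908 .0645837 1
 60  30 .1457042 .1229061 2
120  30 .1521924 .1159594 2
120  60 .1486831 .1267989 2
240  30 .1444352 .1168832 2
240  60 .1460266 .1109937 2
240 120 .1626369 .0910218 2
480  30 .1431084 .127222  2
480  60 .1533591 .10581   2
480 120 .1673665 .0932405 2
480 240 .1370986 .0833428 2
960  30 .1434537 .1124708 2
960  60 .1532602 .1213565 2
960 120 .1626063 .0967448 2
960 240 .1578968 .0861469 2
960 480 .1544878 .0632528 2
end

* Rebuild the derived columns: nmin = min(n1,n2) and rat = se_emp/se_calc
generate int nmin = min(n1, n2)
generate double rat = se_emp / se_calc

* Summarize the ratio at each value of nmin, then plot it, one marker set per set
tabstat rat, by(nmin) statistics(mean min max)
twoway (scatter rat nmin if set == 1) (scatter rat nmin if set == 2), ///
    legend(order(1 "set 1" 2 "set 2")) xtitle("min(n1, n2)") ytitle("se_emp / se_calc")
```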








-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Roger Newson
Sent: Friday, October 29, 2010 12:01 PM
To: [email protected]
Subject: Re: st: somersd resampling question

Resampling is valid with -somersd-, as long as the units resampled are
clusters rather than non-independent observations within clusters. In
your case, if you start with frequency counts and want to use a
resampling method, then you will presumably have to expand the dataset
(using -expgen-, -reshape- or some similar command) to get the units to
be resampled.
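As a minimal sketch of that expansion step, the code below uses base Stata's -expand- in place of -expgen- (a swapped-in alternative among the "similar" commands mentioned above), adds a within-cell sequence number, and then bootstraps -somersd- so that the resampled units are the expanded observations rather than the table cells. The toy frequencies, the seed and the number of replications are arbitrary assumptions, and -somersd- must be installed from SSC.

```stata
* Hypothetical toy frequency table: groupa (0/1) by a 3-category ordinal y
clear
input byte groupa byte y int cfreq
0 1 12
0 2  9
0 3  4
1 1  5
1 2  8
1 3 12
end

* Expand to 1 observation per unit with base Stata's -expand-
* (-expand- appends the copies at the end, so re-sort afterwards)
expand cfreq
sort groupa y

* Number the units within each cell; groupa, y and unit then key the data uniquely
by groupa y: generate long unit = _n
isid groupa y unit

* Resample units with the bootstrap (somersd is a user-written SSC package)
set seed 12345
bootstrap, reps(1000): somersd groupa y
```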

I hope this helps.

Best wishes

Roger


On 29/10/2010 17:04, Feiveson, Alan H. (JSC-SK311) wrote:

Hi - I want to use Kendall's Tau-a to characterize similarity between two multinomial samples. My question is whether the resampling in -somersd- to get standard errors is valid when comparing two multinomial samples, since technically the "observations" (i.e. frequency counts) are not mutually independent. Does anyone have an opinion on this?

Thanks

Al Feiveson

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
