RE: st: somersd resampling question

From	"Feiveson, Alan H. (JSC-SK311)" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: somersd resampling question
Date	Fri, 29 Oct 2010 14:41:50 -0500

Hi Roger,

Thanks for the idea of setting up artificial clusters, but I don't see how this can be done with two multinomial lists. Anyway, for anyone who might be interested, I've done a small simulation with 23 categories (because that's what I have) and various combinations of sample sizes in each list. It turns out that the ratio of the empirical SE to the somersd-calculated SE depends almost entirely on the minimum of the two sample sizes: it is closest to 1 when the minimum sample size is small and falls as the minimum grows.

Each row in the data below summarizes 1000 simulated multinomial data sets with randomly generated independent cell probabilities, fixed over all 1000 data sets within a row but varying from row to row.
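
For readers who want to try something similar outside Stata, here is a minimal Python sketch of one row of such a simulation. It is not the code used for the table below. It assumes that each list gets its own probability vector drawn from a flat Dirichlet distribution and that tau-a is computed across the 23 pairs of category counts. The function names tau_a and empirical_se are made up for illustration, and only the empirical SE is computed, not the somersd-calculated one.

```python
import numpy as np

def tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all n(n-1)/2 pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Sign of every pairwise difference; tied pairs contribute 0.
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # The full matrix counts each unordered pair twice, hence n(n-1).
    return (sx * sy).sum() / (n * (n - 1))

def empirical_se(n1, n2, k=23, reps=1000, seed=0):
    """SD of tau-a over repeated multinomial draws with fixed cell probabilities."""
    rng = np.random.default_rng(seed)
    # Assumption: one flat-Dirichlet probability vector per list, held fixed
    # across all replicates (the original post does not say how they were drawn).
    p1 = rng.dirichlet(np.ones(k))
    p2 = rng.dirichlet(np.ones(k))
    taus = np.empty(reps)
    for r in range(reps):
        c1 = rng.multinomial(n1, p1)   # frequency counts for list 1
        c2 = rng.multinomial(n2, p2)   # frequency counts for list 2
        taus[r] = tau_a(c1, c2)
    return taus.std(ddof=1)

if __name__ == "__main__":
    print(empirical_se(60, 30))
```

Changing the seed changes the cell probabilities, which corresponds to moving from one row of the table to another.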

Try plotting rat (= se_emp/se_calc) against nmin [= min(n1,n2)].

By the way, the purpose of all this is to come up with a quantifiable measure of how similar the distributions are with respect to their general patterns as opposed to actual values, such as might be reflected by a chi-squared statistic.



Al Feiveson


     n1    n2    se_calc     se_emp   nmin        rat   set  
     60    30   .1439008   .1321683     30   .9184684     1  
    120    30   .1519339   .1160367     30   .7637313     1  
    120    60   .1367752   .1034096     60   .7560548     1  
    240    30   .1501265    .120686     30   .8038954     1  
    240    60   .1672979   .0987834     60   .5904641     1  
    240   120   .1612942   .1094221    120   .6784011     1  
    480    30   .1448482    .121629     30   .8396998     1  
    480    60   .1544797   .1151996     60   .7457264     1  
    480   120    .157679   .1038079    120   .6583494     1  
    480   240   .1655068   .0882562    240   .5332483     1  
    960    30   .1471903   .1238696     30   .8415608     1  
    960    60   .1492855   .1071405     60   .7176883     1  
    960   120   .1490777   .1053668    120   .7067916     1  
    960   240    .144429   .0809639    240   .5605789     1  
    960   480   .1958908   .0645837    480   .3296922     1  
     60    30   .1457042   .1229061     30   .8435318     2  
    120    30   .1521924   .1159594     30   .7619262     2  
    120    60   .1486831   .1267989     60   .8528129     2  
    240    30   .1444352   .1168832     30   .8092432     2  
    240    60   .1460266   .1109937     60   .7600925     2  
    240   120   .1626369   .0910218    120   .5596629     2  
    480    30   .1431084    .127222     30   .8889909     2  
    480    60   .1533591     .10581     60   .6899495     2  
    480   120   .1673665   .0932405    120   .5571038     2  
    480   240   .1370986   .0833428    240   .6079037     2  
    960    30   .1434537   .1124708     30   .7840216     2  
    960    60   .1532602   .1213565     60   .7918329     2  
    960   120   .1626063   .0967448    120   .5949637     2  
    960   240   .1578968   .0861469    240   .5455902     2  
    960   480   .1544878   .0632528    480   .4094355     2  
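
The pattern in the table above can also be checked without a plot by averaging rat within each value of nmin. The Python sketch below does only that arithmetic. The four columns are copied from the table, and the name mean_ratio_by_nmin is made up for illustration.

```python
from collections import defaultdict

# Columns: n1, n2, se_calc, se_emp (both sets, copied from the table above).
TABLE = """
60 30 .1439008 .1321683
120 30 .1519339 .1160367
120 60 .1367752 .1034096
240 30 .1501265 .120686
240 60 .1672979 .0987834
240 120 .1612942 .1094221
480 30 .1448482 .121629
480 60 .1544797 .1151996
480 120 .157679 .1038079
480 240 .1655068 .0882562
960 30 .1471903 .1238696
960 60 .1492855 .1071405
960 120 .1490777 .1053668
960 240 .144429 .0809639
960 480 .1958908 .0645837
60 30 .1457042 .1229061
120 30 .1521924 .1159594
120 60 .1486831 .1267989
240 30 .1444352 .1168832
240 60 .1460266 .1109937
240 120 .1626369 .0910218
480 30 .1431084 .127222
480 60 .1533591 .10581
480 120 .1673665 .0932405
480 240 .1370986 .0833428
960 30 .1434537 .1124708
960 60 .1532602 .1213565
960 120 .1626063 .0967448
960 240 .1578968 .0861469
960 480 .1544878 .0632528
"""

def mean_ratio_by_nmin(table=TABLE):
    """Average se_emp/se_calc within each value of min(n1, n2)."""
    groups = defaultdict(list)
    for line in table.strip().splitlines():
        n1, n2, se_calc, se_emp = line.split()
        nmin = min(int(n1), int(n2))
        groups[nmin].append(float(se_emp) / float(se_calc))
    return {nmin: sum(r) / len(r) for nmin, r in sorted(groups.items())}

for nmin, rat in mean_ratio_by_nmin().items():
    print(f"nmin={nmin:4d}  mean rat={rat:.3f}")
```

With these 30 rows the mean ratio falls steadily as nmin goes from 30 to 480.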







 
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Roger Newson
Sent: Friday, October 29, 2010 12:01 PM
To: [email protected]
Subject: Re: st: somersd resampling question

Resampling is valid with -somersd-, as long as the units resampled are 
clusters rather than non-independent observations within clusters. In 
your case, if you start with frequency counts and want to use a 
resampling method, then you will presumably have to expand the dataset 
(using -expgen-, -reshape- or some similar command) to get the units to 
be resampled.
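
As a rough illustration of that expansion step, here is a minimal Python sketch. It is not -somersd- itself and not Stata code. It assumes the sampled units are the individual observations behind each frequency count, and the names expand_counts and bootstrap_counts are made up for illustration.

```python
import numpy as np

def expand_counts(counts):
    """Turn frequency counts into one category label per sampled unit."""
    counts = np.asarray(counts)
    return np.repeat(np.arange(len(counts)), counts)

def bootstrap_counts(units, k, rng):
    """Resample units with replacement and re-tabulate into k category counts."""
    resampled = rng.choice(units, size=len(units), replace=True)
    return np.bincount(resampled, minlength=k)

rng = np.random.default_rng(1)
counts = [3, 0, 2]                 # made-up frequency counts for one list
units = expand_counts(counts)      # array([0, 0, 0, 2, 2])
boot = bootstrap_counts(units, k=len(counts), rng=rng)
print(boot, boot.sum())            # the resampled counts still sum to 5
```

The point of the sketch is that the resampled units are the expanded observations, so each bootstrap replicate is a new set of frequency counts rather than a resample of the counts themselves.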

I hope this helps.

Best wishes

Roger


Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 29/10/2010 17:04, Feiveson, Alan H. (JSC-SK311) wrote:
> Hi - I want to use Kendall's Tau-a to characterize similarity between two multinomial samples. My question is whether the resampling in -somersd- to get standard errors is valid when comparing two multinomial samples, since technically the "observations" (i.e. frequency counts) are not mutually independent. Anyone have an opinion on this?
>
> Thanks
>
> Al Feiveson
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
