
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: McNemar test for survey data


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: McNemar test for survey data
Date   Sun, 5 Jan 2014 22:31:58 -0500

"But since everyone with a positive result on test 2 is also positive on
test 1, shouldn't the comparison rather be between 0.1283 in p22 and the
0.1665 under Total in the 2x2 table (to the right of 0.1283), rather than
between p21 and p12?"

That is exactly the "more useful formulation" that I mentioned in my
post. The fact that p12 = 0 doesn't change that.

Now p2. = p21 + p22 = .0382 + .1283 = .1665
    p.2 = p12 + p22 =     0 + .1283 = .1283

So p22 = .1283 drops out, and
p2. - p.2 = p21 - p12 = .0382.

Thus all inference about p2. - p.2 is based on p21 - p12.
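A minimal sketch of the cancellation, reusing the illustrative auto-data setup from Steve's code quoted later in this thread (foreign as test 1, a randomly generated test 2, turn as a stand-in probability weight, rep78 as strata). The first two -lincom- calls reproduce the margins p2. and p.2; the third is their difference, in which p22 has cancelled; the final -test- is the design-based Wald test of McNemar's null hypothesis p12 = p21. This is a sketch on toy data, not a re-analysis of the survey above.

******************CODE BEGINS***********
sysuse auto, clear
* Stand-ins for the two tests, as in Steve's example
gen test1 = foreign
svyset _n [pw = turn], strata(rep78)
set seed 2000
gen u = runiform()
sort u
gen test2 = _n < 39
svy: tab test1 test2

* Margins: p2. = p21 + p22 (test 1 positive), p.2 = p12 + p22 (test 2 positive)
lincom _b[p21] + _b[p22]
lincom _b[p12] + _b[p22]

* Their difference, p2. - p.2, after p22 cancels
lincom _b[p21] - _b[p12]

* Design-based Wald test of McNemar's null hypothesis, p12 = p21
test _b[p12] = _b[p21]
*******************CODE ENDS*************

On the real data, the last two lines would apply unchanged after -svy: tab testresult1 testresult2-.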


However, this assumes that p12 = 0 occurred by chance. If it must
always happen because of how the tests are defined, then the null
hypothesis for McNemar's test is false by definition, and there's no
need for a test.

Note that the three lines starting with -set seed- were in the
example only because the auto data set is ordered by foreign, and I
wanted a second "test" that wouldn't be affected by this.

Steve
[email protected]


On Jan 5, 2014, at 9:54 PM, Ankit Sakhuja wrote:

Thanks Steve. Using your code I ran the program as below (after using svyset):


svy: tab testresult1 testresult2

and I get this as below:

(running tabulate on estimation sample)

Number of strata   =        14                 Number of obs      =       4885
Number of PSUs     =        31                 Population size    =  207888271
                                              Design df          =         17

--------------------------------------
Testresult1 |        Testresult2
            |      0        1    Total
------------+-------------------------
          0 |  .8335        0    .8335
          1 |  .0382    .1283    .1665
            |
      Total |  .8717    .1283        1
--------------------------------------
 Key:  cell proportions

 Pearson:
   Uncorrected   chi2(1)         = 3598.5695
   Design-based  F(1, 17)        = 3475.6666     P = 0.0000

Here p12 is 0, as shown below, because no one has a positive result
on test 2 but a negative result on test 1 (everyone with a positive
result on test 2 is also positive on test 1):
. mat list e(b)

e(b)[1,4]
         p11        p12        p21        p22
y1  .83352672          0  .03821686  .12825641

and _b[p12] and _b[p21] are significantly different, as below:
. test _b[p12]=_b[p22]

Adjusted Wald test

( 1)  p12 - p22 = 0

      F(  1,    17) =  333.13
           Prob > F =    0.0000


But since everyone with a positive result on test 2 is also positive
on test 1, shouldn't the comparison rather be between 0.1283 in p22
and the 0.1665 under Total in the 2x2 table (to the right of 0.1283),
rather than between p21 and p12?
Thanks
Ankit

On Sun, Jan 5, 2014 at 8:39 PM, Steve Samuels <[email protected]> wrote:
> Unfortunately, "test1" inherited foreign's value
> labels. Eliminated here.
> 
> SS
> 
> Here is the example in
> (http://www.stata.com/statalist/archive/2010-03/msg00937.html),
> specialized to your nomenclature. Roger's approach
> requires an id variable for the reshape, but this does
> not.
> 
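> ******************CODE BEGINS***********
> sysuse auto, clear
> 
> * Stand-in for the first test
> gen test1 = foreign
> 
> * Illustrative design: turn as a probability weight, rep78 as strata
> svyset _n [pw = turn], strata(rep78)
> 
> * Random second "test" with 38 positives
> set seed 2000
> gen u = runiform()
> sort u
> gen test2 = _n < 39
> 
> * Cell proportions p11, p12, p21, p22
> svy: tab test1 test2
> 
> * McNemar-type contrast of the two discordant cells
> lincom _b[p12] - _b[p21]
> *******************CODE ENDS*************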
> 
> As I stated in the post, the hypothesis _b[p12] = _b[p21] is exactly
> the hypothesis tested in McNemar's test. And, it is equivalent to
> the more useful formulation that the proportions positive are
> the same for test 1 and test 2.
> 
> Steve
> [email protected]
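The "more useful formulation" can also be estimated directly. In a minimal sketch on the same illustrative auto-data setup, -svy: mean- estimates both proportions positive in one call, so the covariance that arises because every unit took both tests enters the variance estimate, and -lincom- gives the design-based difference. Since test1 - test2 equals 1 in the (1,0) cell, -1 in the (0,1) cell, and 0 elsewhere, the point estimate should equal p21 - p12 from -svy: tab-, and the standard error should match as well; that is worth checking on your own data rather than taking on trust.

******************CODE BEGINS***********
sysuse auto, clear
* Same illustrative setup as in Steve's example
gen test1 = foreign
svyset _n [pw = turn], strata(rep78)
set seed 2000
gen u = runiform()
sort u
gen test2 = _n < 39

* Proportion positive on each test, estimated jointly
svy: mean test1 test2

* Difference in proportions positive, test 1 minus test 2 (= p2. - p.2)
lincom test1 - test2
*******************CODE ENDS*************

With the variable names from Ankit's output, the two estimation lines would read -svy: mean testresult1 testresult2- and -lincom testresult1 - testresult2-, after the survey's existing -svyset-.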
> 
> 
> 
> On Jan 5, 2014, at 2:05 PM, Ankit Sakhuja wrote:
> 
> Thanks for the input. The survey sample that I am working on is a
> stratified sample using probability weights. It is probably
> naivety and ignorance on my part, but I am still not sure how to make
> the variable -testid-, as all observations underwent both tests. To
> give an example, my dataset looks like this:
> 
> Observation No   Result of Test 1   Result of Test 2
>              1                  1                  1
>              2                  1                  0
>              3                  1                  1
>              4                  1                  0
>              5                  1                  1
>              6                  1                  0
>              7                  1                  1
>              8                  0                  0
>              9                  1                  1
>             10                  0                  0
> 
> So in the above example, 80% are positive on test 1 and 50% on test
> 2, but all 10 observations got both tests.
> 
> Or a different example could be that 10 patients were given medication
> A for asthma and, after a washout period, were given medication B for
> the same condition. Then say that 80% responded to the first medication
> and 50% to the second. So all observations got both medications (or
> tests), and therefore I am not sure whether the variable -testid- or
> -cat- (as in Samuels' example) can be made.
> Thanks again
> Ankit
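A -testid- variable does not have to be collected; it can be built from the one-row-per-subject layout. In a minimal sketch with the ten example rows above (the names obs, test1 and test2 are hypothetical, and the survey weights and strata are left out), -reshape long- reads the suffixes 1 and 2 of test1 and test2 into a new variable, here called testid, giving two rows per subject. The -summarize- call should show means of .8 and .5, the 80% and 50% quoted.

******************CODE BEGINS***********
clear
* Hypothetical variable names for the example data above
input obs test1 test2
 1 1 1
 2 1 0
 3 1 1
 4 1 0
 5 1 1
 6 1 0
 7 1 1
 8 0 0
 9 1 1
10 0 0
end

* Wide layout: one row per subject, both results on that row
summarize test1 test2

* Long layout: one row per subject per test; testid is 1 or 2
reshape long test, i(obs) j(testid)
list obs testid test, sepby(obs)
*******************CODE ENDS*************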
> 
> On Sun, Jan 5, 2014 at 11:39 AM, Roger B. Newson
> <[email protected]> wrote:
>> This problem can probably be solved using -somersd-, -regpar-, -binreg-,
>> -glm-, or some other package that can estimate differences between 2
>> proportions for clustered data. The first step would be to reshape your data
>> (using either -reshape- or -expgen-) to have 1 observation per study subject
>> per binary test (and therefore 2 observations per study subject, as there are
>> 2 binary tests). The binary outcome, in this dataset, would be the test
>> result. For each study subject, it would be the outcome of the first binary
>> test in the first observation for that subject, and the outcome of the
>> second binary test in the second observation. And the dataset would contain a
>> variable, maybe called -testid-, with the value 1 in observations
>> representing the first test, and 2 in observations representing the second
>> test. The confidence interval to be calculated would be for the difference
>> between 2 proportions, namely the proportion of positive outcomes where
>> -testid- is 2 and the proportion of positive results where -testid- is 1.
>> 
>> You do not say what the sampling design is for your complex survey data.
>> However, if this design has clusters, then they will be the clusters to use
>> when estimating your difference between proportions. And, if this design
>> does not have clusters, then the clusters used, when estimating your
>> difference between proportions, will be the study subjects themselves.
>> Either way, your final estimate will be clustered.
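A minimal sketch of this reshape approach on the illustrative auto-data setup used elsewhere in this thread. It swaps in a linear probability model fitted with -svy: regress-, which is not one of the commands Roger lists: with only an intercept and the 2.testid indicator, _b[2.testid] is the weighted proportion positive on test 2 minus that on test 1, so its point estimate should match -lincom _b[p12] - _b[p21]- from Steve's code. Because this toy design has no clusters, the hypothetical subject identifier id is declared as the PSU, so both rows of a subject fall in the same cluster, as described above.

******************CODE BEGINS***********
sysuse auto, clear
* Same illustrative stand-ins as in Steve's example
gen test1 = foreign
set seed 2000
gen u = runiform()
sort u
gen test2 = _n < 39

* Hypothetical subject identifier, needed by -reshape-
gen long id = _n

* Two rows per subject: testid = 1 or 2, result in test
reshape long test, i(id) j(testid)

* No clusters in this design, so each subject is its own PSU
svyset id [pw = turn], strata(rep78)

* _b[2.testid] = proportion positive on test 2 - proportion on test 1
svy: regress test i.testid
*******************CODE ENDS*************

For a design that already has PSUs, as in Ankit's output (31 PSUs in 14 strata), the existing -svyset- could be kept after the reshape, since both rows of a subject stay in that subject's PSU.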
>> 
>> I hope this helps. Let me know if you have any further queries.
>> 
>> Best wishes
>> 
>> Roger
>> 
>> Roger B Newson BSc MSc DPhil
>> Lecturer in Medical Statistics
>> Respiratory Epidemiology, Occupational Medicine
>> and Public Health Group
>> National Heart and Lung Institute
>> Imperial College London
>> Royal Brompton Campus
>> Room 33, Emmanuel Kaye Building
>> 1B Manresa Road
>> London SW3 6LR
>> UNITED KINGDOM
>> Tel: +44 (0)20 7352 8121 ext 3381
>> Fax: +44 (0)20 7351 8322
>> Email: [email protected]
>> Web page: http://www.imperial.ac.uk/nhli/r.newson/
>> Departmental Web page:
>> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>> 
>> Opinions expressed are those of the author, not of the institution.
>> 
>> 
>> On 05/01/2014 16:55, Ankit Sakhuja wrote:
>>> 
>>> Dear Members,
>>> I am trying to compare two categorical variables which are not
>>> mutually exclusive, such that participants with a positive result in
>>> one group (using method 1) also have a positive result in the second
>>> group (using method 2). Now say 30% have a positive result by method 1
>>> and 20% by method 2; how can I say whether these results are in fact
>>> similar or different? I could potentially use McNemar's test, but these
>>> are complex survey data and I am not sure how to go ahead with that. I
>>> have seen discussions about using -somersd- but am not sure exactly how
>>> to use it with these data. Would really appreciate any help.
>>> Ankit
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>>> 
> 
> 
> 
> --
> Ankit
> 
> 



-- 
Ankit




© Copyright 1996–2018 StataCorp LLC