Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: McNemar test for survey data
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: McNemar test for survey data
Date
Mon, 6 Jan 2014 11:16:10 -0500
You are welcome, Ankit. I should have clarified at the beginning that
the real hypothesis is about equality of the marginal proportions
(distributions),
i.e. p_i. = p_.i, i = 1, 2,
but the test statistics themselves operate only on the off-diagonal
elements p_12 and p_21.
Steve
On Jan 5, 2014, at 11:08 PM, Ankit Sakhuja wrote:
Thanks Steve. That makes perfect sense.
Ankit
On Sun, Jan 5, 2014 at 9:31 PM, Steve Samuels <[email protected]> wrote:
>
> "But since everyone with a positive result on test 2 is positive on
> test 1, shouldn't the comparison rather be between 0.1283 in p22 and
> the 0.1665 under Total in the 2x2 table (to the right of 0.1283),
> rather than between p21 and p12?"
>
> That is exactly the "more useful formulation" that I mentioned in my
> post. The fact that p12 = 0 doesn't change that.
>
> Now p2. = p21 + p22 = .0382 + .1283 = .1665
> p.2 = p12 + p22 = 0 + .1283 = .1283
>
>
> So the .1283 drops out, and
> p2. - p.2 = p21 - p12 = .0382 .
>
> Thus all inference about p2. - p.2 is based on p21 - p12.
>
>
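[Editorial aside: the cancellation above can be checked outside Stata; here is a quick sketch in Python using the cell proportions from the 2x2 table in this thread.]

```python
# Cell proportions from the svy: tab output quoted in this thread
p11, p12, p21, p22 = 0.8335, 0.0, 0.0382, 0.1283

p2_dot = p21 + p22  # row-2 marginal: proportion positive on test 1
p_dot2 = p12 + p22  # column-2 marginal: proportion positive on test 2

# p2. - p.2 = (p21 + p22) - (p12 + p22) = p21 - p12: p22 cancels
assert abs((p2_dot - p_dot2) - (p21 - p12)) < 1e-12
print(round(p2_dot - p_dot2, 4))
```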
> However this assumes that p12 = 0 occurred by chance. If it would
> always happen because of how the tests are defined, then the null
> hypothesis for McNemar's test is false by definition, and there's no
> need for a test.
>
> Note that the three lines starting with -set seed- were in the
> example only because the auto data set is ordered by foreign, and I wanted
> a second "test" that wouldn't be affected by this.
>
> Steve
> [email protected]
>
>
> On Jan 5, 2014, at 9:54 PM, Ankit Sakhuja wrote:
>
> Thanks Steve. Using your code I ran the program as below (after using svyset):
>
>
> svy: tab testresult1 testresult2
>
> and I get this as below:
>
> (running tabulate on estimation sample)
>
> Number of strata = 14 Number of obs = 4885
> Number of PSUs = 31 Population size = 207888271
> Design df = 17
>
> -------------------------------
> Testresult1 | Testresult2
> |
> | 0 1 Total
> ----------+--------------------
> 0 | .8335 0 .8335
> 1 | .0382 .1283 .1665
> |
> Total | .8717 .1283 1
> -------------------------------
> Key: cell proportions
>
> Pearson:
> Uncorrected chi2(1) = 3598.5695
> Design-based F(1, 17) = 3475.6666 P = 0.0000
>
> In this output p12 has a value of 0, as there is no one with a
> positive result on test 2 but a negative result on test 1 (everyone
> with a positive result on test 2 is positive on test 1):
> . mat list e(b)
>
> e(b)[1,4]
> p11 p12 p21 p22
> y1 .83352672 0 .03821686 .12825641
>
> and _b[p12] and _b[p22] are significantly different, as below:
> . test _b[p12]=_b[p22]
>
> Adjusted Wald test
>
> ( 1) p12 - p22 = 0
>
> F( 1, 17) = 333.13
> Prob > F = 0.0000
>
>
> But since everyone with a positive result on test 2 is positive
> on test 1, shouldn't the comparison rather be between 0.1283 in p22
> and the 0.1665 under Total in the 2x2 table (to the right of 0.1283),
> rather than between p21 and p12?
> Thanks
> Ankit
>
> On Sun, Jan 5, 2014 at 8:39 PM, Steve Samuels <[email protected]> wrote:
>> Unfortunately "test1" inherited foreign's value
>> labels; they are eliminated here.
>>
>> SS
>>
>> Here is the example in
>> (http://www.stata.com/statalist/archive/2010-03/msg00937.html),
>> specialized to your nomenclature. Roger's approach
>> requires an id variable for the reshape, but this does
>> not.
>>
>> ******************CODE BEGINS***********
>> sysuse auto, clear
>>
>> * binary "test 1" result
>> gen test1 = foreign
>>
>> * survey design: each observation its own PSU, pweights from turn
>> svyset _n [pw = turn], strata(rep78)
>>
>> * random "test 2" result, unrelated to the data's sort order
>> set seed 2000
>> gen u = uniform()
>> sort u
>> gen test2 = _n < 39
>>
>> svy: tab test1 test2
>> lincom _b[p12] - _b[p21]
>> *******************CODE ENDS*************
>>
>> As I stated in the post, the hypothesis _b[p12] = _b[p21] is exactly
>> the hypothesis tested by McNemar's test. And it is equivalent to
>> the more useful formulation that the proportions positive are
>> the same for test 1 and test 2.
>>
>> Steve
>> [email protected]
>>
>>
>>
>> On Jan 5, 2014, at 2:05 PM, Ankit Sakhuja wrote:
>>
>> Thanks for the input. The survey sample that I am working on is a
>> stratified sample using probability weights. It is probably
>> naivety and ignorance on my part, but I am still not sure how to make
>> the variable -testid-, as all observations underwent both tests. To
>> give an example, my dataset looks like this:
>>
>> Observation No Result of Test 1 Result of Test 2
>> 1 1 1
>> 2 1 0
>> 3 1 1
>> 4 1 0
>> 5 1 1
>> 6 1 0
>> 7 1 1
>> 8 0 0
>> 9 1 1
>> 10 0 0
>>
>> So in the above example the positive rate for test 1 is 80% and for
>> test 2 is 50%, but all 10 observations got both tests.
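[Editorial aside: the unweighted McNemar arithmetic for this 10-observation example can be sketched in Python (not the survey-weighted analysis, which still needs -svy- in Stata); note it uses only the discordant pairs.]

```python
# (test1, test2) results for the 10 observations in the example above
pairs = [(1, 1), (1, 0), (1, 1), (1, 0), (1, 1),
         (1, 0), (1, 1), (0, 0), (1, 1), (0, 0)]

n10 = sum(1 for t1, t2 in pairs if t1 == 1 and t2 == 0)  # + on test 1 only
n01 = sum(1 for t1, t2 in pairs if t1 == 0 and t2 == 1)  # + on test 2 only

# Unweighted McNemar statistic (no continuity correction); the
# marginal difference 0.8 - 0.5 equals (n10 - n01)/10
chi2 = (n10 - n01) ** 2 / (n10 + n01)
print(n10, n01, chi2)
```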
>>
>> Or a different example could be that 10 patients were given medication
>> A for asthma and, after a washout period, medication B for the same
>> condition. Say with the first medication 80% had a response and with
>> the second 50% did. All observations got both medications (or tests),
>> and therefore I am not sure whether the variable -testid- or -cat- (as
>> in Samuels's example) can be made.
>> Thanks again
>> Ankit
>>
>> On Sun, Jan 5, 2014 at 11:39 AM, Roger B. Newson
>> <[email protected]> wrote:
>>> This problem can probably be solved using -somersd-, -regpar-, -binreg-,
>>> -glm-, or some other package that can estimate differences between 2
>>> proportions for clustered data. The first step would be to reshape your data
>>> (using either -reshape- or -expgen-) to have 1 observation per study subject
>>> per binary test (and therefore 2 observations per study subject, as there are
>>> 2 binary tests). The binary outcome, in this dataset, would be the test
>>> result. For each study subject, it would be the outcome of the first binary
>>> test in the first observation for that subject, and the outcome of the
>>> second binary test in the second observation. And the dataset would contain a
>>> variable, maybe called -testid-, with the value 1 in observations
>>> representing the first test, and 2 in observations representing the second
>>> test. The confidence interval to be calculated would be for the difference
>>> between 2 proportions, namely the proportion of positive outcomes where
>>> -testid- is 2 and the proportion of positive results where -testid- is 1.
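[Editorial aside: a sketch of that reshape in Python, where plain lists stand in for Stata's -reshape long-; variable names are illustrative.]

```python
# One row per subject in the wide data: results of both binary tests
wide = [
    {"subject": 1, "result1": 1, "result2": 1},
    {"subject": 2, "result1": 1, "result2": 0},
    {"subject": 3, "result1": 0, "result2": 0},
]

# Reshape to one row per subject per test, adding a testid variable
long = []
for row in wide:
    for testid in (1, 2):
        long.append({"subject": row["subject"],  # cluster id for inference
                     "testid": testid,
                     "result": row["result%d" % testid]})

# The estimand: difference between the two proportions positive
p1 = sum(r["result"] for r in long if r["testid"] == 1) / len(wide)
p2 = sum(r["result"] for r in long if r["testid"] == 2) / len(wide)
print(len(long), p1 - p2)
```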
>>>
>>> You do not say what the sampling design is for your complex survey data.
>>> However, if this design has clusters, then they will be the clusters to use
>>> when estimating your difference between proportions. And, if this design
>>> does not have clusters, then the clusters used, when estimating your
>>> difference between proportions, will be the study subjects themselves.
>>> Either way, your final estimate will be clustered.
>>>
>>> I hope this helps. Let me know if you have any further queries.
>>>
>>> Best wishes
>>>
>>> Roger
>>>
>>> Roger B Newson BSc MSc DPhil
>>> Lecturer in Medical Statistics
>>> Respiratory Epidemiology, Occupational Medicine
>>> and Public Health Group
>>> National Heart and Lung Institute
>>> Imperial College London
>>> Royal Brompton Campus
>>> Room 33, Emmanuel Kaye Building
>>> 1B Manresa Road
>>> London SW3 6LR
>>> UNITED KINGDOM
>>> Tel: +44 (0)20 7352 8121 ext 3381
>>> Fax: +44 (0)20 7351 8322
>>> Email: [email protected]
>>> Web page: http://www.imperial.ac.uk/nhli/r.newson/
>>> Departmental Web page:
>>> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>>>
>>> Opinions expressed are those of the author, not of the institution.
>>>
>>>
>>> On 05/01/2014 16:55, Ankit Sakhuja wrote:
>>>>
>>>> Dear Members,
>>>> I am trying to compare two categorical variables which are not
>>>> mutually exclusive, such that participants with a positive result in
>>>> one group (using method 1) also have a positive result in the second
>>>> group (using method 2). Now, say 30% have a positive result by method
>>>> 1 and 20% by method 2; how can I say whether these results are in
>>>> fact similar or different? I could potentially use McNemar's test,
>>>> but this is complex survey data and I am not sure how to proceed. I
>>>> have seen discussions about using -somersd- but am not sure exactly
>>>> how to use it with this data. Would really appreciate any help.
>>>> Ankit
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>
>> --
>> Ankit
>
>
>
> --
> Ankit
--
Ankit