Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: McNemar test for survey data
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: McNemar test for survey data
Date
Mon, 6 Jan 2014 11:16:10 -0500
You are welcome, Ankit. I should have clarified at the beginning that
the real hypothesis is about equality of the marginal proportions
(distributions),
i.e. p_i. = p_.i, i = 1, 2,
but the test statistics themselves operate only on the off-diagonal
elements p_12 and p_21.
Steve
On Jan 5, 2014, at 11:08 PM, Ankit Sakhuja wrote:
Thanks Steve. That makes perfect sense.
Ankit
On Sun, Jan 5, 2014 at 9:31 PM, Steve Samuels <[email protected]> wrote:
>
> "But since everyone with a positive result on test 2 is positive on
> test 1, shouldn't the comparison rather be between 0.1283 in p22 and
> the 0.1665 under Total in the 2x2 table (to the right of 0.1283),
> rather than between p21 and p12?"
>
> That is exactly the "more useful formulation" that I mentioned in my
> post. The fact that p12 = 0 doesn't change that.
>
> Now p2. = p21 + p22 = .0382 + .1283 = .1665
> p.2 = p12 + p22 = 0 + .1283 = .1283
>
>
> So the .1283 drops out, and
> p2. - p.2 = p21 - p12 = .0382 .
>
> Thus all inference about p2. - p.2 is based on p21 - p12.
>
>
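[Editorial aside: the cancellation above can be checked outside Stata; here is a quick sketch in Python using the cell proportions from the 2x2 table in this thread.]

```python
# Cell proportions from the svy: tab output quoted in this thread
p11, p12, p21, p22 = 0.8335, 0.0, 0.0382, 0.1283

p2_dot = p21 + p22  # row-2 marginal: proportion positive on test 1
p_dot2 = p12 + p22  # column-2 marginal: proportion positive on test 2

# p2. - p.2 = (p21 + p22) - (p12 + p22) = p21 - p12: p22 cancels
assert abs((p2_dot - p_dot2) - (p21 - p12)) < 1e-12
print(round(p2_dot - p_dot2, 4))
```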
> However this assumes that p12 = 0 occurred by chance. If it would
> always happen because of how the tests are defined, then the null
> hypothesis for McNemar's test is false by definition, and there's no
> need for a test.
>
> Note that the three lines starting with -set seed- were in the
> example only because the auto data set is ordered by foreign, and I wanted
> a second "test" that wouldn't be affected by this.
>
> Steve
> [email protected]
>
>
> On Jan 5, 2014, at 9:54 PM, Ankit Sakhuja wrote:
>
> Thanks Steve. Using your code I ran the program as below (after using svyset):
>
>
> svy: tab testresult1 testresult2
>
> and I get this as below:
>
> (running tabulate on estimation sample)
>
> Number of strata = 14 Number of obs = 4885
> Number of PSUs = 31 Population size = 207888271
> Design df = 17
>
> -------------------------------
> Testresult1 | Testresult2
> |
> | 0 1 Total
> ----------+--------------------
> 0 | .8335 0 .8335
> 1 | .0382 .1283 .1665
> |
> Total | .8717 .1283 1
> -------------------------------
> Key: cell proportions
>
> Pearson:
> Uncorrected chi2(1) = 3598.5695
> Design-based F(1, 17) = 3475.6666 P = 0.0000
>
> In this output p12 has a value of 0, as there is no one with a
> positive result on test 2 but a negative result on test 1 (everyone
> with a positive result on test 2 is positive on test 1):
> . mat list e(b)
>
> e(b)[1,4]
> p11 p12 p21 p22
> y1 .83352672 0 .03821686 .12825641
>
> and _b[p12] and _b[p22] are significantly different, as below:
> . test _b[p12]=_b[p22]
>
> Adjusted Wald test
>
> ( 1) p12 - p22 = 0
>
> F( 1, 17) = 333.13
> Prob > F = 0.0000
>
>
> But since everyone with a positive result on test 2 is positive
> on test 1, shouldn't the comparison rather be between 0.1283 in p22
> and the 0.1665 under Total in the 2x2 table (to the right of 0.1283),
> rather than between p21 and p12?
> Thanks
> Ankit
>
> On Sun, Jan 5, 2014 at 8:39 PM, Steve Samuels <[email protected]> wrote:
>> Unfortunately "test1" inherited foreign's value
>> labels; they are eliminated here.
>>
>> SS
>>
>> Here is the example in
>> (http://www.stata.com/statalist/archive/2010-03/msg00937.html),
>> specialized to your nomenclature. Roger's approach
>> requires an id variable for the reshape, but this does
>> not.
>>
>> ******************CODE BEGINS***********
>> sysuse auto, clear
>>
>> * binary "test 1" result
>> gen test1 = foreign
>>
>> * survey design: each observation its own PSU, pweights from turn
>> svyset _n [pw = turn], strata(rep78)
>>
>> * random "test 2" result, unrelated to the data's sort order
>> set seed 2000
>> gen u = uniform()
>> sort u
>> gen test2 = _n < 39
>>
>> svy: tab test1 test2
>> lincom _b[p12] - _b[p21]
>> *******************CODE ENDS*************
>>
>> As I stated in the post, the hypothesis _b[p12] = _b[p21] is exactly
>> the hypothesis tested by McNemar's test. And it is equivalent to
>> the more useful formulation that the proportions positive are
>> the same for test 1 and test 2.
>>
>> Steve
>> [email protected]
>>
>>
>>
>> On Jan 5, 2014, at 2:05 PM, Ankit Sakhuja wrote:
>>
>> Thanks for the input. The survey sample that I am working on is a
>> stratified sample using probability weights. It is probably
>> naivety and ignorance on my part, but I am still not sure how to make
>> the variable -testid-, as all observations underwent both tests. To
>> give an example, my dataset looks like this:
>>
>> Observation No Result of Test 1 Result of Test 2
>> 1 1 1
>> 2 1 0
>> 3 1 1
>> 4 1 0
>> 5 1 1
>> 6 1 0
>> 7 1 1
>> 8 0 0
>> 9 1 1
>> 10 0 0
>>
>> So in the above example the positive rate for test 1 is 80% and for
>> test 2 is 50%, but all 10 observations got both tests.
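[Editorial aside: the unweighted McNemar arithmetic for this 10-observation example can be sketched in Python (not the survey-weighted analysis, which still needs -svy- in Stata); note it uses only the discordant pairs.]

```python
# (test1, test2) results for the 10 observations in the example above
pairs = [(1, 1), (1, 0), (1, 1), (1, 0), (1, 1),
         (1, 0), (1, 1), (0, 0), (1, 1), (0, 0)]

n10 = sum(1 for t1, t2 in pairs if t1 == 1 and t2 == 0)  # + on test 1 only
n01 = sum(1 for t1, t2 in pairs if t1 == 0 and t2 == 1)  # + on test 2 only

# Unweighted McNemar statistic (no continuity correction); the
# marginal difference 0.8 - 0.5 equals (n10 - n01)/10
chi2 = (n10 - n01) ** 2 / (n10 + n01)
print(n10, n01, chi2)
```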
>>
>> Or a different example could be that 10 patients were given medication
>> A for asthma and, after a washout period, medication B for the same
>> condition. Say with the first medication 80% had a response and with
>> the second 50% did. All observations got both medications (or tests),
>> and therefore I am not sure whether the variable -testid- or -cat- (as
>> in Samuels's example) can be made.
>> Thanks again
>> Ankit
>>
>> On Sun, Jan 5, 2014 at 11:39 AM, Roger B. Newson
>> <[email protected]> wrote:
>>> This problem can probably be solved using -somersd-, -regpar-, -binreg-,
>>> -glm-, or some other package that can estimate differences between 2
>>> proportions for clustered data. The first step would be to reshape your data
>>> (using either -reshape- or -expgen-) to have 1 observation per study subject
>>> per binary test (and therefore 2 observations per study subject, as there are
>>> 2 binary tests). The binary outcome, in this dataset, would be the test
>>> result. For each study subject, it would be the outcome of the first binary
>>> test in the first observation for that subject, and the outcome of the
>>> second binary test in the second observation. And the dataset would contain a
>>> variable, maybe called -testid-, with the value 1 in observations
>>> representing the first test, and 2 in observations representing the second
>>> test. The confidence interval to be calculated would be for the difference
>>> between 2 proportions, namely the proportion of positive outcomes where
>>> -testid- is 2 and the proportion of positive results where -testid- is 1.
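[Editorial aside: a sketch of that reshape in Python, where plain lists stand in for Stata's -reshape long-; variable names are illustrative.]

```python
# One row per subject in the wide data: results of both binary tests
wide = [
    {"subject": 1, "result1": 1, "result2": 1},
    {"subject": 2, "result1": 1, "result2": 0},
    {"subject": 3, "result1": 0, "result2": 0},
]

# Reshape to one row per subject per test, adding a testid variable
long = []
for row in wide:
    for testid in (1, 2):
        long.append({"subject": row["subject"],  # cluster id for inference
                     "testid": testid,
                     "result": row["result%d" % testid]})

# The estimand: difference between the two proportions positive
p1 = sum(r["result"] for r in long if r["testid"] == 1) / len(wide)
p2 = sum(r["result"] for r in long if r["testid"] == 2) / len(wide)
print(len(long), p1 - p2)
```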
>>>
>>> You do not say what the sampling design is for your complex survey data.
>>> However, if this design has clusters, then they will be the clusters to use
>>> when estimating your difference between proportions. And, if this design
>>> does not have clusters, then the clusters used, when estimating your
>>> difference between proportions, will be the study subjects themselves.
>>> Either way, your final estimate will be clustered.
>>>
>>> I hope this helps. Let me know if you have any further queries.
>>>
>>> Best wishes
>>>
>>> Roger
>>>
>>> Roger B Newson BSc MSc DPhil
>>> Lecturer in Medical Statistics
>>> Respiratory Epidemiology, Occupational Medicine
>>> and Public Health Group
>>> National Heart and Lung Institute
>>> Imperial College London
>>> Royal Brompton Campus
>>> Room 33, Emmanuel Kaye Building
>>> 1B Manresa Road
>>> London SW3 6LR
>>> UNITED KINGDOM
>>> Tel: +44 (0)20 7352 8121 ext 3381
>>> Fax: +44 (0)20 7351 8322
>>> Email: [email protected]
>>> Web page: http://www.imperial.ac.uk/nhli/r.newson/
>>> Departmental Web page:
>>> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>>>
>>> Opinions expressed are those of the author, not of the institution.
>>>
>>>
>>> On 05/01/2014 16:55, Ankit Sakhuja wrote:
>>>>
>>>> Dear Members,
>>>> I am trying to compare two categorical variables which are not
>>>> mutually exclusive, such that participants with a positive result in
>>>> one group (using method 1) also have a positive result in the second
>>>> group (using method 2). Now, say 30% have a positive result by method
>>>> 1 and 20% by method 2; how can I say whether these results are in
>>>> fact similar or different? I could potentially use McNemar's test,
>>>> but this is complex survey data and I am not sure how to proceed. I
>>>> have seen discussions about using -somersd- but am not sure exactly
>>>> how to use it with this data. Would really appreciate any help.
>>>> Ankit
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>
>> --
>> Ankit
>
>
>
> --
> Ankit
--
Ankit