Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: McNemar test for survey data

From	"Roger B. Newson" <[email protected]>
To	[email protected]
Subject	Re: st: McNemar test for survey data
Date	Mon, 06 Jan 2014 15:40:10 +0000

Sorry, I didn't note the existence of clusters of study subjects. As Steve says, if these exist, then these should be the clusters, and not the subjects themselves.


Best wishes

Roger

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology, Occupational Medicine
and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 06/01/2014 15:24, Steve Samuels wrote:


The only disagreement I have with Roger's elegant approach is with the
following statement:

"You can then set this dataset up as a -svyset- dataset with -patid-
identifying the clusters."

The sample output that Ankit displayed in another post indicates that there
are 31 primary sampling units (PSUs). It is that PSU variable, not
-patid-, which should appear in the -svyset- statement.

Steve

On Jan 6, 2014, at 4:16 AM, Roger B. Newson wrote:

Yes, that is the case. And the confidence interval for the PAR or PUF gives you information on the size of the difference or ratio.

Best wishes

Roger

On 05/01/2014 23:54, Ankit Sakhuja wrote:

Thanks so much for the help and for sharing the presentation. One last
question regarding this. After using -regpar- and -punaf-, if the p-value
for the PAR or PUF is <0.05, does that mean that the PAR, PUF and PAF are
significant, and thus that there is a significant difference between the
two test results?
Thanks
Ankit

On Sun, Jan 5, 2014 at 2:02 PM, Roger B. Newson <[email protected]> wrote:

The first step in the solution is probably to use -reshape long- (see online
help for -reshape-). If your test results are named -testres1- and
-testres2-, your "Observation No" is a patient ID variable -patid-, your
stratum variable is -stratid-, and your sampling-probability variable is
-samprob-, then you might type

reshape long testres, i(stratid patid samprob) j(testid)
lab var testid "Test ID"

and this will replace your dataset in memory with a long version, with a
new variable -testid-. You can then set this dataset up as a -svyset- dataset,
with -patid- identifying the clusters, -stratid- identifying the strata, and
-samprob- as the sampling-probability weights. You can then use -logit-,
with the -svy:- prefix, with -testres- as the Y-variable and -testid- as
the predictive factor, to fit the model. Of course, not many people
understand odds or odds ratios. So the final step would be to use the SSC
package -regpar- to estimate the proportion positive under each test, and
the difference between the proportions, each displayed with a confidence
interval. As in:

regpar, at(testid=1) atzero(testid=2)

More about -regpar- can be found in an article in the latest Stata Journal
(Newson, 2013) and in a presentation I gave at the 2012 UK Stata User
Meeting (Newson, 2012). It is designed to work after -svy:- commands, as it
is a wrapper for -margins-.
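A minimal sketch of the whole workflow, run on the 10-subject toy example Ankit posts in this thread (variables -testres1-, -testres2-, -patid-). The toy data carry no design information, so each subject is treated as its own PSU (-svyset patid-), the fallback described in the thread for designs without clusters. For real data the commented -svyset- line would be used instead: -psuid- is a hypothetical name for the PSU variable Steve mentions, and -samprob- is assumed to already hold the probability weights (if it holds selection probabilities, the pweight is their reciprocal). Since -regpar- is a wrapper for -margins-, the official -margins- command is used for the proportions and their difference; the -regpar- call is left as a comment because it needs the SSC package.

```stata
* Toy data from the thread: one row per subject, both binary test results
clear
input patid testres1 testres2
 1 1 1
 2 1 0
 3 1 1
 4 1 0
 5 1 1
 6 1 0
 7 1 1
 8 0 0
 9 1 1
10 0 0
end

* One observation per subject per test; -testid- takes the values 1 and 2
reshape long testres, i(patid) j(testid)
label variable testid "Test ID"

* Toy data have no design variables, so each subject is its own PSU.
* For real survey data (hypothetical -psuid-; -samprob- assumed to be
* the probability weight), declare the design instead:
*   svyset psuid [pweight = samprob], strata(stratid)
svyset patid

* Design-based logistic regression of test result on test identity
svy: logit testres i.testid

* Proportion positive under each test, then the difference (test 2 - test 1)
margins testid
margins r.testid

* SSC alternative from the message (requires: ssc install regpar)
* regpar, at(testid=1) atzero(testid=2)
```

Reporting proportions and their difference, rather than the odds ratio from -logit-, is the point of the final step: the confidence interval is then on the scale people actually read.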

I hope this helps. Let me know if you have any further queries.

Best wishes

Roger

References

Newson RB. Attributable and unattributable risks and fractions and other
scenario comparisons. The Stata Journal 2013; 13(4): 672–698. Purchase from
http://www.stata-journal.com/article.html?article=st0314

Newson RB. Scenario comparisons: How much good can we do? Presented at the
18th UK Stata User Meeting, 13–14 September, 2012. Download from
http://ideas.repec.org/p/boc/usug12/01.html


On 05/01/2014 19:05, Ankit Sakhuja wrote:


Thanks for the input. The survey sample that I am working on is a
stratified sample using probability weights. It is probably naivety
and ignorance on my part, but I am still not sure how to create the
variable -testid-, as all observations underwent both tests. To
give an example, my dataset looks like this:

Observation No   Result of Test 1   Result of Test 2
1                1                  1
2                1                  0
3                1                  1
4                1                  0
5                1                  1
6                1                  0
7                1                  1
8                0                  0
9                1                  1
10               0                  0

So in the above example, 80% are positive on test 1 and 50% on test 2,
but all 10 observations received both tests.
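For reference, the classical (unweighted) McNemar test on these 10 pairs uses only the discordant pairs: b subjects positive on test 1 only and c subjects positive on test 2 only, with statistic (b - c)^2 / (b + c) on 1 degree of freedom. A minimal Stata sketch on the data above; it ignores strata, clusters and weights, which is exactly why it is not enough for complex survey data.

```stata
* Ankit's example: one row per subject, both test results
clear
input patid testres1 testres2
 1 1 1
 2 1 0
 3 1 1
 4 1 0
 5 1 1
 6 1 0
 7 1 1
 8 0 0
 9 1 1
10 0 0
end

* Marginal proportions positive under each test
summarize testres1 testres2

* Discordant pairs: positive on test 1 only (b) and on test 2 only (c)
count if testres1 == 1 & testres2 == 0
local b = r(N)
count if testres1 == 0 & testres2 == 1
local c = r(N)

* McNemar chi-squared on 1 df, ignoring the survey design
display "McNemar chi2 = " (`b' - `c')^2 / (`b' + `c')
display "p-value      = " chi2tail(1, (`b' - `c')^2 / (`b' + `c'))
```

Here b = 3 and c = 0, so the statistic is 9/3 = 3. With so few discordant pairs the chi-squared approximation is rough, and an exact test would be preferable.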

Or a different example could be that 10 patients were given medication
A for asthma and, after a washout period, medication B for the same
condition. Then say that with the first medication 80% had a response and
with the second medication 50% had a response. So all observations got both
medications (or tests), and therefore I am not sure whether a variable
-testid- or -cat- (as in Samuels's example) can be made.
Thanks again
Ankit

On Sun, Jan 5, 2014 at 11:39 AM, Roger B. Newson
<[email protected]> wrote:


This problem can probably be solved using -somersd-, -regpar-, -binreg-,
-glm-, or some other package that can estimate differences between 2
proportions for clustered data. The first step would be to reshape your
data (using either -reshape- or -expgen-) to have 1 observation per study
subject per binary test (and therefore 2 observations per study subject, as
there are 2 binary tests). The binary outcome, in this dataset, would be the
test result. For each study subject, it would be the outcome of the first
binary test in the first observation for that subject, and the outcome of the
second binary test in the second observation. And the dataset would contain
a variable, maybe called -testid-, with the value 1 in observations
representing the first test, and 2 in observations representing the second
test. The confidence interval to be calculated would be for the difference
between 2 proportions, namely the proportion of positive outcomes where
-testid- is 2 and the proportion of positive outcomes where -testid- is 1.

You do not say what the sampling design is for your complex survey data.
However, if this design has clusters, then they will be the clusters to use
when estimating your difference between proportions. And, if this design
does not have clusters, then the clusters used, when estimating your
difference between proportions, will be the study subjects themselves.
Either way, your final estimate will be clustered.
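A minimal sketch of the no-design-clusters case, where the subjects themselves are the clusters, using -glm-, one of the commands listed above. A binomial model with an identity link makes the coefficient on test 2 the difference between the two proportions directly, and -vce(cluster patid)- accounts for each subject contributing two correlated observations. It uses the 10-subject toy example posted in this thread and ignores strata and weights; with a real survey design, the -svy:- prefix would be used instead of the -vce()- option.

```stata
* Toy example from the thread, reshaped to one row per subject per test
clear
input patid testres1 testres2
 1 1 1
 2 1 0
 3 1 1
 4 1 0
 5 1 1
 6 1 0
 7 1 1
 8 0 0
 9 1 1
10 0 0
end
reshape long testres, i(patid) j(testid)

* Identity-link binomial GLM: _cons is the proportion positive under test 1,
* and 2.testid is the difference (test 2 minus test 1).
* Standard errors are clustered on the study subject.
glm testres i.testid, family(binomial) link(identity) vce(cluster patid)

* The same difference with its confidence interval
lincom 2.testid
```

Identity-link binomial models can fail to converge when fitted proportions approach 0 or 1; -binreg- with its -rd- option targets the same risk difference and is a reasonable fallback.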

I hope this helps. Let me know if you have any further queries.

Best wishes

Roger

On 05/01/2014 16:55, Ankit Sakhuja wrote:



Dear Members,
I am trying to compare two categorical variables which are not
mutually exclusive, such that participants with a positive result in
one group (using method 1) also have a positive result in the second
group (using method 2). Now say 30% have a positive result by method 1
and 20% by method 2: how can I say whether these results are in fact
similar or different? I could potentially use McNemar's test, but these
are complex survey data and I am not sure how to go ahead with that. I
have seen discussions about using -somersd-, but I am not sure exactly
how to use it with these data. I would really appreciate any help.
Ankit
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
