The Stata listserver

Re: st: reshaping a data file: cell frequency = number of rows in the new data set


From   Tom Trikalinos <[email protected]>
To   [email protected]
Subject   Re: st: reshaping a data file: cell frequency = number of rows in the new data set
Date   Fri, 17 Dec 2004 11:46:13 +0200

Hi Gerben:
You would have to use _reshape long_ repeatedly and then _expand_.
As far as I understand it, you have data on n studies in the following format:

study TP FP FN TN
1 1 2 3 4
2 14 33 52 10
...
n A B C D

where the variables represent the 4 cells of the diagnostic table (TP=true positive, FP=false positive, FN=false negative, TN=true negative).
You could _reshape long_ in 2 steps:

First step: each study will occupy 2 lines, one per gold standard status (1 = positive, 0 = negative), each holding the positive-test and the negative-test counts:
. ren TP posTest1
. ren FP posTest0
. ren FN negTest1
. ren TN negTest0
. reshape long posTest negTest, i(study) j(goldStandardIs)

Second step: the same strategy. Each line (one gold standard status per study) breaks into 2 lines, one per test result, yielding 4 lines per study. First you have to uniquely identify the lines:
. egen lineID = group(study goldStandardIs)
. ren posTest counts1
. ren negTest counts0
. reshape long counts, i(lineID) j(testIs)

Now each cell of the table is on a separate line. Coding is based on the dummies 'testIs' and 'goldStandardIs', as per your request; the variable 'counts' holds each cell's count.
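To check the first two steps, here is a minimal sketch meant to be run as a do-file. It types in the two example studies from the table above and asserts that every cell kept its count. The assert, isid and list lines are my additions for checking only and are not part of the recipe.

```stata
* Sketch: the two example studies from the table above, typed in by hand
clear
input study TP FP FN TN
1  1  2  3  4
2 14 33 52 10
end

* first step: 2 lines per study, indexed by gold standard status
rename TP posTest1
rename FP posTest0
rename FN negTest1
rename TN negTest0
reshape long posTest negTest, i(study) j(goldStandardIs)

* second step: 4 lines per study, indexed by test result as well
egen lineID = group(study goldStandardIs)
rename posTest counts1
rename negTest counts0
reshape long counts, i(lineID) j(testIs)

* checks: one line per cell, and each cell of study 1 kept its original count
isid study goldStandardIs testIs
bysort study: assert _N == 4
assert counts == 1 if study == 1 & testIs == 1 & goldStandardIs == 1   // TP
assert counts == 2 if study == 1 & testIs == 1 & goldStandardIs == 0   // FP
assert counts == 3 if study == 1 & testIs == 0 & goldStandardIs == 1   // FN
assert counts == 4 if study == 1 & testIs == 0 & goldStandardIs == 0   // TN
list study testIs goldStandardIs counts, sepby(study)
```

If any assert fails, the do-file stops with an error, which points to a mix-up in the renaming.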

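Third step: drop the empty cells (_expand_ keeps a line whose count is 0 rather than deleting it), then expand the dataset and drop useless variables:
. drop if counts == 0
. expand counts
. drop lineID counts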
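One caveat about the third step: _expand_ treats a count below 1 as 1, so a cell with 0 subjects would stay in the data as one line unless it is dropped first. Here is a small sketch, run as a do-file, with made-up counts for a single hypothetical study (TP=1, FP=2, FN=3, TN=0), already in the long layout reached after the second step:

```stata
* Sketch with made-up counts for one study: TP=1, FP=2, FN=3, TN=0
clear
input study testIs goldStandardIs counts
1 1 1 1
1 1 0 2
1 0 1 3
1 0 0 0
end

drop if counts == 0     // without this line the empty TN cell would survive as 1 line
expand counts           // each remaining line is replaced by 'counts' copies of itself
drop counts

* 1 + 2 + 3 + 0 = 6 subjects, hence 6 lines, none with both dummies equal to 0
count
assert r(N) == 6
count if testIs == 0 & goldStandardIs == 0
assert r(N) == 0
```

If a study has no empty cells, the drop line does nothing.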

Hope this helps

tom



On Dec 17, 2004, at 2:12 AM, G. ter Riet wrote:


Dear List members,
In the context of a diagnostic meta-analysis, I have data from 2x2
tables (cells a thru' d, signifying true positives, false positives,
etc). My file has a wide format and is perfectly suited for a meta-
analytic command like <metan a b c d, or>.
As a hypothetical example, suppose one of my studies (rows) contained
the cell frequencies 1 2 3 4, for variables a, b, c, d, respectively.
How could I efficiently create another file (long format), based on
these data, that looked like
1 1
1 0
1 0
0 1
0 1
0 1
0 0
0 0
0 0
0 0
where the 1st column signified a binary test result, and the 2nd
column a binary outcome according to a reference standard (gold
standard test).
Any help would be much appreciated.

Gerben ter Riet, Amsterdam

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
