From: Tom Trikalinos <[email protected]>
To: [email protected]
Subject: Re: st: reshaping a data file: cell frequency = number of rows in the new data set
Date: Fri, 17 Dec 2004 11:46:13 +0200

Hi Gerben:

You would have to use _reshape long_ repeatedly and then _expand_.

As far as I understand, you have data on n studies in the following format:

study   TP   FP   FN   TN
1        1    2    3    4
2       14   33   52   10
...
n        A    B    C    D

where the variables represent the 4 cells of the diagnostic 2x2 table (TP = true positive, FP = false positive, FN = false negative, TN = true negative).
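To try the steps on a toy file first, the two example studies can be typed in directly. This is only a sketch: the variable names follow the layout above, and your own file will of course have different names (a, b, c, d) that you would rename accordingly.

. clear
. input study TP FP FN TN
  1 1 2 3 4
  2 14 33 52 10
  end
. list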

You could _reshape long_ in 2 steps:

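First step: each study will occupy 2 lines, one per gold standard status (1 = positive, 0 = negative), each holding the counts of positive and negative test results for that status. The suffix of the renamed variables carries the gold standard status: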

. ren TP posTest1

. ren FP posTest0

. ren FN negTest1

. ren TN negTest0

. reshape long posTest negTest, i(study) j(goldStandardIs)
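As a quick check, for the study with cells 1 2 3 4 the data should now hold two lines, sorted by goldStandardIs. The layout shown is schematic, not literal Stata output:

. list study goldStandardIs posTest negTest if study == 1

  study   goldStandardIs   posTest   negTest
      1                0         2         4
      1                1         1         3

The first line holds FP and TN (gold standard negative), the second TP and FN (gold standard positive).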

Second step: the same strategy. Each line (one gold standard status per study) splits into 2 lines, one per test result, yielding 4 lines per study. First you have to identify the lines uniquely, so that reshape has a single identifier to use in i():

. egen lineID = group(study goldStandardIs)

. ren posTest counts1

. ren negTest counts0

. reshape long counts, i(lineID) j(testIs)

Now each cell of the table is on a separate line. The coding is based on the dummies 'testIs' and 'goldStandardIs', as per your request; the variable 'counts' holds each cell's count.
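For the study with cells 1 2 3 4, a listing at this point should show the four cells, sorted by lineID and testIs. Again the layout is schematic, and the labels in parentheses are my annotation, not a variable:

. list study goldStandardIs testIs counts if study == 1

  study   goldStandardIs   testIs   counts
      1                0        0        4    (TN)
      1                0        1        2    (FP)
      1                1        0        3    (FN)
      1                1        1        1    (TP)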

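Third step: expand the dataset and drop the variables that are no longer needed. Note that expand treats a count below 1 (or a missing count) as 1 and keeps that line once, so empty cells have to be dropped first:

. drop if counts < 1 | missing(counts)

. expand counts

. drop lineID counts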
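To convince yourself that nothing was lost, you can cross-tabulate the two dummies for one study and compare with its original cells. A sketch for the study with cells 1 2 3 4, which should give back the counts 1, 2, 3, 4 in the table and 10 lines in total:

. tabulate testIs goldStandardIs if study == 1
. count if study == 1

If you want the test result in the 1st column and the gold standard in the 2nd, as in your example, you can put them first:

. order testIs goldStandardIs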

Hope this helps

tom

On Dec 17, 2004, at 2:12 AM, G. ter Riet wrote:

Dear List members,

In the context of a diagnostic meta-analysis, I have data from 2x2 tables (cells a thru' d, signifying true positives, false positives, etc). My file has a wide format and is perfectly suited for a meta-analytic command like <metan a b c d, or>. As a hypothetical example, suppose one of my studies (rows) contained the cell frequencies 1 2 3 4, for variables a, b, c, d, respectively. How could I efficiently create another file (long format), based on these data, that looked like

1 1
1 0
1 0
0 1
0 1
0 1
0 0
0 0
0 0
0 0

where the 1st column signified a binary test result, and the 2nd column a binary outcome according to a reference standard (gold standard test).

Any help would be much appreciated.

Gerben ter Riet, Amsterdam

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

