
From: David Kantor <dkantor@jhu.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Newbie: Case selection problem
Date: Mon, 07 Feb 2005 09:14:35 -0500

This is a different problem. How many possible combinations of k potential diagnoses are there?

The answer is 2^k, and there is a natural (one-to-one) mapping from these combinations to the integers from 0 to 2^k -1. But if k is large, then how do you name all the combinations? You may be stuck with just using the resulting integer.
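As a small illustration of that mapping, assume a hypothetical k = 3 with diagnoses coded 0, 1 and 2. The set of diagnoses present is read as a binary number, one bit per diagnosis:

```stata
* Hypothetical case: k = 3 diagnoses, coded 0, 1, 2.
* Diagnoses 0 and 2 present map to 2^0 + 2^2
display 2^0 + 2^2
* All three present gives the largest code, 2^3 - 1
display 2^0 + 2^1 + 2^2
* No diagnoses present maps to 0
```

The 2^3 = 8 possible subsets therefore land on the integers 0 through 7, one subset per integer.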

First, your diag values need to lie in a small range of non-negative integers, such as 0 to k-1 (with minimal gaps in this range). If they are already in that form, fine. Otherwise, you need to map them (one-to-one) into such a set of integers. (If your diag values are strings, you can -encode- them and use the encoded values.)
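One way to do that mapping might look like the sketch below. The names diagcode and diagstr are hypothetical, and this is untested against your data; if you use it, diagcode would take the place of diag in what follows.

```stata
* Numeric diag with gaps: group() assigns 1, 2, ... in sorted order of diag
egen int diagcode = group(diag)
replace diagcode = diagcode - 1      // shift so the codes start at 0

* If the diagnoses are held in a string variable (here called diagstr),
* -encode- produces integer codes 1, 2, ... with value labels attached:
* encode diagstr, generate(diagcode)
```

Codes that start at 1 rather than 0 also work; they just leave bit 0 unused.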

Next, suppose that diag is that variable (or one derived by an appropriate mapping).

summ diag

local diagmax = r(max) // the maximal value; with codes 0..diagmax, k = diagmax + 1 in the above

assert r(min) >=0 // we really don't want any negatives

sort personid

forvalues n = 0 / `diagmax' {

egen byte hasdiag`n' = max(diag==`n'), by(personid)

}

/* That is like what I wrote in the previous reply -- but compacted under a -forvalues-. */

/* At this point you can condense to one observation per person; this is optional. */

bysort personid: keep if _n==1

/* Generate the identifier of all combinations. */

gen long combination = 0

forvalues n = 0 / `diagmax' {

replace combination = combination + 2^`n' if hasdiag`n'

}
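Once combination exists, the combinations that actually occur can be listed. The following is a sketch only; it assumes the data have already been reduced to one observation per person, and npersons is a hypothetical variable name.

```stata
* Frequency of each combination code, most common first
tabulate combination, sort

* Or build a small table with one row per distinct combination
preserve
contract combination, freq(npersons)
gsort -npersons
list combination npersons
restore
```

The -preserve- and -restore- pair keeps the person-level data intact, since -contract- replaces the dataset in memory.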

----

There may be other (better) ways to express that computation.

Also, be warned, I have not tested this.

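And, if `diagmax' is large, you may need double rather than long for the type of combination: a long is safe only up to `diagmax' = 29, and a double holds these sums exactly only up to `diagmax' = 52.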

If this has been done correctly, then each value of combination should correspond to exactly one distinct combination of diag values. The correspondence is that diag value n is present if and only if there is a 1 in the nth bit of the binary representation of combination (counting from the right, starting with 0) -- meaning the binary expansion of the integer value, not the internal storage format of a float or double.
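That correspondence can be checked, and used, with ordinary arithmetic. The sketch below assumes combination and the hasdiag variables exist as constructed above and that `diagmax' is at least 7; the choice n = 3 and the name check3 are arbitrary.

```stata
* Recover the indicator for diag value n from the integer code
local n = 3
gen byte check`n' = mod(floor(combination / 2^`n'), 2)
assert check`n' == hasdiag`n'

* Persons whose diagnoses are exactly 2, 3 and 7 and nothing else
count if combination == 2^2 + 2^3 + 2^7
```

Because the code is unique per combination, an equality test on combination selects an exact set of diagnoses rather than "at least" that set.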

----

Again, I hope this helps.

At 05:43 PM 2/5/2005 +0200, you wrote:

Thanks, it solved the problem at hand.

But as you quite correctly assumed, the next step I have to do is to identify all the diagnosis combinations that are present.

Suggestions?

Taavi

David Kantor wrote:

To Taavi Lai:

There are several ways of doing this. Here's one.

Say you have a variable called diag for diagnosis, and suppose that you are looking for persons that have the combination of values 2,3 & 7. (Those may not be realistic diag values, but they will suffice for now.) Also suppose that you have personid to identify persons.

sort personid

egen byte hasdiag2 = max(diag==2), by(personid)

egen byte hasdiag3 = max(diag==3), by(personid)

egen byte hasdiag7 = max(diag==7), by(personid)

/* (And you could compact that sequence with a -foreach- if you prefer.) */

gen byte hascombination = hasdiag2 & hasdiag3 & hasdiag7

/* You could also ...

bysort personid: keep if _n==1

-- if you want to reduce to one observation per person.

(And it could go before the -gen byte hascombination-.)

*/
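The -foreach- compaction mentioned above might be written as follows. This is a sketch using the same illustrative values 2, 3 and 7; the loop macro name d is arbitrary.

```stata
* One indicator per diagnosis of interest, constant within person
foreach d of numlist 2 3 7 {
    egen byte hasdiag`d' = max(diag==`d'), by(personid)
}
gen byte hascombination = hasdiag2 & hasdiag3 & hasdiag7
```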

----

This detects that the person has at least the particular combination; it does not detect whether these are the ONLY diagnoses. You were not clear whether that was part of the problem. If it is, then more needs to be done.
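If "only these diagnoses" does matter, one possible extension is sketched below. It assumes hascombination was created as above; hasother and onlycombination are hypothetical names, and this is not tested.

```stata
* Run this while the data still have one observation per diagnosis,
* that is, before any -keep if _n==1-.
* Flag persons with any nonmissing diagnosis outside 2, 3, 7
egen byte hasother = max(!inlist(diag, 2, 3, 7) & !missing(diag)), by(personid)

* Has 2, 3 and 7, and nothing else
gen byte onlycombination = hascombination & !hasother
```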

I hope this is helpful.

-- David

At 11:42 PM 2/4/2005 +0200, Taavi Lai wrote:

Could someone point me to a way to do this:

I have a list of people and their diagnoses. One person can have a varying number of observations, according to the number of diagnoses.

I need to identify people with a specific combination of diagnoses.

Taavi Lai

-- Tervishoiu instituut Tartu Ülikool Ravila 19 Tartu 50411 GSM: 56663859 Fax: 7374192 e-mail: taavi.lai@ut.ee

David Kantor
Institute for Policy Studies
Johns Hopkins University
dkantor@jhu.edu
410-516-5404
