re: st: assigning values from a list


From   Kit Baum <baum@bc.edu>
To   statalist@hsphsun2.harvard.edu
Subject   re: st: assigning values from a list
Date   Sat, 22 May 2010 14:06:51 -0400

Marietherese said:

Sorry I should clarify what var1 and var4 are:

When a patient presents to the clinic, they can be diagnosed with at least one
but up to 4 diseases, entered into var1-4 as codes from the Diagnostic and
Statistical Manual version 9.0. Var1 will always have a value but the remaining
var2-4 may or may not be missing, depending on the diagnosis. So if I want to
find patients suffering from a group of viral illnesses, I would have to search
all 4 variables for the codes 53.20, 54.43 etc up to 76.90

I need to search all of var1-4 for every one of those codes to make sure I don't miss
any cases. But there are many cases where at least one of var2-4 is missing.
In that case it might be better to step through the variables one by
one.

I also tried doing this using a
 
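local varlist var1 var2 var3 var4
gen virus=.
// loop over the names stored in the local macro varlist
foreach var of local varlist {
   // no enclosing -if- command: it would test only the first observation
   replace virus=1 if ((`var'==53.20) |(`var'==54.42) |(`var'==54.43) | ///
   (`var'==76.00) |(`var'==76.90))
} 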

command, but I wasn't aware of the inlist() function when I wrote that. 


Similarly if I want to find other groups of diseases (eg fungal infections) I
need to search var1-4 for a different list of DSM codes.

There are about 20 groups of diseases that I need to identify. To complicate
things patients can have multiple diagnoses so I need to make a judgement call
about which one is more serious - var1 takes precedence.



Here is a solution that will look for matches, and return a set of indicator variables where matches are found:

------------------------
clear all
set obs 100000
forv i=1/4 {
// make up fake data with some missing values
 g var`i' = int(10000*runiform())
 replace var`i' = cond(var`i'< 500, ., var`i')
// create a set of mvar1,2,3,4 variables, set to missing
 g mvar`i' = .
// create lists of var1..var4 and mvar1..mvar4
 loc vl "`vl' var`i'"
 loc rl "`rl' mvar`i'"
}
// put the diagnoses to be matched in the matrix match
// number of elements does not matter
mat match = (5320, 5442, 5443, 7600, 7690)

mata:
void matchlist(string scalar varlist, string scalar retlist)
{
	st_view(X=., ., varlist)
	st_view(Z=., ., retlist)
	match = st_matrix("match")
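	for(j=1; j<=cols(X); j++) {
		// start from 0 and OR in each code, so that a match on an earlier
		// code is not overwritten by a later non-match
		hit = J(rows(X), 1, 0)
		for(k=1; k<=cols(match); k++) {
			hit = hit :| (X[.,j] :== match[k])
		}
		Z[.,j] = hit
	}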
}
end
mata: matchlist("`vl'", "`rl'")
mat l match
egen anydiag = rowtotal(mvar*)
l var1 mvar1 mvar2 mvar3 mvar4 if anydiag>0, sep(0)
----------------------------

To use it, you merely need to create four variables mvar1..mvar4 (with all missing values) and place the diagnoses in a Stata matrix. You could look for different diagnoses by just placing different contents in that matrix. Any number of elements will work.

The variable anydiag counts how many of var1-var4 match one of the diagnoses, so anydiag>0 flags a patient who has any of them, and variables mvar1-mvar4 indicate which variable matched. In my fake data there are not any patients who have more than one diagnosis, but the code should handle that gracefully.
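
For comparison, here is a minimal sketch of the same idea in plain Stata, using the inlist() function mentioned in the question rather than the Mata routine above. It is not the method posted above, only an illustration under some assumptions: the codes are stored as integers (53.20 as 5320, as in the fake data), which avoids exact comparisons against non-integer values; the four observations are made up; the "fungal" codes are placeholders, not real diagnosis codes; and the names n_viral, first_viral and so on are invented for this example. Because the question says var1 takes precedence, the sketch also records the first variable that matches.

```stata
clear
input var1 var2 var3 var4
5320 .    .    .
1234 5443 .    .
1111 2222 7690 5442
9999 1100 .    .
end

// one local macro per disease group; the fungal codes are placeholders
local viral  5320, 5442, 5443, 7600, 7690
local fungal 1100, 1120

foreach g in viral fungal {
    forvalues i = 1/4 {
        // inlist() is 1 if the variable equals any code in the group, else 0
        generate byte `g'`i' = inlist(var`i', ``g'')
    }
    // number of var1-var4 that match this group
    egen n_`g' = rowtotal(`g'1-`g'4)
    // first matching position, so var1 takes precedence; missing if none
    generate byte first_`g' = cond(`g'1, 1, cond(`g'2, 2, cond(`g'3, 3, cond(`g'4, 4, .))))
}
list var1-var4 n_viral first_viral n_fungal first_fungal, sep(0)
```

Adding another disease group then only requires one more local macro and one more name in the foreach list. A missing value in var2-var4 never matches, so those observations simply get 0 in the corresponding indicator.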

Kit Baum   |   Boston College Economics & DIW Berlin   |   http://ideas.repec.org/e/pba1.html
                              An Introduction to Stata Programming  |   http://www.stata-press.com/books/isp.html
   An Introduction to Modern Econometrics Using Stata  |   http://www.stata-press.com/books/imeus.html

