



Re: st: generating indicators


From   Wassim Tarraf <tarrafwassim@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: generating indicators
Date   Fri, 10 Sep 2010 16:48:06 -0400

Hi Jordan- Yes the data is in long format. I was actually trying to avoid reshaping the data.

Thanks,
Wassim



Jordan Hoolachan wrote:
Hi, Wassim

It sounds like your data is currently in long format, is that right?
There may be a way to generate the indicators that you want with the data
being in long but I personally would first transpose it to wide.
Another question: do you have a list of all the different ICD9 codes
that appear in your data set?  If you do, ignore this next part of
code.

If you don't have a list of all ICD9 codes that are present, do the following:

1. sort varB
2. by varB: gen code=_n
3. list varB if code==1

The above code will give you a list of the unique ICD9 codes that
appear in your dataset...you'll need this for the next step.
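
As a possible shortcut for the three steps above, -levelsof- can collect the distinct values and store them in a local macro for later use. This is only a sketch, assuming the codes sit in the string variable varB as in the steps above; the macro name "codes" is made up for illustration:

```stata
* assumes varB is the string variable holding the ICD9 codes
* "codes" is an illustrative macro name, not something from the original post
levelsof varB, local(codes)

* show the collected list of distinct codes
display `"`codes'"'
```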

Now, transpose your data from long to wide.  With the data in wide
format, you can use the egen function -rany- to produce the indicators
that you want (-findit egenmore- if you haven't already downloaded
it).  If you check out its help file, you'll see that it allows you to
specify a condition and then will indicate with a 0/1 if that
condition is met at least once over a list of variables.  You can
implement a for loop to create all the indicators in one fell swoop.
The code would look something like this:

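foreach x in <the list of ICD9 codes printed out before> {
    * no spaces inside the quotes, so each code is matched exactly;
    * a code containing a period needs it removed to form a valid variable name
    egen ICD`x'_indic = rany(<the variables containing the ICD9 codes>), cond(@ == "`x'")
}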


The above code will produce an indicator of the form "ICDx_indic" for
each of the ICD codes that appear in your data set.
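
If reshaping is not wanted, similar indicators can probably be built with the data left in long format. The following is a rough sketch rather than a tested solution: it assumes varA is the person id and varB the string ICD9 code (the names used in the original question), and it swaps -rany- for the built-in egen function max() applied within person:

```stata
* Sketch only: assumes varA = person id, varB = string ICD9 code,
* with the data in long format (one row per reported condition).
levelsof varB, local(codes)

foreach c of local codes {
    * strtoname() converts e.g. "250.00" into a legal variable name
    local v = strtoname("ICD`c'_indic")

    * 1 on every row of a person who reported code `c' at least once, else 0
    bysort varA: egen byte `v' = max(varB == "`c'")
}
```

Each indicator is constant within person, so if one row per person is wanted afterwards, something like -bysort varA: keep if _n == 1- would reduce the data once the loop has finished.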

Hopefully that is clear enough..let me know if you have any questions.

Jordan




Jordan Hoolachan
ScM Candidate
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
410-294-3670



On Fri, Sep 10, 2010 at 3:45 PM, Wassim Tarraf <tarrafwassim@gmail.com> wrote:
Dear Stata list members- I have a dataset that includes a variable A which
is an identifier (nonconsecutive person id numbers) and a variable B which
is a list of medical (icd9) conditions (string). Each person (identified by
A) has as many records as reported conditions (conditions could be reported
more than once). I would appreciate suggestions on an efficient way to
generate conditions indicators (coded 0,1) that would account for whether a
specific individual reported a certain condition or not.

Thanks,
Wassim
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



