Re: st: generating indicators

From   Jordan Hoolachan <>
Subject   Re: st: generating indicators
Date   Fri, 10 Sep 2010 16:19:54 -0400

Hi, Wassim

It sounds like your data are currently in long format; is that right?
There may be a way to generate the indicators that you want with the data
in long format, but I personally would first reshape them to wide.
Another question: do you have a list of all the different ICD9 codes
that appear in your data set?  If you do, you can skip the next step.

If you don't have a list of all ICD9 codes that are present, do the following:

sort varB
by varB: gen code = _n
list varB if code == 1

The above code will give you a list of the unique ICD9 codes that
appear in your data set.  You'll need this for the next step.
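
As an aside, -levelsof- is another way to get the same list, and it has the advantage of storing the codes in a local macro that a loop can use directly. A minimal sketch, assuming the codes sit in the string variable varB as in the question (the macro name codes is just an illustrative choice):

```stata
* Collect the distinct values of varB into a local macro
* (varB is the name used in the question; codes is an arbitrary macro name)
levelsof varB, local(codes)
```

-levelsof- also prints the list as it runs. A local macro only lives for the current do-file or interactive session, so any loop that uses `codes' has to run in the same place.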

Now, transpose your data from long to wide (-reshape wide-).  With the
data in wide format, you can use the egen function -rany()- to produce the
indicators that you want (-findit egenmore- if you haven't already
installed it).  If you check out its help file, you'll see that it lets
you specify a condition and returns a 0/1 indicator of whether that
condition is met at least once across a list of variables.  You can
use a -foreach- loop to create all the indicators in one fell swoop.
The code would look something like this:

foreach x in <the list of ICD9 codes printed out before> {
    * no spaces inside the quotes, or the comparison will never match
    egen ICD`x'_indic = rany(<the variables containing the ICD9 codes>), cond(@ == "`x'")
}

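The above code is meant to produce an indicator of the form "ICDx_indic"
for each of the ICD codes that appear in your data set.  Note that a
variable name cannot contain a period, so for codes such as 250.00 the
period has to be removed or replaced when building the name.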
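
To tie the steps together, here is a rough sketch of the whole workflow. It assumes the variable names from the question (varA for the person id, varB for the string ICD9 code), that any other variables in the data are constant within person (otherwise -reshape- will complain), and that -egenmore- is installed. The names seq, codes and v are hypothetical helpers, not anything from your data, and -strtoname()- needs Stata 11 or later.

```stata
* Sketch only: varA = person id, varB = string ICD9 code (names from the question)

* Distinct codes, collected while the data are still long
levelsof varB, local(codes)

* seq is a hypothetical counter of records within each person
bysort varA: gen seq = _n

* One row per person, with the codes spread over varB1, varB2, ...
reshape wide varB, i(varA) j(seq)

foreach x of local codes {
    * Build a legal variable name, e.g. 250.00 -> ICD250_00_indic
    local v = strtoname("ICD`x'_indic")
    * 1 if any of the varB* variables equals this code, 0 otherwise
    egen `v' = rany(varB*), cond(@ == "`x'")
}
```

People with fewer records than the maximum simply have empty strings in the trailing varB* variables, and those never match a code, so they do not affect the indicators.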

Hopefully that is clear enough.  Let me know if you have any questions.


Jordan Hoolachan
ScM Candidate
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health

On Fri, Sep 10, 2010 at 3:45 PM, Wassim Tarraf <> wrote:
> Dear Stata list members- I have a dataset that includes a variable A which
> is an identifier (nonconsecutive person id numbers) and a variable B which
> is a list of medical (icd9) conditions (string). Each person (identified by
> A) has as many records as reported conditions (conditions could be reported
> more than once). I would appreciate suggestions on an efficient way to
> generate conditions indicators (coded 0,1) that would account for whether a
> specific individual reported a certain condition or not.
> Thanks,
> Wassim