


st: time-series data identified by three variables

From   yannan shen
Subject   st: time-series data identified by three variables
Date   Wed, 28 Nov 2012 01:28:10 -0500

Dear Statalist,
I am working with panel data on hospital visits, and I want to learn
about the severity of various diseases.
Each observation contains: patient_id, illness_id, date_of_visit, severity.

For each patient (identified by patient_id), I want to know how many
times he or she has visited for the same illness (illness_id).
I used the duplicates command to tag the observations of patients who
have visited the hospital more than once for the same illness.

> duplicates tag  patient_id illness_id , generate(duple)

However, duple carries no information about the order of the visits.
If a patient has 5 visit records, I want to be able to
know which is the 0th repeat, 1st repeat, 2nd repeat, 3rd repeat, and
4th repeat. I have a vague feeling that I can order the observations
via date_of_visit, but I am still not sure how exactly that can be
done.
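
One way this ordering might be done is sketched below. It assumes
date_of_visit is a numeric Stata date (a string date would sort
alphabetically and should be converted first); repeat_no and n_visits
are hypothetical variable names, not part of the dataset.

```stata
* Sketch only: assumes date_of_visit is a numeric Stata date.
* Within each patient-illness pair, sort by visit date and number the
* visits 0, 1, 2, ... (0 = first visit, 1 = first repeat, and so on).
bysort patient_id illness_id (date_of_visit): generate repeat_no = _n - 1

* Total number of visits by this patient for this illness.
by patient_id illness_id: generate n_visits = _N
```

Visits by the same patient for the same illness on the same date would
be numbered in arbitrary order under this sketch.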

Furthermore, I want to create two new variables: one equal
to the average severity of each disease (illness_id) being treated on
the same date_of_visit, and the other equal to the highest severity
of that disease being treated on that day. (Ideally, both would be
additional variables defined for each observation.)
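
A possible sketch of these two variables uses egen within groups
defined by illness_id and date_of_visit. It assumes severity is
numeric; avg_severity and max_severity are hypothetical names.

```stata
* Sketch only: assumes severity is a numeric variable.
* Mean severity over all visits for the same illness on the same date;
* every observation in the group receives the group mean.
bysort illness_id date_of_visit: egen avg_severity = mean(severity)

* Highest severity for the same illness on the same date.
by illness_id date_of_visit: egen max_severity = max(severity)
```

Listing both variables after bysort makes each distinct combination of
illness and date its own group.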

I have used “bysort” in the past, but since the grouping is now a
combination of illness_id and date_of_visit, I am a little confused.
I would appreciate your help and suggestions.


