Re: st: time-series data identified by three variables

From   Nick Cox <>
Subject   Re: st: time-series data identified by three variables
Date   Fri, 30 Nov 2012 11:24:44 +0000

It's best to think that you are addressing Statalist, and not any individual.

You should be able to work this out. If the answer is all zeros,
evidently there is one and only one distinct value of -date- in each group
defined by -by:-. Indeed, it is likely that there is one and only one
observation in each group. I imagine that you want

bysort patient_id illness_id (date): gen duration = date - date[1]

Note that [_n] does no harm, but is unnecessary. The difference
implied by () is however crucial here.
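
To make the role of the parentheses concrete, here is a minimal sketch on a made-up dataset. The six observations and the names -sdate- and -dur_wrong- are invented for illustration; only patient_id, illness_id, date and duration come from the thread. Variables listed before the parentheses define the groups, while variables inside the parentheses only control the sort order within each group.

```stata
clear
input patient_id illness_id str9 sdate
1 10 "05jan2012"
1 10 "20jan2012"
1 10 "14feb2012"
2 10 "03mar2012"
2 30 "10mar2012"
2 30 "17mar2012"
end

* convert the string to a Stata daily date
gen date = daily(sdate, "DMY")
format date %td

* date in the by-list: each (patient, illness, date) combination is its own
* group, so date[1] is the observation's own date and the result is 0
bysort patient_id illness_id date: gen dur_wrong = date - date[1]

* date in parentheses: groups are (patient, illness) only, sorted by date
* within group, so date[1] is the first visit for that patient and illness
bysort patient_id illness_id (date): gen duration = date - date[1]

list patient_id illness_id date dur_wrong duration, sepby(patient_id illness_id)
```

With these made-up dates, dur_wrong is 0 throughout, while duration counts days since the first visit within each patient and illness (0, 15 and 40 for patient 1 with illness 10).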

On Fri, Nov 30, 2012 at 9:58 AM, YANNAN SHEN <> wrote:

> There is one more thing I need your help with. Within each group where a patient returns for treatment of the same disease, I want to calculate the duration between each repeat visit and his first visit.
> I wrote the following code:
>> bysort patient_id illness_id date: gen duration = date[_n]-date[1]
> but it returns all zeros.
> What is wrong?

On Nov 28, 2012, at 4:21 AM, Nick Cox <> wrote:

>> You want commands like
>> bysort patient_id illness_id date_of_visit : egen meansev = mean(severity)
>> by patient_id illness_id : gen repeat = _n - 1
>> as you want to number 0 upwards.
>> Nick
>> On Wed, Nov 28, 2012 at 6:28 AM, yannan shen <> wrote:
>>> I am working with some panel data of hospital visits and I want to learn
>>> the severity of various diseases.
>>> The variables I have in the dataset are: patient_id, illness_id,
>>> date_of_visit, severity
>>> each observation contains: patient_id, illness_id, date_of_visit, severity.
>>> For each patient (identified by patient_id), I want to know how many
>>> times he has visited for the same illness (illness_id).
>>> I use the duplicates command to tag the observations of patients who
>>> have visited the hospital more than once.
>>>> duplicates tag  patient_id illness_id , generate(duple)
>>> However, duple does not carry any time-series information.
>>> If a patient has 5 visiting records, I want to be able to
>>> know which is the 0th repeat, 1st repeat, 2nd repeat, 3rd repeat, and
>>> 4th repeat...I have a vague feeling that I can order those variables
>>> via date_of_visit but I am still not sure how exactly that can be
>>> done.
>>> Furthermore, I want to create two new variables: one variable equal
>>> to the average severity of each disease (illness_id) treated on
>>> the same date_of_visit. The other variable equals the highest severity
>>> of a given disease treated on that day. (Ideally, I want to
>>> create additional variables for each observation)
>>> I have used “bysort” in the past but since now the type is a
>>> combination of illness_id and date_of_visit, I am a little confused.
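
The original question above can be sketched as follows. This is one reading of the request, not the reply given in the thread: it assumes the mean and the highest severity are wanted per illness and visit date across all patients, and that date_of_visit is a numeric Stata date. The name maxsev is invented here; meansev and repeat follow the names used in the reply.

```stata
* mean and highest severity for each illness on each visit date;
* egen writes the group statistic onto every observation in the group
bysort illness_id date_of_visit: egen meansev = mean(severity)
by illness_id date_of_visit: egen maxsev = max(severity)

* visit counter within patient and illness, in date order: 0, 1, 2, ...
bysort patient_id illness_id (date_of_visit): gen repeat = _n - 1
```

If the statistics are wanted within patient as well, patient_id would be added to the first by-list.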
