Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: time-series data identified by three variables

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: time-series data identified by three variables
Date	Fri, 30 Nov 2012 11:24:44 +0000

It's best to think that you are addressing Statalist, and not any individual.

You should be able to work this out. If the answer is all zeros,
evidently there is one and only one distinct value of -date- in each group
defined by -by:-. Indeed, it is likely that there is one and only one
observation in each group. I imagine that you want

bysort patient_id illness_id (date): gen duration = date - date[1]

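Note that [_n] does no harm, but is unnecessary. The difference
implied by () is however crucial here: variables in parentheses are
used only to sort observations within groups, not to define the groups.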
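
To see the difference on a small scale, here is a minimal sketch. The toy data, the string variable sdate and the name dur_wrong are made up for illustration; only patient_id, illness_id, date and duration come from the thread.

```stata
* Toy data (made up for illustration): two patients, same illness
clear
input patient_id illness_id str10 sdate
1 10 "2012-01-05"
1 10 "2012-01-20"
1 10 "2012-03-01"
2 10 "2012-02-01"
end

* Convert the string to a numeric daily date
gen date = date(sdate, "YMD")
format date %td

* date inside the by-list: each group holds a single date, so the result is 0
bysort patient_id illness_id date: gen dur_wrong = date - date[1]

* date in parentheses: groups are patient_id illness_id, sorted by date within
bysort patient_id illness_id (date): gen duration = date - date[1]

list patient_id illness_id date dur_wrong duration, sepby(patient_id illness_id)
```

With these data dur_wrong should be 0 throughout, while duration should be 0, 15 and 56 days for patient 1 and 0 for patient 2.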

On Fri, Nov 30, 2012 at 9:58 AM, YANNAN SHEN <[email protected]> wrote:

> There is one more thing I need your help with. Within each group in which a patient returns for treatment of the same disease, I want to calculate the duration between each repeat visit and the first visit.
> I wrote the following code:
>> bysort patient_id illness_id date: gen duration = date[_n]-date[1]
> but it returns all zeros.
> What is wrong?

On Nov 28, 2012, at 4:21 AM, Nick Cox <[email protected]> wrote:

>> You want commands like
>>
>> bysort patient_id illness_id date_of_visit : egen meansev = mean(severity)
>> by patient_id illness_id : gen repeat = _n - 1
>>
>> as you want to number 0 upwards.
>>
>>
>> Nick
>>
>> On Wed, Nov 28, 2012 at 6:28 AM, yannan shen <[email protected]> wrote:
>>
>>> I am working with some panel data on hospital visits and I want to learn
>>> about the severity of various diseases.
>>> Each observation in the dataset contains these variables: patient_id,
>>> illness_id, date_of_visit, severity.
>>>
>>> For each patient (identified by patient_id), I want to know how many
>>> times he has visited for the same illness (illness_id).
>>> I used the duplicates command to tag the observations of patients who
>>> have visited the hospital more than once.
>>>
>>>> duplicates tag  patient_id illness_id , generate(duple)
>>>
>>> However, duple does not carry any time-series
>>> information. If a patient has 5 visit records, I want to be able to
>>> know which is the 0th repeat, 1st repeat, 2nd repeat, 3rd repeat, and
>>> 4th repeat...I have a vague feeling that I can order those variables
>>> via date_of_visit but I am still not sure how exactly that can be
>>> done.
>>>
>>> Furthermore, I want to create two new variables: one equal to the
>>> average severity of each disease (illness_id) treated on the same
>>> date_of_visit, and the other equal to the highest severity of that
>>> disease treated on that day. (Ideally, I want to create additional
>>> variables for each observation.)
>>>
>>> I have used “bysort” in the past, but since the grouping is now a
>>> combination of illness_id and date_of_visit, I am a little confused.
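
Pulling the thread together, the repeat counter and the two per-day summaries asked about above might be sketched as follows. This assumes that date_of_visit is a numeric daily date and that "each disease on the same day" means across all patients, which is why patient_id is left out of the second by-list. The name maxsev is made up here; meansev and repeat are the names used earlier in the thread.

```stata
* Assumes the data in memory hold the variables named in the question:
* patient_id illness_id date_of_visit severity

* Visit counter within patient and illness, ordered by date: 0, 1, 2, ...
bysort patient_id illness_id (date_of_visit): gen repeat = _n - 1

* Mean and maximum severity for each illness on each day, across patients
* (assumption: the summaries are wanted across patients, not per patient)
bysort illness_id date_of_visit: egen meansev = mean(severity)
by illness_id date_of_visit: egen maxsev = max(severity)
```

If a patient can have two visits for the same illness on the same date, the order of those tied observations, and hence their value of repeat, is arbitrary unless a further variable is added inside the parentheses to break the tie.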
