



Re: st: Analysis of event history data


From   "Kristian Thor Jakobsen" <KRJ@dm.dk>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Analysis of event history data
Date   Tue, 20 Mar 2012 14:28:42 +0100

Thanks again, Nick. I figured it out with your help. But I have one final question. Since my dataset consists of several million observations, I would like to trim it down before running the -reshape- command, so as not to waste time on observations that I would throw out afterwards anyway. Say that I want to keep only those observations where at least one of the y_* variables equals 1:

 Id      y_1001  y_1002  y_1003  ...  y_1101  area_10  area_11
 1       1       1       0       ...  1       10       5

I guess I could do the following:

keep if y_1001 == 1 | y_1002 == 1 | ...

But since I have around 1,000 such variables to check, writing out that condition would be quite tedious. Is there a smart way to get around this?

Thanks again,
Kristian
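A minimal sketch of one way to avoid typing the condition out, assuming that all the weekly indicators match the wildcard y_*, that no other variable name starts with y_, and that the identifier is called id. It uses the official -egen- function anymatch(), which checks a whole varlist against a list of values; the helper variable anyone is a hypothetical name, and the toy data are the three observations from the original question further down this thread.

```stata
* Toy data for illustration only; in practice your own dataset is in memory
clear
input id y_1001 y_1002 y_1003
1 0 1 0
2 0 0 0
3 1 1 0
end

* Flag observations where at least one y_* variable equals 1
* (anyone is a hypothetical name for the helper variable)
egen byte anyone = anymatch(y_*), values(1)

* Keep the flagged observations, then drop the helper
keep if anyone
drop anyone

list
```

Doing this before -reshape- means the reshape only has to handle people who were ever on a transfer. A loop over y_* that runs -replace anyone = 1 if `v' == 1- on a variable initialised to 0 would give the same flag if -egen- proves slow.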

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 19 March 2012 13:46
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Analysis of event history data

Do spend some time studying the resources for -reshape- including FAQs.

First off, your -y_- cannot be an identifier! It doesn't identify observations.

Second off, you can include -area- in the -reshape- but I guess you will need some extra surgery before and after. I would try a -rename- of the -area*- variables, such as

foreach v of var area* {
rename `v' `v'01
}

and then there will be some fill-in afterwards.

Nick
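A sketch of how the -rename- above might be combined with -reshape- and the fill-in step. It assumes that area_10 and area_11 refer to 2010 and 2011, that week 01 is the first week of each year, and that the numeric suffixes sort chronologically. That holds for suffixes such as 1001 to 1101, but not for data spanning 1991 to 2011, where 9101 would sort after 1001. The toy data reproduce the example row from the follow-up question with only four weekly columns; the variable names time and area are illustrative choices.

```stata
* Toy data modelled on the example row (only four weekly columns)
clear
input id y_1001 y_1002 y_1003 y_1101 area_10 area_11
1 1 1 0 0 10 5
end

* Give each yearly variable the suffix of week 01 of that year,
* so area_10 becomes area_1001 and area_11 becomes area_1101
foreach v of varlist area* {
    rename `v' `v'01
}

* Reshape both stubs at once; area_ is missing except in week 01
reshape long y_ area_, i(id) j(time)

* Fill-in: carry each yearly value forward within id
bysort id (time): replace area_ = area_[_n-1] if missing(area_)

rename y_ status
rename area_ area
list
```

The -replace- cascades because Stata processes observations in order, so each missing week picks up the value just filled in for the week before it.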

On Mon, Mar 19, 2012 at 12:30 PM, Kristian Thor Jakobsen <KRJ@dm.dk> wrote:
> Thanks, Nick. -reshape- is a big help. But what if I have time-varying variables that I would like to carry over as well, but that are not measured at the same intervals? For example:
>
> Id      y_1001  y_1002  y_1003  ...  y_1101  area_10  area_11
> 1       1       1       0       ...  0       10       5
>
> If I do -reshape- using y_ as the identifier, I would get something like:
>
> Id      j       y_      area_10 area_11
> 1       1001    1       10      5
> 1       1002    1       10      5
> 1       1003    0       10      5
> .
> .
> .
> 1       1101    0       10      5
>
> But I would like to have something like:
>
> Id      j       y_      area
> 1       1001    1       10
> 1       1002    1       10
> 1       1003    0       10
> .
> .
> .
> 1       1101    0       5
>
> Is that possible with -reshape-? Or would I have to convert the yearly time-varying variables into weekly ones first?
>
> Thanks again,
> Kristian
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: 19 March 2012 12:43
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Analysis of event history data
>
> For most Stata purposes your data would indeed be better reshaped to a long data structure or shape or form (some people do say "format", but in a Stata context format implies -format-, etc.).
>
> reshape long y_ , i(id) j(time)
> rename y_ status
>
> should do it. See also -tsspell- (SSC) and
>
> SJ-7-2  dm0029  Speaking Stata: Identifying spells (N. J. Cox)
>         Q2/07   SJ 7(2):249--265   (no commands)
>         shows how to handle spells with complete control over
>         spell specification
>
> as well as the literature on survival analysis with which you are evidently familiar.
>
> Nick
>
> On Mon, Mar 19, 2012 at 11:32 AM, Kristian Thor Jakobsen <KRJ@dm.dk> wrote:
>
>> I am trying to analyse transitions into and out of public
>> income transfers. My data is organized roughly as follows:
>>
>> Id      y_1001  y_1002  y_1003
>> 1       0       1       0
>> 2       0       0       0
>> 3       1       1       0
>>
>> This means that I have the weekly status of each individual from 1991
>> to 2011. But to do any sort of analysis (survival analysis, for
>> example), I guess I would have to convert the data into the
>> following layout instead:
>>
>> Id      Status  Time
>> 1       0       1
>> 1       1       2
>> 1       0       3
>> 2       0       1
>> 2       0       2
>> 2       0       3
>> 3       1       1
>> 3       1       2
>> 3       0       3
>>
>> Is that correct, and if so, is there a smart way to convert
>> the data from one format into the other? Or can I perhaps use the
>> data as given?
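A runnable version of the conversion asked about here, using the three observations shown and the -reshape- and -rename- lines from Nick's reply; the extra variable week is a hypothetical 1, 2, 3 counter added to match the Time column in the desired layout. It assumes the identifier is named id (Stata variable names are case-sensitive, so a variable called Id would need i(Id)).

```stata
* The example data from the question
clear
input id y_1001 y_1002 y_1003
1 0 1 0
2 0 0 0
3 1 1 0
end

* One observation per person-week; the suffix of y_ becomes time
reshape long y_, i(id) j(time)
rename y_ status

* Optional: a simple 1, 2, 3, ... week counter within each person
bysort id (time): gen week = _n
list, sepby(id)
```

The result has nine observations, one per person and week, with status and week laid out as in the Id/Status/Time table above.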

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP