Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Analysis of event history data


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Analysis of event history data
Date   Tue, 20 Mar 2012 14:14:15 +0000

Never say "one final question"!

-help egen- shows that there are -egen- functions -anycount()-,
-anymatch()-, and -anyvalue()-. So

egen ones = anycount(y_*), values(1)
keep if ones

Even if those functions did not exist, you could do this:

gen ones = 0

quietly foreach v of var y_* {
      replace ones = ones + (`v' == 1)
}

keep if ones
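
Since only presence or absence of a 1 matters here, -anymatch()- may be the more direct choice. A minimal sketch on toy data like the example earlier in this thread (the names -any1- and -maxy- are made up for illustration, and the -rowmax()- check assumes the y_* variables are strictly 0/1):

```stata
* Toy data: three ids, three weekly status variables
clear
input id y_1001 y_1002 y_1003
1 0 1 0
2 0 0 0
3 1 1 0
end

* anymatch() is 1 if any variable in the list equals a listed value, else 0
egen byte any1 = anymatch(y_*), values(1)

* Cross-check (0/1 coding assumed): the row maximum is 1 exactly when any1 is 1
egen byte maxy = rowmax(y_*)
assert any1 == (maxy == 1)

* Drop ids whose status is never 1 (here, id 2)
keep if any1
drop any1 maxy
list
```

Take care that the new variable's name does not itself start with y_, or a later y_* wildcard will pick it up.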

Nick

On Tue, Mar 20, 2012 at 1:28 PM, Kristian Thor Jakobsen <[email protected]> wrote:

> Thanks again, Nick. I figured it out with your help. But I have one final question. Given that my dataset consists of several million observations, I would like to trim the dataset down before I do the -reshape- command in order to avoid wasting time on observations that I would subsequently throw out. Say that I want to keep those observations where y_* is equal to 1 in one or more cases:
>
>  Id      y_1001  y_1002  y_1003 ...     y_1101  area_10  area_11
>  1       1       1       0              1       10      5
>
> I guess I could do the following:
>
> keep if y_1001==1| y_1002==1 etc.
>
> But given that I have around 1000 variables to check, writing out that condition would be quite tedious. Is there a smart way to get around this?

Nick Cox

> Do spend some time studying the resources for -reshape- including FAQs.
>
> First off, your -y_- cannot be an identifier! It doesn't identify observations.
>
> Second off, you can include -area- in the -reshape- but I guess you will need some extra surgery before and after. I would try a -rename- of the -area*- such as
>
> foreach v of var area* {
> rename `v' `v'01
> }
>
> and then there will be some fill-in afterwards.
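>
> A minimal sketch of that surgery on toy data, assuming that each yearly -area_YY- value belongs to week 01 of year YY, that it should be carried forward through the following weeks, and that the identifier is named -id- (all three are assumptions for illustration):
>
> ```stata
> clear
> input id y_1001 y_1002 y_1003 y_1101 area_10 area_11
> 1 1 1 0 0 10 5
> end
>
> * Give the yearly variables a week suffix so they share the j values of y_
> foreach v of var area* {
>     rename `v' `v'01
> }
>
> * area_ exists only for week 01 of each year; -reshape- leaves the rest missing
> reshape long y_ area_, i(id) j(time)
>
> * Fill-in: carry the last known area forward within each id
> bysort id (time): replace area_ = area_[_n-1] if missing(area_)
>
> assert area_ == 10 if time < 1101
> assert area_ == 5 if time == 1101
> list
> ```
>
> -reshape long- notes the y_/area_ combinations it cannot find and treats them as missing, which is what makes the carry-forward step necessary.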
>
> Nick
>
> On Mon, Mar 19, 2012 at 12:30 PM, Kristian Thor Jakobsen <[email protected]> wrote:
>> Thanks, Nick. -reshape- is a big help. But what if I have time-varying variables that I would like to carry over as well, but which are not measured at the same intervals? For example:
>>
>> Id      y_1001  y_1002  y_1003 ...      y_1101  area_10  area_11
>> 1       1       1       0               0       10       5
>>
>> If I do -reshape- using y_ as the identifier I would get something like:
>>
>> Id      j       y_      area_10 area_11
>> 1       1001    1       10      5
>> 1       1002    1       10      5
>> 1       1003    0       10      5
>> .
>> .
>> .
>> 1       1101    0       10      5
>>
>> But I would like to have something like:
>>
>> Id      j       y_      area
>> 1       1001    1       10
>> 1       1002    1       10
>> 1       1003    0       10
>> .
>> .
>> .
>> 1       1101    0       5
>>
>> Is that possible with -reshape-? Or would I have to convert the yearly time-varying variables into weekly first?
>>
>> Thanks again,
>> Kristian
>>
>> -----Original message-----
>> From: [email protected]
>> [mailto:[email protected]] On behalf of Nick Cox
>> Sent: 19 March 2012 12:43
>> To: [email protected]
>> Subject: Re: st: Analysis of event history data
>>
>> For most Stata purposes your data would indeed be better reshaped to a long data structure or shape or form (some people do say "format", but in a Stata context format implies -format-, etc.).
>>
>> reshape long y_ , i(id) j(time)
>> rename y_ status
>>
>> should do it. See also -tsspell- (SSC) and
>>
>> SJ-7-2  dm0029  . . . . . . . . . . .  Speaking Stata: Identifying spells
>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>        Q2/07   SJ 7(2):249--265                             (no commands)
>>        shows how to handle spells with complete control over
>>        spell specification
>>
>> as well as the literature on survival analysis with which you are evidently familiar.
>>
>> Nick
>>
>> On Mon, Mar 19, 2012 at 11:32 AM, Kristian Thor Jakobsen <[email protected]> wrote:
>>
>>> I am trying to do an analysis of transition in and out of public
>>> income transfers. My data is organized roughly the following way:
>>>
>>> Id      y_1001  y_1002  y_1003
>>> 1       0       1       0
>>> 2       0       0       0
>>> 3       1       1       0
>>>
>>> This means that I have the weekly status of each individual from 1991
>>> to 2011. But in order to do any sort of analysis (for example, survival
>>> analysis), I would guess that I have to convert the data to the
>>> following layout instead:
>>>
>>> Id      Status  Time
>>> 1       0       1
>>> 1       1       2
>>> 1       0       3
>>> 2       0       1
>>> 2       0       2
>>> 2       0       3
>>> 3       1       1
>>> 3       1       2
>>> 3       0       3
>>>
>>> Is that correct, and if so, does there exist a smart way to convert
>>> the data from one format into the other? Or can I perhaps use the
>>> data as given?


