Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: transform data from spell format into ordinary panel data


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: transform data from spell format into ordinary panel data
Date   Fri, 16 Aug 2013 12:34:53 +0100

Paul is quite right that I overlooked the desire for a balanced panel.

But

1. If the data are unbalanced to start, -fillin- can't replace missing
(strictly, omitted and unknown) values in any other variables. So,
this approach can't convert unbalanced data to balanced. That requires
interpolation, extrapolation or imputation.

2. -fillin panelid spellid- would add in spurious spells. If person 42
experienced just 2 spells, bulking out to 9 spells, or whatever the
maximum was, makes no sense. I guess Paul meant something like -fillin
panelid time-, but point #1 still applies.
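
A minimal sketch of what point 2 seems to call for, assuming the toy data and the variable names from the code quoted below (the time variable there is t, not time). It is one possible reading of what was meant, not a tested recipe for the real data:

```stata
clear
input panelid spellid t1 t2
1 1 502 503
1 2 504 604
2 1 502 555
2 2 556 600
2 3 601 604
3 1 550 553
end

* one observation per person-spell-month
gen nspell = t2 - t1 + 1
expand nspell
bysort panelid spellid : gen t = t1[1] + _n - 1

* spells must not overlap, so panelid and t should identify observations
isid panelid t

* rectangularize over person and month, not over person and spell
fillin panelid t

* added rows are flagged by _fillin == 1; spellid, t1, t2 and nspell
* are missing there, which is point 1 again
sort panelid t
list panelid t spellid if _fillin, sepby(panelid)
```

Here only person 3 gets extra rows, for the months outside 550 to 553. Whatever else was recorded for the spells would be missing in those rows.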

Nick
[email protected]


On 16 August 2013 10:32, Seed, Paul <[email protected]> wrote:
> Dear Statalist,
>
> Nick Cox is right as usual;  but Darjusch Tafreschi
> appears to also want a rectangular data set with
> entries also for the months of unemployment.
> One extra line is needed.
>
> ****************************
> ** Begin Stata code
> ****************************
>
> clear
> input panelid spellid t1 t2
> 1 1 502 503
> 1 2 504 604
> 2 1 502 555
> 2 2 556 600
> 2 3 601 604
> 3 1 550 553
> end
>
> gen nspell = t2 - t1 + 1
> expand nspell
> bysort panelid spellid : gen t = t1[1] + _n - 1
> fillin panelid spellid
>
> ****************************
> ** End Stata code
> ****************************
>
> Date: Thu, 15 Aug 2013 16:55:12 +0100
> From: Nick Cox <[email protected]>
> Subject: Re: st: transform data from spell format into ordinary panel data
>
> Stata is very, very good at these problems. Here is one way, and there
> may be user-written programs or official commands that are even
> quicker.
>
> . input panelid spellid t1 t2
>
>        panelid    spellid         t1         t2
>   1. 1 1 502 503
>   2. 1 2 504 604
>   3. 2 1 502 555
>   4. 2 2 556 600
>   5. 2 3 601 604
>   6. 3 1 550 553
>   7. end
>
> . gen nspell = t2 - t1 + 1
>
> . expand nspell
> (204 observations created)
>
> . bysort panelid spellid : gen t = t1[1] + _n - 1
>
> Nick
> [email protected]
>
> On 15 August 2013 16:36,  Darjusch Tafreschi <[email protected]> wrote:
>
>> the title pretty much describes my problem:
>>
>> I have a data set that contains persons and their employment episodes in the following format, which I usually call "spell format" (not sure if that's a common expression). It is structured as follows:
>>
>>
>> Person-ID | Employment-Episode-ID | start | end | Income | sector? | hrsperweek ...
>>
>> Any person can have multiple employment spells, each with start, end, income, hoursperweek worked and a bunch of more variables. Moreover, the durations of the employment states can vary across and within persons.
>>
>> The date is not in a typical day-month-year format, but is a number giving the time elapsed since 1970/01/01.
>>
>>
>> It looks like this then:
>>
>> 1 1 502 503 3.500 € public sector 42 hrsperweek
>> 1 2 504 604 3.900 € public sector 42 hrsperweek
>>
>> 2 1 502 555 2.200 € private sector 20 hrsperweek
>> 2 2 556 600 4.000 € private sector 42 hrsperweek
>> 2 3 601 604 4.500 € private sector 40 hrsperweek
>>
>> 3 1 550 553 1.500 € self-employed 60 hrsperweek
>>
>>
>> I hope you can see that the whole time period is not necessarily covered: there can be gaps in which persons have been unemployed or studying or whatever.
>>
>> I would like to transform this data into something like a standard balanced panel dataset which gives me the state for every person in every month over the whole period (in this example the period 502-604). In particular it should look like this:
>>
>> Month | Person-ID | Employment-Episode-ID | Income | sector? | hrsperweek ...
>>
>> In the end it should be a HUGE data file looking like this:
>>
>> 502 1 1 ...
>> 502 2 1 ...
>> 502 3 -
>> 503 1 1 ...
>> 503 2 1 ...
>> 503 3 -
>> 504 1 2 ...
>> 504 2 1 ...
>> 504 3 -
>>
>> and so on.
>>
>>
>> I looked into Stata's survival capabilities, but am not sure if those are really helpful here.
>>
>> Can anyone tell me how to approach my problem?
