Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: transform data from spell format into ordinary panel data


From   DTA <[email protected]>
To   [email protected]
Subject   Re: st: transform data from spell format into ordinary panel data
Date   Thu, 15 Aug 2013 23:10:11 +0200

Thanks Nick,

That's indeed a very simple workaround and seems to solve my question. It would be interesting to know more about the user-written or official commands that you mention.

Best,
Darjusch


-------- Original Message --------
Subject: Re: st: transform data from spell format into ordinary panel data
From: Nick Cox <[email protected]>
To: [email protected] <[email protected]>
Date: Thu Aug 15 2013 17:55:12 GMT+0200
Stata is very, very good at these problems. Here is one way, and there
may be user-written programs or official commands that are even
quicker.

. input panelid spellid t1 t2

        panelid    spellid         t1         t2
   1. 1 1 502 503
   2. 1 2 504 604
   3. 2 1 502 555
   4. 2 2 556 600
   5. 2 3 601 604
   6. 3 1 550 553
   7. end

. gen nspell = t2 - t1 + 1

. expand nspell
(204 observations created)

. bysort panelid spellid : gen t = t1[1] + _n - 1
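
The expansion above yields one observation per person and employed month, but not yet the balanced layout asked for in the original question, where months without a spell also appear. One possible way to add those months is sketched below. It is not part of the original reply; it assumes that spells within a person do not overlap and that t counts months. The official command -fillin- adds every missing panelid-t combination and flags the added observations in a new variable _fillin.

```stata
* Sketch only: rebuild the example data, expand the spells as above,
* then balance the panel with -fillin-.
clear
input panelid spellid t1 t2
1 1 502 503
1 2 504 604
2 1 502 555
2 2 556 600
2 3 601 604
3 1 550 553
end

gen nspell = t2 - t1 + 1
expand nspell
bysort panelid spellid : gen t = t1[1] + _n - 1
drop nspell

* Assumption: spells of one person never overlap, so panelid and t
* identify each observation; -isid- stops with an error otherwise.
isid panelid t

* Add an observation for every panelid-t combination not yet present.
* spellid, t1 and t2 are missing in the added rows, and _fillin == 1 there.
fillin panelid t

* Order as in the requested layout: month first, then person.
sort t panelid
list t panelid spellid _fillin in 1/9, sepby(t)
```

Note that -fillin- only creates months that occur for at least one person. If some month between the first and last month is covered by nobody, -tsset panelid t- followed by -tsfill, full- would be an alternative that fills the whole range.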

Nick
[email protected]

On 15 August 2013 16:36,  Darjusch Tafreschi <[email protected]> wrote:

The title pretty much describes my problem:

I have a data set that contains persons and their employment episodes in the following format, which I usually call "spell format" (I am not sure whether that is a common expression). It is structured as follows:


Person-ID | Employment-Episode-ID | start | end | Income | sector? | hrsperweek ...

Any person can have multiple employment spells, each with start, end, income, hours worked per week, and a number of other variables. Moreover, the durations of the employment spells can vary across and within persons.

The date is not in a typical day-month-year format, but is a number that represents the time elapsed since 1970/01/01.


It then looks like this:

1 1 502 503 3.500 € public sector 42 hrsperweek
1 2 504 604 3.900 € public sector 42 hrsperweek

2 1 502 555 2.200 € private sector 20 hrsperweek
2 2 556 600 4.000 € private sector 42 hrsperweek
2 3 601 604 4.500 € private sector 40 hrsperweek

3 1 550 553 1.500 € self-employed 60 hrsperweek


I hope you can see that the whole time period is not necessarily covered: there can be gaps in which persons were unemployed, studying, or otherwise not employed.

I would like to transform this data into something like a standard balanced panel dataset which gives me the state for every person in every month over the whole period (in this example the period 502-604). In particular it should look like this:

Month | Person-ID | Employment-Episode-ID | Income | sector? | hrsperweek ...

In the end it should be a HUGE data file looking like this:

502 1 1 ...
502 2 1 ...
502 3 -
503 1 1 ...
503 2 1 ...
503 3 -
504 1 2 ...
504 2 1 ...
504 3 -

and so on.


I looked into Stata's survival analysis capabilities, but am not sure whether those are really helpful here.

Can anyone tell me how to approach my problem?
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

