Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Restructuring the time dimension in a dataset


From   Maarten Buis <maartenlbuis@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Restructuring the time dimension in a dataset
Date   Fri, 11 Oct 2013 21:03:59 +0200

Question 1: -help stsplit-
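
A minimal sketch of one way to do the fragmentation and monthly averaging, using -expand- rather than -stsplit- because it needs no -stset- step. The variable names id, start, stop and value are hypothetical stand-ins for variables one to four, and the sketch assumes start and stop are stored as strings in day-month-year order and that each row is one period:

```stata
* Sketch only: id, start, stop, value are assumed names, not from the original data.
* Convert the string dates to Stata daily dates.
gen long d0 = date(start, "DMY")
gen long d1 = date(stop, "DMY")
format d0 d1 %td

* Tag each original row so its copies can be numbered after -expand-.
gen long spell = _n

* One row per day in [d0, d1): the stop day is left to the next period.
gen long ndays = d1 - d0
expand ndays
bysort spell: gen long day = d0 + _n - 1
format day %td

* Average the daily values within each calendar month.
gen month = mofd(day)
format month %tm
collapse (mean) value, by(id month)
```

Note that date() returns missing for an impossible date such as 31-09-2011, so rows like that would have to be corrected first. -expand- keeps a row once when ndays is missing or less than 1, so such rows should be checked before the -collapse-.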

Question 2: that depends on so many things....

Hope this helps,
Maarten


On Fri, Oct 11, 2013 at 8:47 PM, Tunga Kantarcı <tungakantarci@gmail.com> wrote:
> Hello,
>
> I have a dataset where ‘variable one’ indicates a unique
> identification number for each individual in the data. Then there is
> ‘variable two’ which indicates a date (like 01-01-2010) which is the
> start date of a period and ‘variable three’ indicates a date (like
> 05-01-2010) which is the end date of the same period. Then there is
> ‘variable four’ which indicates a number between 0 and 1 (like 0.574)
> that has been realised during the period 01-01-2010 - 05-01-2010.
>
> A snapshot of the data sheet for individual 4115111 looks like this:
>
> 4115111                01-01-2010           05-01-2010           0.574
> 4115111                05-01-2010           31-09-2011           0.321
>
> In this dataset, as the snapshot also shows, the length of a period is
> irregular. It can be as short as a day (like 01-01-2010 – 02-01-2010)
> or as long as a year (like 01-01-2010 - 01-01-2011), or even longer.
> Hence it is not clear how I should treat the time dimension of the
> data. The cases of variable four are not observed on a monthly or
> yearly basis. I plan to restructure the data. That is, I plan to
> fragment each period into multiple periods with a length of one day
> and then aggregate them to, say, a month. This means that the first
> period, which is
>
> 4115111                01-01-2010           05-01-2010           0.574,
>
> would be fragmented into
>
> 4115111                01-01-2010           02-01-2010           0.574
> 4115111                02-01-2010           03-01-2010           0.574
> 4115111                03-01-2010           04-01-2010           0.574
> 4115111                04-01-2010           05-01-2010           0.574,
>
> and the second period, which is
>
> 4115111                05-01-2010           31-09-2011           0.321,
>
> would be fragmented into
>
> 4115111                05-01-2010           06-01-2010           0.321
> .
> .
> 4115111                30-09-2011           31-09-2011           0.321.
>
> After this fragmentation, I plan to collapse the daily series to
> monthly series which would mean that variable four will be averaged
> over the days of a month to make up a monthly number, perhaps using
> the “collapse variable four, by(variable two)” command. In the end I
> would like to have monthly data.
>
> Given this explanation, I would like to ask two questions.
>
> Question one: In Stata, how can I fragment each case (that is each row
> in the data) into multiple cases (multiple rows) with respect to
> variable two and variable three as explained above?
>
> Question two: If it was your own data, how would you treat it? Would
> your approach be the same as mine?
>
> Tunga
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



-- 
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------

