st: Restructuring the time dimension in a dataset

From   Tunga Kantarcı
Subject   st: Restructuring the time dimension in a dataset
Date   Fri, 11 Oct 2013 20:47:31 +0200


I have a dataset in which ‘variable one’ is a unique identification
number for each individual in the data. ‘Variable two’ is a date
(like 01-01-2010) marking the start of a period, and ‘variable three’
is a date (like 05-01-2010) marking the end of the same period.
‘Variable four’ is a number between 0 and 1 (like 0.574) that was
realised during that period (here 01-01-2010 - 05-01-2010).

A snapshot of the data sheet for individual 4115111 looks like this:

4115111                01-01-2010           05-01-2010           0.574
4115111                05-01-2010           30-09-2011           0.321

In this dataset, as the snapshot also shows, the length of a period is
irregular. It can be as short as a day (like 01-01-2010 - 02-01-2010)
or as long as a year (like 01-01-2010 - 01-01-2011), or even longer.
Hence it is not clear how I should treat the time dimension of the
data: the values of variable four are not observed on a monthly or
yearly basis. I plan to restructure the data. That is, I plan to
fragment each period into multiple periods with a length of one day
and then aggregate them to, say, a month. This means that the first
period, which is

4115111                01-01-2010           05-01-2010           0.574,

would be fragmented into

4115111                01-01-2010           02-01-2010           0.574
4115111                02-01-2010           03-01-2010           0.574
4115111                03-01-2010           04-01-2010           0.574
4115111                04-01-2010           05-01-2010           0.574,

and the second period, which is

4115111                05-01-2010           30-09-2011           0.321,

would be fragmented into

4115111                05-01-2010           06-01-2010           0.321
...
4115111                29-09-2011           30-09-2011           0.321.
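
The fragmentation described above could be sketched in Stata roughly as follows. This is only a minimal sketch under stated assumptions, not a definitive solution: all variable names (id, start_str, end_str, value, start_date, end_date, spell, ndays, day_start, day_end) are hypothetical, the dates are assumed to be stored as day-month-year strings, and the second period's end date is shortened to 08-01-2010 purely to keep the listing small. Note that date() returns missing for a string that is not a valid calendar date, so such rows would need to be checked first.

```stata
* Toy data in the layout of the snapshot (names are hypothetical;
* the second end date is shortened for illustration).
clear
input long id str10 start_str str10 end_str double value
4115111 "01-01-2010" "05-01-2010" 0.574
4115111 "05-01-2010" "08-01-2010" 0.321
end

* Convert the day-month-year strings to Stata daily dates.
gen long start_date = date(start_str, "DMY")
gen long end_date   = date(end_str, "DMY")
format start_date end_date %td

* Tag each original row and count the one-day pieces it contains.
gen long spell = _n
gen long ndays = end_date - start_date

* Rows with a missing or non-positive length are dropped here,
* because expand would otherwise leave them in the data unchanged.
drop if missing(ndays) | ndays < 1

* Replace each row with ndays copies of itself.
expand ndays

* Number the copies within each original row to build one-day periods.
bysort spell: gen long day_start = start_date + _n - 1
gen long day_end = day_start + 1
format day_start day_end %td

list id day_start day_end value
```

With the toy data, the first row becomes four one-day rows and the second becomes three, each carrying the value of the period it came from.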

After this fragmentation, I plan to collapse the daily series to a
monthly series, which would mean that variable four is averaged
over the days of a month to make up a monthly number, perhaps using
the “collapse variable four, by(variable two)” command. In the end I
would like to have monthly data.
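
One way the monthly averaging might look in Stata is sketched below. It assumes the data are already one row per day; the variable names (id, day_str, day_start, month, value) and the four toy rows are made up for illustration. Grouping by the daily date itself would leave the data daily, so the sketch first derives a monthly date with mofd() and groups by the individual and that month.

```stata
* Toy daily data spanning a month boundary (rows are made up).
clear
input long id str10 day_str double value
4115111 "30-01-2010" 0.574
4115111 "31-01-2010" 0.574
4115111 "01-02-2010" 0.321
4115111 "02-02-2010" 0.321
end

* Convert the day-month-year strings to Stata daily dates.
gen long day_start = date(day_str, "DMY")
format day_start %td

* mofd() maps a daily date to the month that contains it.
gen month = mofd(day_start)
format month %tm

* Average the daily values within each individual and month.
collapse (mean) value, by(id month)
list
```

Because every day in a month gets equal weight, the result is a time-weighted average of the original irregular periods.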

Given this explanation, I would like to ask two questions.

Question one: In Stata, how can I fragment each case (that is, each
row in the data) into multiple cases (multiple rows) with respect to
variable two and variable three, as explained above?

Question two: If it were your own data, how would you treat it? Would
your approach be the same as mine?

