

From: Tunga Kantarcı <tungakantarci@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: Restructuring the time dimension in a dataset
Date: Fri, 11 Oct 2013 20:47:31 +0200

Hello,

I have a dataset where ‘variable one’ indicates a unique identification number for each individual in the data. ‘Variable two’ indicates a date (like 01-01-2010), which is the start date of a period, and ‘variable three’ indicates a date (like 05-01-2010), which is the end date of the same period. ‘Variable four’ indicates a number between 0 and 1 (like 0.574) that was realised during the period 01-01-2010 to 05-01-2010. A snapshot of the data sheet for individual 4115111 looks like this:

    4115111 01-01-2010 05-01-2010 0.574
    4115111 05-01-2010 31-09-2011 0.321

In this dataset, as the snapshot also shows, the length of a period is irregular. It can be as short as a day (like 01-01-2010 to 02-01-2010), as long as a year (like 01-01-2010 to 01-01-2011), or even longer. Hence it is not clear how I should treat the time dimension of the data: the values of variable four are not observed on a monthly or yearly basis.

I plan to restructure the data. That is, I plan to fragment each period into multiple periods with a length of one day each and then aggregate them to, say, a month. This means that the first period, which is

    4115111 01-01-2010 05-01-2010 0.574

would be fragmented into

    4115111 01-01-2010 02-01-2010 0.574
    4115111 02-01-2010 03-01-2010 0.574
    4115111 03-01-2010 04-01-2010 0.574
    4115111 04-01-2010 05-01-2010 0.574

and the second period, which is

    4115111 05-01-2010 31-09-2011 0.321

would be fragmented into

    4115111 05-01-2010 06-01-2010 0.321
    .
    .
    4115111 30-09-2011 31-09-2011 0.321

After this fragmentation, I plan to collapse the daily series into a monthly series, which means that variable four would be averaged over the days of a month to make up a monthly number, perhaps using the “collapse variable four, by(variable two)” command. In the end I would like to have monthly data. Given this explanation, I would like to ask two questions.

Question one: In Stata, how can I fragment each case (that is, each row in the data) into multiple cases (multiple rows) with respect to variable two and variable three, as explained above?

Question two: If it were your own data, how would you treat it? Would your approach be the same as mine?

Tunga
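
For illustration, the fragment-then-aggregate plan described above might be sketched in Stata roughly as follows. This is only a minimal sketch under stated assumptions, not a definitive answer: the variable names (id, start, stop, value, spell, ndays, day, month) are hypothetical stand-ins for variables one to four, the dates are assumed to arrive as day-month-year strings, and the end date of the second period is entered as 30-09-2011 because 31-09-2011 is not a valid calendar date. Each period is treated as covering its start date up to, but not including, its end date, which matches the one-day fragments listed in the post.

```stata
* Sketch only: all variable names below are hypothetical.
clear
input long id str10 start_s str10 stop_s double value
4115111 "01-01-2010" "05-01-2010" 0.574
4115111 "05-01-2010" "30-09-2011" 0.321
end

* Convert the day-month-year strings to Stata daily dates
gen start = daily(start_s, "DMY")
gen stop  = daily(stop_s, "DMY")
format start stop %td

* Tag each original row, then make one copy of it per day it covers
gen long spell = _n
gen ndays = stop - start
expand ndays

* Within each original row, number the copies and assign consecutive days
bysort spell: gen day = start + _n - 1
format day %td

* Map each day to its calendar month and average within person-month
gen month = mofd(day)
format month %tm
collapse (mean) value, by(id month)
```

Because every day contributes one row, the monthly mean produced by collapse is implicitly weighted by the number of days each period covers within that month. Rows whose end date does not fall after their start date would need separate handling before the expand step.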

**Follow-Ups**: **Re: st: Restructuring the time dimension in a dataset** *From:* Maarten Buis <maartenlbuis@gmail.com>
