Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <n.j.cox@durham.ac.uk> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Merging/summing daily values across time periods in (unbalanced) panel dataset |
Date | Wed, 9 Mar 2011 13:11:24 +0000 |
Assuming that your -date_from- and -date_to- are really daily date variables, you can start with gen duration = date_to - date_from + 1 expand duration on your master and then use the logic in FAQ . . . . . . . . . . . . . . . . . . . . . . . Replacing missing values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox 2/03 How can I replace missing values with previous or following nonmissing values? http://www.stata.com/support/faqs/data/missing.html to replace with sequences of dates. Nick n.j.cox@durham.ac.uk Amanda Dwelley Thanks, good point! The same daily x1 x2 apply to everyone in the master dataset - to start, a "merge" ID could be 1 for everyone in the master data set and all of the using data set. Master data set looks like: id date_from date_to merge_id 2 07apr2009 09jun2009 1 2 10jun2009 03jul2009 1 3 16may2009 12jun2009 1 3 13jun2009 10jul2009 1 (~50,000 observations; time periods at irregular intervals) Daily data could look like: merge_id date x1 x2 1 02jun2009 5.7 0 1 03jun2009 2.3 0 1 04jun2009 0 2 1 05jun2009 0 3.5 (3 yrs of daily values that are same on each day for each person in master data set) For example, for first observation in master data, looking for sum of x1 from April 7-June 9, and sum of x2 within same dates. On Tue, Mar 8, 2011 at 8:47 PM, Nick Cox <njcoxstata@gmail.com> wrote: > I see no information on identifiers in your daily dataset. > > Nick > > On Tue, Mar 8, 2011 at 10:15 PM, Amanda Dwelley <adwelley@gmail.com> wrote: > >> I'm looking for suggestions on how to join and sum daily x values from one >> dataset into time periods defined by a start date and end date in another >> (panel) data set. >> >> Master data set looks like: >> id date_from date_to >> 1 07apr2009 09jun2009 >> 1 10jun2009 03jul2009 >> 2 13jun2009 10jul2009 >> 2 11jul2009 05aug2009 >> (~50,000 observations; time periods at irregular intervals) >> >> Daily data looks like: >> date x1 x2 >> 02jun2009 5.7 0 >> 03jun2009 2.3 0 >> 04jun2009 0 2 >> 05jun2009 0 3.5 >> (3 yrs of daily values) >> >> As you can see the using data could be in wide or long format. >> If I merge the wide format daily data into the master dataset, is there any >> way to reference/look for the dates in variable names that are between the >> start/end date, and sum associated variables? If using the long format of >> the daily data, is there a way to sum values between start/end dates during >> a join or merge? * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/