



st: expanding a large data set and merging with another data set


From   David Jose <davidjosework@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: expanding a large data set and merging with another data set
Date   Wed, 24 Apr 2013 10:01:20 +0200

Hi all,

This is a slightly modified version of a question I posted before,
together with an attempted but failed solution.

I have two data sets: one contains daily pollution data, and the other
contains individual-level data with each person's date of birth and
date of death. I would like to merge them so that the result is an
individual-level data set in which, for each individual, I have the
pollution exposure for each day of life.

Some details to give you a better idea of the structure of each data set:

Data set 1 has a personid and that person's date of birth (dob) and
date of death (dod).

For example, for persons 1 and 2:

personid    dob       dod
1           1/1/00    1/1/01
2           5/1/05    8/5/09

Data set 2 has a pollution measure for every day.

For example, starting in January 2000 and running through the end of 2010:

time          pollution

1/1/00        50
1/2/00        49.5
.
.
.
12/31/10        65

I would like to merge these two data sets. The resulting merged data
set would have, for each person, the pollution level for each day of
life. That is, I'd like the merged data set to look like this:

personid    dob       dod       time      pollution
1           1/1/00    1/1/01    1/1/00    50
1           1/1/00    1/1/01    1/2/00    49.5
.
.
.
1           1/1/00    1/1/01    1/1/01    55

2           5/1/05    8/5/09    5/1/05    65
2           5/1/05    8/5/09    5/2/05    62
.
.
.
2           5/1/05    8/5/09    8/5/09    69

etc. etc.

I have tried a solution that uses the expand command to create
duplicate observations in the individual-level data set, with the
number of copies per person based on the difference (dod-dob+1). I was
hoping to then merge the expanded individual-level data set, in which
each duplicated observation corresponds to a different day of life,
with the pollution data set. However, I am not able to get beyond the
duplication step: I have a large number of individuals in my data set,
and this operation is very time-intensive.
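
To make the expand-then-merge logic concrete, here is a minimal sketch
written in Python rather than Stata, purely for illustration. The
function name expand_and_merge, the shortened toy dates, and the
pollution value for 1/3/00 are hypothetical stand-ins, not values from
the actual data; only the rule "one row per day from dob to dod, i.e.
dod-dob+1 rows per person" comes from the description above.

```python
from datetime import date, timedelta

# Hypothetical toy inputs mirroring the two data sets described above.
# Lifespans are shortened to a few days so the result is easy to inspect.
persons = [
    {"personid": 1, "dob": date(2000, 1, 1), "dod": date(2000, 1, 3)},
    {"personid": 2, "dob": date(2000, 1, 2), "dod": date(2000, 1, 2)},
]

# Daily pollution keyed by date; the 1/3/00 value is made up.
pollution = {
    date(2000, 1, 1): 50.0,
    date(2000, 1, 2): 49.5,
    date(2000, 1, 3): 51.0,
}


def expand_and_merge(persons, pollution):
    """Yield one row per person per day of life, with that day's pollution."""
    for p in persons:
        n_days = (p["dod"] - p["dob"]).days + 1  # dod - dob + 1 copies
        for offset in range(n_days):
            day = p["dob"] + timedelta(days=offset)
            # Days with no pollution record get None instead of raising.
            yield {**p, "time": day, "pollution": pollution.get(day)}


rows = list(expand_and_merge(persons, pollution))
for r in rows:
    print(r["personid"], r["dob"], r["dod"], r["time"], r["pollution"])
```

The total number of rows is the sum of dod-dob+1 over all persons,
which is why the duplication step grows so quickly when there are many
long-lived individuals.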

Does anyone have an idea for a less time-intensive way of merging
these two data sets?

Thanks in advance for any advice.
