Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: expanding a large data set and merging with another data set

From	David Jose <[email protected]>
To	[email protected]
Subject	st: expanding a large data set and merging with another data set
Date	Wed, 24 Apr 2013 10:01:20 +0200

Hi all,

This is a slightly modified version of a question I posted before, now
with an attempted (but failed) solution.

I have two data sets: one contains daily pollution data, and the other
contains data on individuals. The individual-level data include each
person's date of birth and date of death, and I would like to merge the
two data sets so that the result is an individual-level data set giving,
for each individual, the pollution exposure on each day of his or her life.

Some details to give you a better idea of the structure of each data set:

Data set 1 contains a personid and each person's date of birth (dob) and date of death (dod).

For example, for persons 1 and 2:

personid    dob       dod
1           1/1/00    1/1/01
2           5/1/05    8/5/09

Data set 2 has a pollution measure for every day.

For example, for the period from January 2000 to December 2010:

time        pollution
1/1/00      50
1/2/00      49.5
...
12/31/10    65
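The dates above are written as M/D/YY, so the expand-based approach below only works once they are stored as day numbers; Stata stores daily dates as the number of days since 1 January 1960. A minimal Python sketch of that conversion, for illustration only (the helper name `to_stata_day` is hypothetical, and it assumes two-digit years mean 2000-2068, which is how Python's `%y` reads 00-68):

```python
from datetime import date, datetime

# Stata daily dates count days since 1 January 1960 (day 0).
STATA_EPOCH = date(1960, 1, 1)

def to_stata_day(mdy: str) -> int:
    """Convert an M/D/YY string such as '5/1/05' to a Stata-style day number.

    Hypothetical helper; assumes two-digit years fall in 2000-2068.
    """
    d = datetime.strptime(mdy, "%m/%d/%y").date()
    return (d - STATA_EPOCH).days

dob = to_stata_day("5/1/05")
dod = to_stata_day("8/5/09")
print(dob, dod, dod - dob + 1)  # 16557 18114 1558
```

With integer day numbers, dod-dob+1 is simply the number of days lived, counting both the day of birth and the day of death.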

I would like to merge these two data sets so that the result has, for
each person, the pollution level on each day of his or her life. That
is, I'd like the merged data set to look like this:

personid    dob       dod       time      pollution
1           1/1/00    1/1/01    1/1/00    50
1           1/1/00    1/1/01    1/2/00    49.5
...
1           1/1/00    1/1/01    1/1/01    55

2           5/1/05    8/5/09    5/1/05    65
2           5/1/05    8/5/09    5/2/05    62
...
2           5/1/05    8/5/09    8/5/09    69

and so on for the remaining individuals.
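The target layout can be pictured as "one row per person per day of life, with the pollution value for that day attached". A minimal Python sketch of that expand-then-look-up logic, for illustration only (the function name `expand_and_merge` is hypothetical). It uses only the pollution values quoted in the post, so every other day comes back as None, much like a missing value after an unmatched merge:

```python
from datetime import date, timedelta

def expand_and_merge(people, pollution):
    """Return one row per person per day of life, dob to dod inclusive.

    people: list of (personid, dob, dod) tuples of datetime.date values.
    pollution: dict mapping datetime.date -> pollution level.
    Days absent from `pollution` get None (a "missing" value).
    """
    rows = []
    for personid, dob, dod in people:
        n_days = (dod - dob).days + 1  # the dod-dob+1 copies an expand creates
        for k in range(n_days):
            day = dob + timedelta(days=k)
            rows.append((personid, dob, dod, day, pollution.get(day)))
    return rows

people = [
    (1, date(2000, 1, 1), date(2001, 1, 1)),
    (2, date(2005, 5, 1), date(2009, 8, 5)),
]
# Only the values shown in the post; all other days are absent here.
pollution = {
    date(2000, 1, 1): 50, date(2000, 1, 2): 49.5, date(2001, 1, 1): 55,
    date(2005, 5, 1): 65, date(2005, 5, 2): 62, date(2009, 8, 5): 69,
    date(2010, 12, 31): 65,
}

rows = expand_and_merge(people, pollution)
print(len(rows))  # 367 + 1558 = 1925
print(rows[0])
print(rows[-1])
```

The dictionary lookup plays the role of the merge on date; the size of `rows` is what makes this expensive for a large number of individuals.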

I have tried a solution that uses the expand command to duplicate each
observation in the individual-level data set dod-dob+1 times, so that
each copy corresponds to a different day of life. I was then hoping to
merge this expanded individual-level data set with the pollution data
set on the date. However, I cannot get past the duplication step: I
have a large number of individuals in my data set, and the expansion
takes a very long time.
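Part of the cost is simply the size of the expanded file: the number of observations after expanding is the sum of dod-dob+1 over all individuals. A small Python sketch of that arithmetic, for illustration only (`expanded_size` is a hypothetical helper, and the cohort of 100,000 people is an assumed figure chosen purely to show scale, not a number from the post):

```python
from datetime import date

def expanded_size(spells):
    """Observations created by expanding each person dod-dob+1 times."""
    return sum((dod - dob).days + 1 for dob, dod in spells)

# The two example people from the post (dates read as M/D/YY).
example = [(date(2000, 1, 1), date(2001, 1, 1)),
           (date(2005, 5, 1), date(2009, 8, 5))]
print(expanded_size(example))  # 1925

# Upper bound per person: the whole pollution series, 1/1/00 to 12/31/10.
max_days = (date(2010, 12, 31) - date(2000, 1, 1)).days + 1
print(max_days)  # 4018

# Hypothetical cohort size, purely for scale.
n_people = 100_000
print(n_people * max_days)  # 401800000
```

So even with an 11-year pollution series, each individual can contribute up to 4,018 observations, and the expanded file grows linearly with the number of individuals.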

Does anyone have an idea for a less time-intensive way of merging
these two data sets?
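One possible alternative, not taken from the post: because the pollution data are a gap-free daily series, each person's exposure is a contiguous block of that series, which can be located by date arithmetic instead of a sort-and-merge. A minimal Python sketch under that assumption (`exposure_slices` is a hypothetical name, and the series values are placeholders equal to the day offset, not real pollution readings). It still produces one value per day of life, so it does not shrink the output, and whether an analogous approach is faster in Stata is not established here:

```python
from datetime import date

def exposure_slices(people, start, series):
    """Yield (personid, daily pollution values from dob to dod inclusive).

    Assumes series[i] is the pollution level on day start + i and that the
    series has no gaps, so each person's exposure is one contiguous slice.
    """
    for personid, dob, dod in people:
        lo = (dob - start).days
        hi = (dod - start).days + 1  # +1 because slice ends are exclusive
        if lo < 0 or hi > len(series):
            raise ValueError(f"person {personid} is outside the pollution series")
        yield personid, series[lo:hi]

start = date(2000, 1, 1)
# Placeholder values (the day offset itself), NOT real pollution readings.
series = [float(i) for i in range(4018)]  # 1/1/00 through 12/31/10
people = [(1, date(2000, 1, 1), date(2001, 1, 1)),
          (2, date(2005, 5, 1), date(2009, 8, 5))]

for personid, values in exposure_slices(people, start, series):
    print(personid, len(values), values[0], values[-1])
# 1 367 0.0 366.0
# 2 1558 1947.0 3504.0
```

The design choice here is to trade the merge for an index calculation: once the series is ordered by date with one row per day, a person's days of life map directly to a start and end position.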

Thanks in advance for any advice.

Follow-Ups:
- Re: st: expanding a large data set and merging with another data set
  - From: Nick Cox <[email protected]>
- Re: st: expanding a large data set and merging with another data set
  - From: Maarten Buis <[email protected]>
