Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: Difficult merging process

From   "Keith Dear" <[email protected]>
To   <[email protected]>
Subject   RE: st: Difficult merging process
Date   Tue, 21 Dec 2010 15:00:11 +1100

You don't say how large your two datasets are, but is -cross- out of the

If you CAN build a dataset consisting of all combinations of pregnancies
and weather events, then obviously it's simple from there: you can use
whatever arbitrarily complex criteria you please to discard unwanted
pairs, probably finishing with -duplicates drop- to retain only one
record per exposed birth.

If the mega-cross is too mega, perhaps you can still use this approach
in chunks, e.g. one year (or decade) of birth at a time.

(not subtle, but actually sledgehammers do a pretty good job on

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael N.
Sent: Tuesday, 21 December 2010 2:25 PM
To: [email protected]
Subject: Re: st: Difficult merging process

Dear Nathan

   This is a very thorny problem!

   My first thought is to try and focus on the dates as a way of
matching, matching on the 
"week" of the whether event to "week" of pregnancy...

   1) For the weather dataset, convert the "date" of the event into the
"week" of the 
event (using the -wofd()- function). Call this variable "dateweek".

   2) For the pregnancy dataset, make one record for every week of
pregnancy per person 
(for a 9 month gestation, each person would have 36 records, one for
each week of 
pregnancy... each record would be identified by the person id and the
"dateweek" for each 
week of pregnancy.)  This step would involve computing the number of
weeks of gestation, 
using the "expand" command to make the multiple records per person, and
then a -by id: 
generate- to compute the week number of pregnancy for the multiple weeks
of pregnancy.

   3) match merge the "weather" dataset to the "pregnancy" dataset on
"dateweek", keeping 
just the matches. The resulting dataset contains the weeks of pregnancy
with a weather event.

   4) Compute the distance from the mother to the weather event.
Eliminate events that are 
too many miles away.

   5) There may be multiple records per mother per weather event. Use
collapse to make one 
record per mother.

   Others might have better thoughts. I hope this helps.

Michael N. Mitchell
Data Management Using Stata      -
A Visual Guide to Stata Graphics -
Stata tidbit of the week         -

On 2010-12-20 5.46 PM, Nathan Hutto wrote:
> Hi all,
> I am attempting to merge two data sets in a way that is new to am and
> am having trouble figuring out how to do so. One data set contains
> geo-coded birth certificates and the other contains weather events. I
> want to determine whether a mother was exposed to a weather event
> during the course of her pregnancy. I have the birth date and
> gestational length, so I can determine the dates of gestation by
> subtracting one from the other. I also have the latitude and
> longitude, address, city, and state of each pregnancy and weather
> event. For this case, exposure to a weather event would be defined as
> being pregnant when a weather event occurred in close proximity.
> I'm ok with over-merging a little bit; I can determine the exact
> exposure by using one of Stata's length commands that can calculate
> distance with latitude and longitude. But given that many people in my
> data are exposed to a number of weather events, I'd like to whittle
> down the amount of false positives.
> Any thoughts on this?
> Thank you,
> Nathan
> *
> *   For searches and help try:
> *
> *
> *
*   For searches and help try:

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index