Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Difficult merging process


From   Nathan Hutto <nathandavidhutto@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Difficult merging process
Date   Tue, 21 Dec 2010 10:31:13 -0500

Thanks to both of you. Both of these seem like reasonable ways to
solve this issue, so I'll certainly try them out. I took a quick stab
at  -cross- and it froze up my computer, so I'll have to do it in
chunks as you suggest.

Best,
Nathan

On Mon, Dec 20, 2010 at 11:00 PM, Keith Dear <Keith.Dear@anu.edu.au> wrote:
>
> Nathan,
> You don't say how large your two datasets are, but is -cross- out of the
> question?
>
> If you CAN build a dataset consisting of all combinations of pregnancies
> and weather events, then obviously it's simple from there: you can use
> whatever arbitrarily complex criteria you please to discard unwanted
> pairs, probably finishing with -duplicates drop- to retain only one
> record per exposed birth.
>
> If the mega-cross is too mega, perhaps you can still use this approach
> in chunks, e.g. one year (or decade) of birth at a time.
>
> (not subtle, but actually sledgehammers do a pretty good job on
> walnuts!)
> kd
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael N.
> Mitchell
> Sent: Tuesday, 21 December 2010 2:25 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Difficult merging process
>
> Dear Nathan
>
>   This is a very thorny problem!
>
>   My first thought is to try and focus on the dates as a way of
> matching, matching on the
> "week" of the whether event to "week" of pregnancy...
>
>   1) For the weather dataset, convert the "date" of the event into the
> "week" of the
> event (using the -wofd()- function). Call this variable "dateweek".
>
>   2) For the pregnancy dataset, make one record for every week of
> pregnancy per person
> (for a 9 month gestation, each person would have 36 records, one for
> each week of
> pregnancy... each record would be identified by the person id and the
> "dateweek" for each
> week of pregnancy.)  This step would involve computing the number of
> weeks of gestation,
> using the "expand" command to make the multiple records per person, and
> then a -by id:
> generate- to compute the week number of pregnancy for the multiple weeks
> of pregnancy.
>
>   3) match merge the "weather" dataset to the "pregnancy" dataset on
> "dateweek", keeping
> just the matches. The resulting dataset contains the weeks of pregnancy
> with a weather event.
>
>   4) Compute the distance from the mother to the weather event.
> Eliminate events that are
> too many miles away.
>
>   5) There may be multiple records per mother per weather event. Use
> collapse to make one
> record per mother.
>
>   Others might have better thoughts. I hope this helps.
>
> Michael N. Mitchell
> Data Management Using Stata      -
> http://www.stata.com/bookstore/dmus.html
> A Visual Guide to Stata Graphics -
> http://www.stata.com/bookstore/vgsg.html
> Stata tidbit of the week         - http://www.MichaelNormanMitchell.com
>
>
>
> On 2010-12-20 5.46 PM, Nathan Hutto wrote:
> > Hi all,
> >
> > I am attempting to merge two data sets in a way that is new to am and
> > am having trouble figuring out how to do so. One data set contains
> > geo-coded birth certificates and the other contains weather events. I
> > want to determine whether a mother was exposed to a weather event
> > during the course of her pregnancy. I have the birth date and
> > gestational length, so I can determine the dates of gestation by
> > subtracting one from the other. I also have the latitude and
> > longitude, address, city, and state of each pregnancy and weather
> > event. For this case, exposure to a weather event would be defined as
> > being pregnant when a weather event occurred in close proximity.
> >
> > I'm ok with over-merging a little bit; I can determine the exact
> > exposure by using one of Stata's length commands that can calculate
> > distance with latitude and longitude. But given that many people in my
> > data are exposed to a number of weather events, I'd like to whittle
> > down the amount of false positives.
> >
> > Any thoughts on this?
> >
> > Thank you,
> > Nathan
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index