Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Difficult merging process

From	Nick Sanders <[email protected]>
To	[email protected]
Subject	Re: st: Difficult merging process
Date	Tue, 21 Dec 2010 15:27:55 -0800

Couldn't you also drop certain regions before you do the joining command? That is, say you decide to do everything at the zip code level, and want to include any events that occurred within 10 miles. You could:

1. Calculate the distance between each zip code/weather event (using something like the user written "vincenty", which works nicely), which creates a master file of distances between all zip codes and event locations – that is, each zip code has a distance value for each location of weather event, resulting in a whole lot of zip code observations
3. Drop all with a distance greater than 10 miles . . . now you're left with a data set that consists of zip codes and all weather event locations within 10 miles of that zip code.
4. Use "joinby" to attach those events to a file of zip code X calendar time variables - now you have a data set of events by time by zip code
5. Use "joinby" AGAIN to match that to your mother by day data
6. Collapse to a single mother observation using something like "count" for the number of weather events within the applicable time frame

Not the prettiest thing in the world, but might be better at handling gigantic data sets.

-Nick

--
Nicholas J. Sanders, Ph.D.
Postdoctoral Fellow
Stanford Institute for Economic Policy Research
366 Galvez St, Room 228
Stanford, CA 94305

On Dec 21, 2010, at 7:31 AM, Nathan Hutto wrote:

> Thanks to both of you. Both of these seem like reasonable ways to
> solve this issue, so I'll certainly try them out. I took a quick stab
> at  -cross- and it froze up my computer, so I'll have to do it in
> chunks as you suggest.
> 
> Best,
> Nathan
> 
> On Mon, Dec 20, 2010 at 11:00 PM, Keith Dear <[email protected]> wrote:
>> 
>> Nathan,
>> You don't say how large your two datasets are, but is -cross- out of the
>> question?
>> 
>> If you CAN build a dataset consisting of all combinations of pregnancies
>> and weather events, then obviously it's simple from there: you can use
>> whatever arbitrarily complex criteria you please to discard unwanted
>> pairs, probably finishing with -duplicates drop- to retain only one
>> record per exposed birth.
>> 
>> If the mega-cross is too mega, perhaps you can still use this approach
>> in chunks, e.g. one year (or decade) of birth at a time.
>> 
>> (not subtle, but actually sledgehammers do a pretty good job on
>> walnuts!)
>> kd
>> 
>> 
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Michael N.
>> Mitchell
>> Sent: Tuesday, 21 December 2010 2:25 PM
>> To: [email protected]
>> Subject: Re: st: Difficult merging process
>> 
>> Dear Nathan
>> 
>>   This is a very thorny problem!
>> 
>>   My first thought is to try and focus on the dates as a way of
>> matching, matching on the
>> "week" of the whether event to "week" of pregnancy...
>> 
>>   1) For the weather dataset, convert the "date" of the event into the
>> "week" of the
>> event (using the -wofd()- function). Call this variable "dateweek".
>> 
>>   2) For the pregnancy dataset, make one record for every week of
>> pregnancy per person
>> (for a 9 month gestation, each person would have 36 records, one for
>> each week of
>> pregnancy... each record would be identified by the person id and the
>> "dateweek" for each
>> week of pregnancy.)  This step would involve computing the number of
>> weeks of gestation,
>> using the "expand" command to make the multiple records per person, and
>> then a -by id:
>> generate- to compute the week number of pregnancy for the multiple weeks
>> of pregnancy.
>> 
>>   3) match merge the "weather" dataset to the "pregnancy" dataset on
>> "dateweek", keeping
>> just the matches. The resulting dataset contains the weeks of pregnancy
>> with a weather event.
>> 
>>   4) Compute the distance from the mother to the weather event.
>> Eliminate events that are
>> too many miles away.
>> 
>>   5) There may be multiple records per mother per weather event. Use
>> collapse to make one
>> record per mother.
>> 
>>   Others might have better thoughts. I hope this helps.
>> 
>> Michael N. Mitchell
>> Data Management Using Stata      -
>> http://www.stata.com/bookstore/dmus.html
>> A Visual Guide to Stata Graphics -
>> http://www.stata.com/bookstore/vgsg.html
>> Stata tidbit of the week         - http://www.MichaelNormanMitchell.com
>> 
>> 
>> 
>> On 2010-12-20 5.46 PM, Nathan Hutto wrote:
>>> Hi all,
>>> 
>>> I am attempting to merge two data sets in a way that is new to am and
>>> am having trouble figuring out how to do so. One data set contains
>>> geo-coded birth certificates and the other contains weather events. I
>>> want to determine whether a mother was exposed to a weather event
>>> during the course of her pregnancy. I have the birth date and
>>> gestational length, so I can determine the dates of gestation by
>>> subtracting one from the other. I also have the latitude and
>>> longitude, address, city, and state of each pregnancy and weather
>>> event. For this case, exposure to a weather event would be defined as
>>> being pregnant when a weather event occurred in close proximity.
>>> 
>>> I'm ok with over-merging a little bit; I can determine the exact
>>> exposure by using one of Stata's length commands that can calculate
>>> distance with latitude and longitude. But given that many people in my
>>> data are exposed to a number of weather events, I'd like to whittle
>>> down the amount of false positives.
>>> 
>>> Any thoughts on this?
>>> 
>>> Thank you,
>>> Nathan
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>> 
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: Difficult merging process
  - From: Nathan Hutto <[email protected]>
- Re: st: Difficult merging process
  - From: "Michael N. Mitchell" <[email protected]>
- RE: st: Difficult merging process
  - From: "Keith Dear" <[email protected]>
- Re: st: Difficult merging process
  - From: Nathan Hutto <[email protected]>

Prev by Date: Re: st:any easy alternative way when -egen- is not allowed to combine with by
Next by Date: Re: st:any easy alternative way when -egen- is not allowed to combine with by
Previous by thread: Re: st: Difficult merging process
Next by thread: st: bootstrap xtmelogit
Index(es):
- Date
- Thread