Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Ben Hoen" <bhoen@lbl.gov> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: conditional merging |
Date | Thu, 8 Nov 2012 11:44:21 -0500 |
Jeph & Robert, Taking both of your suggestions and making slight alterations I was able to get two sets of code working fine. Thank you! Ben *********************************************************** clear input home_id inspyear condition 50121 2002 4 50121 2006 4 50121 2011 3 50681 2004 2 50681 2010 3 51040 2006 2 51040 2010 2 51040 2011 3 end tempfile condition save "`condition'" clear input home_id saleyear 50121 2007 50121 2011 50681 2008 51040 2003 51040 2010 end tempfile sales save "`sales'" *========= jeph's method joinby home_id using "`condition'" gen lag = saleyear-inspyear bys home_id saleyear: egen maxlag=max(lag) bys home_id saleyear: replace condition=. if maxlag<0 & _n==1 bys home_id saleyear: replace inspyear=. if maxlag<0 & _n==1 drop if lag<0 & condition!=. bys home_id saleyear: egen minlag=min(lag) bys home_id saleyear: keep if minlag==lag|condition==. drop lag minlag maxlag list *====== robert's method use "`sales'", clear append using "`condition'" gen year = cond(mi(saleyear), inspyear, saleyear) sort home_id year inspyear by home_id: replace condition = condition[_n-1] if mi(condition) by home_id: replace inspyear = inspyear[_n-1] if mi(inspyear) keep if !mi(saleyear) keep home_id saleyear condition inspyear list Ben Hoen LBNL Office: 845-758-1896 Cell: 718-812-7589 -----Original Message----- From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jeph Herrin Sent: Wednesday, November 07, 2012 4:10 PM To: statalist@hsphsun2.harvard.edu Subject: Re: st: conditional merging how about: u sales, clear merge 1:m home_id using condition gen lag = sale_year-inspection_year bys id_home : egen minimum_lag=min(lag) gen sale_condition if lag==minimum_lag & lag>=0 bys home_id (sale_condition) : keep if _n==1 ? this should keep one record per home_id, with closest inspection_year prior to the sale date and the corresponding sale_condition. hth, jeph On 11/6/2012 2:33 PM, Ben Hoen wrote: > I have two files sales.dta and condition.dta. sales.dta has two variables > (home_id saleyear), and condition.dta has three variables (home_id > inspection_year condition). The variable inspection_year can take the vales > of 2000-2011 for any home but for many homes only some years are present (in > many years the home was not inspected. Therefore a sample of the data might > look like: > > home_id inspection_year condition > 50121 2002 4 > 50121 2006 4 > 50121 2011 3 > 50681 2004 2 > 50681 2010 3 > 51040 2006 2 > 51040 2010 2 > 51040 2011 3 > > I would like to populate the sales.dta file with the condition of the parcel > in the inspection_year that is the closest to, but not following the > sale_year. > > So, for example, the following dataset would result > > home_id sale_year condition > 50121 2007 4 > 50121 2011 3 > 50681 2008 2 > 51040 2003 . > 51040 2010 3 > > I know I am not the first person to have this problem, but could not find > threads on this. Maybe I am using the wrong search terms. Any help would > be greatly appreciated. > > (As I wrote this I realized one not as elegant work-around would be to > fill-in missing data for each missing year in the condition.dta file, > potentially using the user-written "carryforward" or even imputing the data > using, e.g., mi impute, and then matching home_id sale_year to home_id > inspection_year.) > > Thanks, in advance! > > Ben Hoen > Principal Research Associate > Lawrence Berkeley National Laboratory > Office: 845-758-1896 > Cell: 718-812-7589 > bhoen@lbl.gov > http://emp.lbl.gov/staff/ben-hoen > > Visit our publications at: > http://emp.lbl.gov/publications > > > > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ > > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/