Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Robert Picard <picard@netbox.com> |

To |
"statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |

Subject |
Re: st: conditional merging |

Date |
Wed, 7 Nov 2012 12:28:43 -0800 |

Here is one way to find the most recent inspection condition for each home sale. *----------- begin example ------------- clear input home_id inspection_year condition 50121 2002 4 50121 2006 4 50121 2011 3 50681 2004 2 50681 2010 3 51040 2006 2 51040 2010 2 51040 2011 3 end tempfile condition qui save "`condition'" clear input home_id sale_year 50121 2007 50121 2011 50681 2008 51040 2003 51040 2010 end append using "`condition'" gen year = cond(mi(sale_year), inspection_year, sale_year) sort home_id year inspection_year gen cond_sale = condition by home_id: replace cond_sale = condition[_n-1] if mi(cond_sale) list, noobs sepby(home_id) keep if !mi(sale_year) keep home_id sale_year cond_sale list *----------- end example ------------- On Wed, Nov 7, 2012 at 11:11 AM, Nick Cox <njcoxstata@gmail.com> wrote: > I am not planning to implement weights. The point about > nearest-neighbour as I define it is that unknown points get > interpolated with the value of the nearest neighbour with a known > value. I've got to think about ways of handling cases in which two > neighbours tie for nearest. > > On Wed, Nov 7, 2012 at 7:03 PM, Ben Hoen <bhoen@lbl.gov> wrote: >> I see. I like the nearest neighbor approach in that one could calculate >> separately a weight of the "interpolation" such that as one interpolated >> values "further" (in time) away from the "known" values their weight would >> decrease. >> >> Thanks for those insights. As always, very interesting & helpful. >> >> I will see if anyone comes forward with a merge idea. >> >> Best, >> >> Ben >> >> Ben Hoen >> LBNL >> Office: 845-758-1896 >> Cell: 718-812-7589 >> >> >> -----Original Message----- >> From: owner-statalist@hsphsun2.harvard.edu >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox >> Sent: Wednesday, November 07, 2012 1:25 PM >> To: statalist@hsphsun2.harvard.edu >> Subject: Re: st: conditional merging >> >> I will split this into two: >> >> 0. Interpolation. Carry-forward is crude but has the advantage that >> only legitimate values that occur can be carried forward. >> I decided this morning to write a nearest-neighbour interpolation >> program, which would have the same characteristic, except that the >> nearest neighbour could be later as well as before. >> The program would just be an analogue of -ipolate- and therefore not >> assume spacing in time, but would assume position in one dimension >> (not two). >> >> 1. Merging. I am not a merge-master. There should be others on this >> list who merge day in, day out and can give better advice. >> >> Nick >> >> On Wed, Nov 7, 2012 at 3:37 PM, Ben Hoen <bhoen@lbl.gov> wrote: >>> Thanks Nick. >>> >>> I am not sure there is a standard way that these "condition" values trend >>> over time across the whole dataset, and therefore interpolating them might >>> not be appropriate. Moreover, for each home, there might not be many data >>> points. Finally, the values that are allowable for condition are discreet >>> (non-continuous), and therefore would complicate a linear, cubic, cubic >>> spline process (though, of course that could be dealt with by using >> .=int(x) >>> ). Would the interpolation allow me to take into account all of these >>> characteristics? >>> >>> For, in part, this reason, I was hoping to find some way to execute a >>> "conditional merge" (again, my words). Additionally, the process of >>> learning how one might do it with this "condition" data, could be applied >> to >>> extracting other characteristic data that are also only present >> sporadically >>> across time (e.g., size of the home) but that also might periodically >> change >>> (e.g., the home might be added to). >>> >>> Is there a way to use if/then statements in a merge process? >> Nick Cox >> >>> Carry forward can be as little as one line of code: see >>> >>> FAQ . . . . . . . . . . . . . . . . . . . . . . . Replacing missing >>> values >>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. >>> Cox >>> 2/03 How can I replace missing values with previous or >>> following nonmissing values? >>> >> http://www.stata.com/support/faqs/data-management/replacing-missing-values/ >>> >>> I don't see that this is an imputation problem at all. It calls for >>> interpolation. Indeed, have you considered some kind of interpolation, >>> say linear, cubic, cubic spline? >>> >>> On Tue, Nov 6, 2012 at 7:33 PM, Ben Hoen <bhoen@lbl.gov> wrote: >>> >>>> I have two files sales.dta and condition.dta. sales.dta has two >> variables >>>> (home_id saleyear), and condition.dta has three variables (home_id >>>> inspection_year condition). The variable inspection_year can take the >>> vales >>>> of 2000-2011 for any home but for many homes only some years are present >>> (in >>>> many years the home was not inspected. Therefore a sample of the data >>> might >>>> look like: >>>> >>>> home_id inspection_year condition >>>> 50121 2002 4 >>>> 50121 2006 4 >>>> 50121 2011 3 >>>> 50681 2004 2 >>>> 50681 2010 3 >>>> 51040 2006 2 >>>> 51040 2010 2 >>>> 51040 2011 3 >>>> >>>> I would like to populate the sales.dta file with the condition of the >>> parcel >>>> in the inspection_year that is the closest to, but not following the >>>> sale_year. >>>> >>>> So, for example, the following dataset would result >>>> >>>> home_id sale_year condition >>>> 50121 2007 4 >>>> 50121 2011 3 >>>> 50681 2008 2 >>>> 51040 2003 . >>>> 51040 2010 3 >>>> >>>> I know I am not the first person to have this problem, but could not find >>>> threads on this. Maybe I am using the wrong search terms. Any help >> would >>>> be greatly appreciated. >>>> >>>> (As I wrote this I realized one not as elegant work-around would be to >>>> fill-in missing data for each missing year in the condition.dta file, >>>> potentially using the user-written "carryforward" or even imputing the >>> data >>>> using, e.g., mi impute, and then matching home_id sale_year to home_id >>>> inspection_year.) > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

**References**:**st: conditional merging***From:*"Ben Hoen" <bhoen@lbl.gov>

**Re: st: conditional merging***From:*Nick Cox <njcoxstata@gmail.com>

**RE: st: conditional merging***From:*"Ben Hoen" <bhoen@lbl.gov>

**Re: st: conditional merging***From:*Nick Cox <njcoxstata@gmail.com>

**RE: st: conditional merging***From:*"Ben Hoen" <bhoen@lbl.gov>

**Re: st: conditional merging***From:*Nick Cox <njcoxstata@gmail.com>

- Prev by Date:
**Re: st: Rename all?** - Next by Date:
**st: Using tsset on panel data that are differentiated by two id variables** - Previous by thread:
**Re: st: conditional merging** - Next by thread:
**Re: st: conditional merging** - Index(es):