Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: conditional merging


From   "Ben Hoen" <[email protected]>
To   <[email protected]>
Subject   RE: st: conditional merging
Date   Wed, 7 Nov 2012 14:03:17 -0500

I see.  I like the nearest neighbor approach in that one could calculate
separately a weight of the "interpolation" such that as one interpolated
values "further" (in time) away from the "known" values their weight would
decrease.  

Thanks for those insights.  As always, very interesting & helpful.  

I will see if anyone comes forward with a merge idea.  

Best,

Ben

Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, November 07, 2012 1:25 PM
To: [email protected]
Subject: Re: st: conditional merging

I will split this into two:

0. Interpolation. Carry-forward is crude but has the advantage that
only legitimate values that occur can be carried forward.
I decided this morning to write a nearest-neighbour interpolation
program, which would have the same characteristic, except that the
nearest neighbour could be later as well as before.
The program would just be an analogue of -ipolate- and therefore not
assume spacing in time, but would assume position in one dimension
(not two).

1. Merging. I am not a merge-master. There should be others on this
list who merge day in, day out and can give better advice.

Nick

On Wed, Nov 7, 2012 at 3:37 PM, Ben Hoen <[email protected]> wrote:
> Thanks Nick.
>
> I am not sure there is a standard way that these "condition" values trend
> over time across the whole dataset, and therefore interpolating them might
> not be appropriate.  Moreover, for each home, there might not be many data
> points.  Finally, the values that are allowable for condition are discreet
> (non-continuous), and therefore would complicate a linear, cubic, cubic
> spline process (though, of course that could be dealt with by using
.=int(x)
> ).  Would the interpolation allow me to take into account all of these
> characteristics?
>
> For, in part, this reason, I was hoping to find some way to execute a
> "conditional merge" (again, my words).  Additionally, the process of
> learning how one might do it with this "condition" data, could be applied
to
> extracting other characteristic data that are also only present
sporadically
> across time (e.g., size of the home) but that also might periodically
change
> (e.g., the home might be added to).
>
> Is there a way to use if/then statements in a merge process?
Nick Cox

> Carry forward can be as little as one line of code: see
>
> FAQ     . . . . . . . . . . . . . . . . . . . . . . . Replacing missing
> values
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J.
> Cox
>         2/03    How can I replace missing values with previous or
>                 following nonmissing values?
>
http://www.stata.com/support/faqs/data-management/replacing-missing-values/
>
> I don't see that this is an imputation problem at all. It calls for
> interpolation. Indeed, have you considered some kind of interpolation,
> say linear, cubic, cubic spline?
>
> On Tue, Nov 6, 2012 at 7:33 PM, Ben Hoen <[email protected]> wrote:
>
>> I have two files sales.dta and condition.dta.  sales.dta has two
variables
>> (home_id saleyear), and condition.dta has three variables (home_id
>> inspection_year condition).  The variable inspection_year can take the
> vales
>> of 2000-2011 for any home but for many homes only some years are present
> (in
>> many years the home was not inspected. Therefore a sample of the data
> might
>> look like:
>>
>> home_id inspection_year condition
>> 50121           2002                    4
>> 50121           2006                    4
>> 50121           2011                    3
>> 50681           2004                    2
>> 50681           2010                    3
>> 51040           2006                    2
>> 51040           2010                    2
>> 51040           2011                    3
>>
>> I would like to populate the sales.dta file with the condition of the
> parcel
>> in the inspection_year that is the closest to, but not following the
>> sale_year.
>>
>> So, for example, the following dataset would result
>>
>> home_id sale_year       condition
>> 50121           2007            4
>> 50121           2011            3
>> 50681           2008            2
>> 51040           2003            .
>> 51040           2010            3
>>
>> I know I am not the first person to have this problem, but could not find
>> threads on this.  Maybe I am using the wrong search terms.  Any help
would
>> be greatly appreciated.
>>
>> (As I wrote this I realized one not as elegant work-around would be to
>> fill-in missing data for each missing year in the condition.dta file,
>> potentially using the user-written "carryforward" or even imputing the
> data
>> using, e.g., mi impute, and then matching home_id sale_year to home_id
>> inspection_year.)
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index