Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: merge and nearest value


From   Francesco <k7br@gmx.fr>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: merge and nearest value
Date   Sun, 19 Aug 2012 14:11:45 +0200

Well David my apologies because your idea is indeed really
appropriate, after some slight data manipulation
Indeed I have just to drop the Type-Date duplicates in A, use your
joinby suggestion (thus obtaining dataset C) and then perform a m:1
merge between A and C.

Thank you !
Best

On 19 August 2012 12:07, Francesco <k7br@gmx.fr> wrote:
> Dear David,
>
> Many thanks for your interesting suggestion!
> Unfortunately it cannot work because
> 1) There can be numerous duplicates of TYPE and DATE in A. However in
> B TYPE and DATE uniquely idenfity observations
> 2) I have several million of observations, therefore using a joinby
> would probably destroy my computer's Ram ...
>
> :-(
>
> On 19 August 2012 05:13, David Kantor <kantor.d@att.net> wrote:
>> First, I presume that in A, TYPE and DATE uniquely identify observations.
>>
>> I suggest you do a -joinby- on TYPE. This will create a large multitude of
>> observations.
>> Then for each distinct TYPE and DATE combination, compute the difference and
>> then select (by TYPE and DATE) the one with the minimal difference.
>> You can do appropriate sorting to break ties in the manner you desire.
>>
>> HTH
>> --David
>>
>>
>> At 07:22 PM 8/18/2012, Francesco wrote:
>>>
>>> Dear Statalist,
>>>
>>> I wish again that you could help me with this particular merging
>>> problem...
>>>
>>> Let say I have a dataset A as:
>>>
>>> TYPE   DATE
>>> A            2
>>> A            5
>>> A            20
>>> B            10
>>> B            2
>>>
>>>
>>> and I have another dataset B as :
>>>
>>>
>>> TYPE  Special_Date
>>> A              2
>>> A              6
>>> A              20
>>> A              22
>>> B              5
>>> B              6
>>>
>>> The question is : I would like to obtain the difference between the
>>> date of each observation in A and the closest special date in B with
>>> the same type. In case of ties I would take the latest date of the
>>> two.
>>>
>>> For example I would obtain here
>>>
>>> TYPE   DATE   Difference
>>> A            2            0=2-2
>>> A            5            -1=5-6
>>> A            20            0=20-20
>>> B            10           +4=10-6
>>> B            2             -3=2-5
>>>
>>>
>>> I was thinking of reshaping the dataset B in order to have the special
>>> dates in column for each type, merging then on type with A, creating a
>>> difference variable between the date and each special date, and taking
>>> the minimum...
>>> But this involves creating a lot of variables and maybe there is
>>> something more simple ?
>>>
>>> Many thanks for your suggestions,
>>>
>>> Best Regards,
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index