



Re: st: merge and nearest value


From   Francesco <k7br@gmx.fr>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: merge and nearest value
Date   Sun, 19 Aug 2012 12:07:11 +0200

Dear David,

Many thanks for your interesting suggestion!
Unfortunately, it will not work for two reasons:
1) There can be numerous duplicates of TYPE and DATE in A. However, in
B, TYPE and DATE uniquely identify observations.
2) I have several million observations, so a -joinby- would probably
exhaust my computer's RAM ...

:-(
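
To make the memory concern concrete, here is a minimal sketch of the lookup logic that avoids forming all TYPE pairs. It is written in Python rather than Stata, purely to illustrate the idea, and it is not David's method: sort the special dates once per TYPE, then find the nearest one for each observation in A by binary search. The function name nearest_differences and the tuple layout are assumptions made for this illustration. Duplicates of TYPE and DATE in A are harmless because each row is handled independently.

```python
from bisect import bisect_left
from collections import defaultdict


def nearest_differences(a_rows, b_rows):
    """For each (type, date) row of A, return date minus the nearest
    special date of the same type in B. Ties go to the later special date."""
    # Group B's special dates by type and sort each group once.
    specials = defaultdict(list)
    for typ, special_date in b_rows:
        specials[typ].append(special_date)
    for dates in specials.values():
        dates.sort()

    result = []
    for typ, date in a_rows:
        dates = specials.get(typ)
        if not dates:
            # No special date exists for this type.
            result.append((typ, date, None))
            continue
        i = bisect_left(dates, date)  # first index with dates[i] >= date
        if i == 0:
            nearest = dates[0]
        elif i == len(dates):
            nearest = dates[-1]
        else:
            before, after = dates[i - 1], dates[i]
            # "<=" sends ties to the later special date.
            nearest = after if after - date <= date - before else before
        result.append((typ, date, date - nearest))
    return result


# The example data from the original question.
A = [("A", 2), ("A", 5), ("A", 20), ("B", 10), ("B", 2)]
B = [("A", 2), ("A", 6), ("A", 20), ("A", 22), ("B", 5), ("B", 6)]

for typ, date, diff in nearest_differences(A, B):
    print(typ, date, diff)
```

Memory use grows with the size of A plus the size of B, not with the number of within-TYPE pairs, which is what a -joinby- on TYPE would create.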

On 19 August 2012 05:13, David Kantor <kantor.d@att.net> wrote:
> First, I presume that in A, TYPE and DATE uniquely identify observations.
>
> I suggest you do a -joinby- on TYPE. This will create a large multitude of
> observations.
> Then for each distinct TYPE and DATE combination, compute the difference and
> then select (by TYPE and DATE) the one with the minimal difference.
> You can do appropriate sorting to break ties in the manner you desire.
>
> HTH
> --David
>
>
> At 07:22 PM 8/18/2012, Francesco wrote:
>>
>> Dear Statalist,
>>
>> I wish again that you could help me with this particular merging
>> problem...
>>
>> Let's say I have a dataset A as:
>>
>> TYPE   DATE
>> A            2
>> A            5
>> A            20
>> B            10
>> B            2
>>
>>
>> and I have another dataset B as :
>>
>>
>> TYPE  Special_Date
>> A              2
>> A              6
>> A              20
>> A              22
>> B              5
>> B              6
>>
>> The question is : I would like to obtain the difference between the
>> date of each observation in A and the closest special date in B with
>> the same type. In case of ties I would take the latest date of the
>> two.
>>
>> For example I would obtain here
>>
>> TYPE   DATE   Difference
>> A      2       0 = 2 - 2
>> A      5      -1 = 5 - 6
>> A      20      0 = 20 - 20
>> B      10     +4 = 10 - 6
>> B      2      -3 = 2 - 5
>>
>>
>> I was thinking of reshaping dataset B so that the special dates are
>> in columns for each type, then merging on type with A, creating a
>> difference variable between the date and each special date, and taking
>> the minimum...
>> But this involves creating a lot of variables, and maybe there is
>> something simpler?
>>
>> Many thanks for your suggestions,
>>
>> Best Regards,
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

