Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Francesco <k7br@gmx.fr> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: merge and nearest value |

Date |
Sun, 19 Aug 2012 14:11:45 +0200 |

Well David my apologies because your idea is indeed really appropriate, after some slight data manipulation Indeed I have just to drop the Type-Date duplicates in A, use your joinby suggestion (thus obtaining dataset C) and then perform a m:1 merge between A and C. Thank you ! Best On 19 August 2012 12:07, Francesco <k7br@gmx.fr> wrote: > Dear David, > > Many thanks for your interesting suggestion! > Unfortunately it cannot work because > 1) There can be numerous duplicates of TYPE and DATE in A. However in > B TYPE and DATE uniquely idenfity observations > 2) I have several million of observations, therefore using a joinby > would probably destroy my computer's Ram ... > > :-( > > On 19 August 2012 05:13, David Kantor <kantor.d@att.net> wrote: >> First, I presume that in A, TYPE and DATE uniquely identify observations. >> >> I suggest you do a -joinby- on TYPE. This will create a large multitude of >> observations. >> Then for each distinct TYPE and DATE combination, compute the difference and >> then select (by TYPE and DATE) the one with the minimal difference. >> You can do appropriate sorting to break ties in the manner you desire. >> >> HTH >> --David >> >> >> At 07:22 PM 8/18/2012, Francesco wrote: >>> >>> Dear Statalist, >>> >>> I wish again that you could help me with this particular merging >>> problem... >>> >>> Let say I have a dataset A as: >>> >>> TYPE DATE >>> A 2 >>> A 5 >>> A 20 >>> B 10 >>> B 2 >>> >>> >>> and I have another dataset B as : >>> >>> >>> TYPE Special_Date >>> A 2 >>> A 6 >>> A 20 >>> A 22 >>> B 5 >>> B 6 >>> >>> The question is : I would like to obtain the difference between the >>> date of each observation in A and the closest special date in B with >>> the same type. In case of ties I would take the latest date of the >>> two. >>> >>> For example I would obtain here >>> >>> TYPE DATE Difference >>> A 2 0=2-2 >>> A 5 -1=5-6 >>> A 20 0=20-20 >>> B 10 +4=10-6 >>> B 2 -3=2-5 >>> >>> >>> I was thinking of reshaping the dataset B in order to have the special >>> dates in column for each type, merging then on type with A, creating a >>> difference variable between the date and each special date, and taking >>> the minimum... >>> But this involves creating a lot of variables and maybe there is >>> something more simple ? >>> >>> Many thanks for your suggestions, >>> >>> Best Regards, >> >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: merge and nearest value***From:*Francesco <cariboupad@gmx.fr>

**Re: st: merge and nearest value***From:*David Kantor <kantor.d@att.net>

**Re: st: merge and nearest value***From:*Francesco <k7br@gmx.fr>

- Prev by Date:
**st: Interaction effect** - Next by Date:
**Re: st: coefficient interpretation in OLS** - Previous by thread:
**Re: st: merge and nearest value** - Next by thread:
**Re: st: merge and nearest value** - Index(es):