Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Torsten Häberle <haeberle.torsten@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Two datasets: Look for similar observations in the second dataset |
Date | Sun, 26 Jan 2014 20:33:05 +0100 |
Could anybody help with this? This problem is killing me. Maybe Nick, please?

Thanks...

2014-01-25 Torsten Häberle <haeberle.torsten@gmail.com>:
> Hey guys,
>
> I have quite a difficult "matching" problem to solve and I am not sure
> how to approach it. This is the situation:
>
> I have two datasets:
> 1) The first one is my sample dataset
> 2) The second one is basically the entire population, but excluding my
>    sample dataset
>
> Both datasets include data about firms. In general, what I want to do:
> Find for each firm in dataset (1) another "matching" firm in dataset
> (2) that is as similar as possible to the sample firm in dataset (1)
> (based on two characteristics).
>
> Dataset 1 looks like:
>
> Company  Year  CompanySize  Ratio
> A        2012  140          0.2
> B        2011  200          0.4
> C        2010  300          0.2
>
> It includes many firms over a period of 20 years including their
> characteristics. There are two matching characteristics: the company
> size and a (company) ratio that I calculated.
> For example, company A has a size of 140 and a ratio of 0.2 in 2012.
> Now, I want to find a firm in dataset (2), which is similar to firm A
> in dataset (1) in the same year 2012.
>
> Dataset 2 looks very similar:
>
> Company  Year  CompanySize  Ratio
> X        2012  150          0.19
> Y        2012  280          0.9
> Z        2012  50           0.01
> ...
>
> Dataset (2) includes many many other firms. As mentioned, I want to
> find a matching firm for each sample firm. This should be somehow
> constructed by a loop or macro (?) I think, but I am not sure.
>
> The match should be conducted in the following way. Let's assume in
> our example that we want to find a matching firm for sample firm A in
> dataset (1).
>
> 1) Characteristic: CompanySize (first matching characteristic)
> Stata shall pick all firms from dataset (2) that have a company size
> between 80% and 120% of firm A's size. All other firms in dataset (2)
> shall be immediately dismissed. This is basically the first step in
> the matching procedure.
> In our case: Company size is 140, so the range is 112 - 168. All firms in
> dataset (2) that have a CompanySize above 168 or below 112 shall be
> dismissed --> Company Y and Z.
>
> 2) Characteristic: Ratio (second matching characteristic)
> Now, Stata shall pick from the remaining firms in dataset (2) the
> single firm whose ratio is most similar to that of firm A in
> dataset (1). In our example, this would be Company X. This should
> be done somehow like:
>
> Ratio of firm A (dataset 1) - Ratio of firm X (dataset 2) = 0.2 - 0.19 =  0.01
>                             - Ratio of firm Y             = 0.2 - 0.9  = -0.7
>                             - Ratio of firm Z             = 0.2 - 0.01 =  0.19
>
> Pick firm X since the absolute difference is the smallest. Be careful
> here: Y and Z are actually already excluded due to their CompanySize
> (first matching characteristic). This is just an example.
>
> Finally, to make it even more complicated: I am not only looking for
> the "best" (closest) match, but also for the second and third closest
> match.
>
> In the end, I want to get one dataset that looks like this:
>
> Company  Matching Firm 1  Matching Firm 2  Matching Firm 3
> A        X                (2nd rank)       (3rd rank)
>
> Hopefully, I made my problem clear. Would appreciate some help. Since
> this matching has to be done for every sample firm, this has to be some
> kind of loop/macro that does this matching over and over again for
> every sample firm.
>
> Thanks!
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
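
The two-step procedure described in the message (keep same-year firms whose size is within 80% to 120% of the sample firm's size, then rank the survivors by absolute ratio difference and keep the three closest) can be sketched as follows. This is only an illustration under stated assumptions, not a Stata solution: it is written in Python, the field names (`company`, `year`, `size`, `ratio`) and the function `find_matches` are hypothetical, and the data are the toy rows from the example tables.

```python
from collections import defaultdict

# Toy rows copied from the example tables; the field names are illustrative.
sample = [
    {"company": "A", "year": 2012, "size": 140, "ratio": 0.2},
    {"company": "B", "year": 2011, "size": 200, "ratio": 0.4},
    {"company": "C", "year": 2010, "size": 300, "ratio": 0.2},
]
population = [
    {"company": "X", "year": 2012, "size": 150, "ratio": 0.19},
    {"company": "Y", "year": 2012, "size": 280, "ratio": 0.9},
    {"company": "Z", "year": 2012, "size": 50, "ratio": 0.01},
]


def find_matches(sample, population, lower=0.8, upper=1.2, n_matches=3):
    """Return {(company, year): [match1, match2, ...]} for each sample firm.

    Step 1: keep population firms from the same year whose size lies in
            [lower * size, upper * size] of the sample firm.
    Step 2: sort the remaining firms by absolute ratio difference and keep
            the n_matches closest; missing ranks are padded with None.
    """
    # Group the population by year so each sample firm only scans its own year.
    by_year = defaultdict(list)
    for firm in population:
        by_year[firm["year"]].append(firm)

    result = {}
    for s in sample:
        low, high = lower * s["size"], upper * s["size"]
        candidates = [
            p for p in by_year.get(s["year"], []) if low <= p["size"] <= high
        ]
        candidates.sort(key=lambda p: abs(s["ratio"] - p["ratio"]))
        names = [p["company"] for p in candidates[:n_matches]]
        names += [None] * (n_matches - len(names))
        result[(s["company"], s["year"])] = names
    return result


matches = find_matches(sample, population)
for (company, year), names in matches.items():
    print(company, year, names)
```

With the toy data only firm X survives the size filter for firm A, so A's second and third ranks stay empty, and B and C get no match because the example population contains no 2011 or 2010 rows. In Stata the same join, filter, rank, reshape pattern could plausibly be assembled from `joinby`, `bysort` and `reshape wide`, though that is not shown or tested here.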