Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Two datasets: Look for similar observations in the second dataset

From	Torsten Häberle <[email protected]>
To	[email protected]
Subject	Re: st: Two datasets: Look for similar observations in the second dataset
Date	Sun, 26 Jan 2014 20:33:05 +0100

Could anybody help with this? This problem is killing me. Maybe Nick,
please? Thanks...


2014-01-25 Torsten Häberle <[email protected]>:
> Hey guys,
>
> I have quite a difficult "matching" problem to solve and I am not sure
> how to approach it. This is the situation:
>
> I have two datasets:
> 1) The first one is my sample dataset
> 2) The second one is basically the entire population, but excluding my
> sample dataset
>
> Both datasets include data about firms. In general, what I want to do:
> Find for each firm in dataset (1) another "matching" firm in dataset
> (2) that is as similar as possible to the sample firm in dataset (1)
> (based on two characteristics).
>
> Dataset 1 looks like:
>
> Company    Year    CompanySize    Ratio
> A          2012    140            0.2
> B          2011    200            0.4
> C          2010    300            0.2
>
> It includes many firms over a period of 20 years including their
> characteristics. There are two matching characteristics: the company
> size and a (company) ratio that I calculated.
> For example, company A has a size of 140 and a ratio of 0.2 in 2012.
> Now, I want to find a firm in dataset (2), which is similar to firm A
> in dataset (1) in the same year 2012.
>
> Dataset 2 looks very similar:
>
> Company    Year    CompanySize    Ratio
> X          2012    150            0.19
> Y          2012    280            0.9
> Z          2012    50             0.01
> ...
>
> Dataset (2) includes many other firms. As mentioned, I want to find
> a matching firm for each sample firm. I think this needs some kind of
> loop or macro, but I am not sure.
>
> The match should be conducted in the following way. Let's assume in
> our example that we want to find a matching firm for sample firm A in
> dataset (1).
> 1) Characteristic: CompanySize >> First matching characteristic
> Stata shall pick all firms from dataset (2) that have a company size
> between 80% and 120%  of firm A's size. All other firms in dataset (2)
> shall be immediately dismissed. This is basically the first step in
> the matching procedure.
> In our case, firm A's size is 140, so the range is 112 to 168
> (0.8 * 140 and 1.2 * 140). All firms in dataset (2) with a CompanySize
> above 168 or below 112 shall be dismissed, which removes Company Y
> (280) and Company Z (50).
>
> 2) Characteristic: Ratio >> Second matching characteristic
> Now, Stata shall pick from the remaining firms in dataset (2) the
> single firm whose ratio is closest to that of firm A in dataset (1).
> In our example, this would be Company X. The comparison should use
> the absolute difference in ratios:
> |Ratio of firm A - Ratio of firm X| = |0.2 - 0.19| = 0.01
> |Ratio of firm A - Ratio of firm Y| = |0.2 - 0.9|  = 0.70
> |Ratio of firm A - Ratio of firm Z| = |0.2 - 0.01| = 0.19
> Pick firm X since its difference is the smallest. Be careful here: Y
> and Z are actually already excluded due to their CompanySize (first
> matching characteristic); they are listed only to illustrate the
> calculation.
>
> Finally, to make it even more complicated: I am not only looking for
> the "best" (closest) match, but also for the second and third closest
> match.
>
> In the end, I want to get one dataset that looks like this:
>
> Company    Matching Firm 1    Matching Firm 2    Matching Firm 3
> A          X                  (2nd closest)      (3rd closest)
>
> Hopefully, I made my problem clear. I would appreciate some help.
> Since this matching has to be done for every sample firm, it needs
> some kind of loop/macro that repeats the matching for each sample
> firm.
>
> Thanks!
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
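
The procedure described in the quoted message (same year, size within 80% to 120%, then the three closest ratios) can be sketched as follows. This is only an illustration of the logic, written in Python rather than Stata, and not a tested solution for the real data. The tuple layout, the function name find_matches, and its parameters low, high and top_n are assumptions made for this sketch; the toy rows are the ones from the example tables. Ties in the ratio difference are broken alphabetically by company name, which is also an assumption, since the message does not say how ties should be handled.

```python
from collections import defaultdict

# Toy rows copied from the example tables: (company, year, size, ratio).
sample = [("A", 2012, 140, 0.2), ("B", 2011, 200, 0.4), ("C", 2010, 300, 0.2)]
population = [("X", 2012, 150, 0.19), ("Y", 2012, 280, 0.9), ("Z", 2012, 50, 0.01)]


def find_matches(sample, population, low=0.8, high=1.2, top_n=3):
    """Return {(company, year): [up to top_n candidate names, closest ratio first]}."""
    # Index the population by year so each sample firm only scans its own year.
    by_year = defaultdict(list)
    for name, year, size, ratio in population:
        by_year[year].append((name, size, ratio))

    matches = {}
    for name, year, size, ratio in sample:
        # Step 1: keep candidates whose size lies within 80%-120% of the sample firm's size.
        in_band = [c for c in by_year[year] if low * size <= c[1] <= high * size]
        # Step 2: rank the survivors by absolute ratio difference (ties broken by name).
        in_band.sort(key=lambda c: (abs(ratio - c[2]), c[0]))
        matches[(name, year)] = [c[0] for c in in_band[:top_n]]
    return matches


matches = find_matches(sample, population)
for (name, year), found in matches.items():
    # Pad with None so every row has the same three match columns.
    row = found + [None] * (3 - len(found))
    print(name, year, *row)
```

With the toy data only firm A finds a candidate, because the example population contains no firms for 2011 or 2010. In Stata, the same steps would roughly correspond to forming all same-year pairs with joinby, filtering with inrange(), ranking within each sample firm using bysort, and reshaping the top three to wide format; that mapping is a suggestion, not something verified against the actual datasets.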

References:
- st: Two datasets: Look for similar observations in the second dataset
  - From: Torsten Häberle <[email protected]>
