Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Two datasets: Look for similar observations in the second dataset

From	Torsten Häberle <[email protected]>
To	[email protected]
Subject	Re: st: Two datasets: Look for similar observations in the second dataset
Date	Mon, 27 Jan 2014 21:09:58 +0100

I realized something else. The problem now is that different sample
firms can end up with the same matching firm. Example:
Sample Firm A --------- Matching Firm B
Sample Firm C --------- Matching Firm D
Sample Firm E --------- Matching Firm B  <<<----- Firm B is matched again!
Stata should remove Matching Firm B from the pool once it has been
matched with Sample Firm A, so that no other sample firm can be
matched with Matching Firm B again!
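
A minimal sketch of one way this might be done, assuming the candidate
pairs have already been built and ranked by -ratdif- as in the quoted
code below. The toy pairs, firm F, and the variables -obs-, -picked-
and -taken- are made up for illustration. Sample firms are processed in
alphabetical order, and each one takes its closest candidate that is
still free:

```stata
clear all
set more off

* Hypothetical candidate pairs: one row per (sample firm, eligible matching firm)
input str1 company0 str1 company ratdif
A B 0.01
A D 0.05
C D 0.02
E B 0.00
E F 0.03
end

* Within each sample firm, closest candidate first
sort company0 ratdif company
gen long obs = _n
gen byte picked = 0   // 1 = this pair is the chosen match
gen byte taken  = 0   // 1 = matching firm already used by an earlier sample firm

quietly levelsof company0, local(firms)
foreach f of local firms {
    * Lowest row number = closest candidate still available for this firm
    summarize obs if company0 == "`f'" & !taken, meanonly
    if (r(N) == 0) continue   // no free candidate left: firm stays unmatched
    local best = r(min)
    local m = company[`best']
    quietly replace picked = 1 in `best'
    quietly replace taken  = 1 if company == "`m'"
}

keep if picked
list company0 company ratdif, noobs
```

Here E would prefer B, but B is already taken by A, so E falls back to
F. Note that this is a greedy rule: the result depends on the order in
which sample firms are processed and need not minimize the total
distance over all pairs.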

Tricky stuff...
Thanks
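
On the remark in the quoted message below that the -forvalues- loop
could be avoided for smaller data: a loop-free variant might look like
the following sketch. It assumes all same-year pairs fit in memory at
once, and it joins on -year- directly instead of on the constant -dum-.
The data, the 0.8 to 1.2 size band and the limit of three matches are
taken from the quoted code; nothing here has been timed on real data.

```stata
clear all
set more off

* Sample firms, with 0 appended to every name except the join key -year-
input str1 company year size rat
A 2012 140 0.2
B 2011 200 0.4
C 2010 300 0.2
D 2010 160 0.5
end
rename (company size rat) =0
tempfile samp
save "`samp'"

* Population of potential matching firms
clear
input str4 company year size rat
X    2012 150 0.19
XX   2012 150 0.20
XXX  2012 150 0.22
XXXX 2012 150 0.195
Y    2010 280 0.9
YY   2010 280 0.9
Z    2012  50 0.01
ZZ   2010 300 0.2
T    2011 200 0.95
U    2010 300 0.10
end

* Form every same-year (population firm, sample firm) pair in one step
joinby year using "`samp'"
keep if inrange(size, 0.8*size0, 1.2*size0)

* Keep the three closest candidates per sample firm; ties broken by name
gen ratdif = abs(rat0 - rat)
bysort company0 (ratdif company): keep if _n <= 3

list company0 company ratdif, sepby(company0)
```

Sample firm D has no population firm within its size band in 2010, so
it drops out of the result, just as it does in the looped version.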

2014-01-27 Roberto Ferrer <[email protected]>:
> Please follow Statalist policy and provide cross-references when
> posting in other forums:
> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>
> The following is one way of doing what you want. You could avoid the
> -forvalues- loop if your dataset were small, but I assume it is not.
> I have not tested the speed on a large dataset, but I hope it gets
> you started.
>
> * ----------------------- begin code -----------------------
>
> clear all
> set more off
>
> * Input fake datasets. The constant -dum- serves as the key that lets
> * -joinby- pair one sample firm with every firm in the population.
> input str1 company year size rat
> A  2012  140  0.2
> B  2011  200  0.4
> C  2010  300  0.2
> D  2010  160  0.5
> end
>
> gen dum = 1
>
> tempfile samp
> save "`samp'"
>
> clear all
> input str4 company year size rat
> X     2012  150  0.19
> XX    2012  150  0.20
> XXX   2012  150  0.22
> XXXX  2012  150  0.195
> Y     2010  280  0.9
> YY    2010  280  0.9
> Z     2012   50  0.01
> ZZ    2010  300  0.2
> T     2011  200  0.95
> U     2010  300  0.10
> end
>
> gen dum = 1
>
> tempfile pop
> save "`pop'"
>
>
> * Main process
> tempfile result
> local lowlimit .8
> local highlimit 1.2
>
> quietly {
>     forvalues i = 1/4 { // 4 is # observations in sample file
>       use "`samp'" in `i', clear
>       rename (company year size rat) =0
>       joinby dum using "`pop'"
>       drop dum
>
>       keep if year0 == year // compare companies with same year only
>       keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>
>       gen ratdif = abs(rat0 - rat)
>       * Ties in -ratdif- are broken alphabetically by -company- name
>       isid ratdif company, sort
>       * Keep the 3 closest firms; -capture- because fewer than 3 may remain
>       capture keep in 1/3
>
>       if (`i' == 1) save "`result'"
>       else {
>         append using "`result'"
>         save "`result'", replace
>       }
>
>     }
>
> }
>
> * Check and reshape
> use "`result'", clear
> isid company0 ratdif company, sort
> list, sepby(company0)
>
> keep company*
> list, sepby(company0)
>
> by company0: gen id = _n
> reshape wide company, i(company0) j(id)
> list, separator(0)
>
> *------------------------- end code ------------------------
>
> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
> <[email protected]> wrote:
>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>
>> 2014-01-26 daniel klein <[email protected]>:
>>> This is a triple post (with slight variations) that has already
>>> generated two answers here
>>>
>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>
>>> http://www.stata-forum.de/post2400.html#p2400
>>>
>>>
>>> Please see the FAQ concerning cross-postings
>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>
>>>
>>> Best
>>> Daniel
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
