Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Two datasets: Look for similar observations in the second dataset
From
Torsten Häberle <[email protected]>
To
[email protected]
Subject
Re: st: Two datasets: Look for similar observations in the second dataset
Date
Mon, 27 Jan 2014 21:09:58 +0100
There is something else I realized: different sample firms can end up
with the same matching firm. Example:
Sample Firm A --------- Matching Firm B
Sample Firm C --------- Matching Firm D
Sample Firm E --------- Matching Firm B <<<----- Firm B is matched again!
Once Matching Firm B has been matched with Sample Firm A, Stata should
remove it from the pool of candidates, so that no other sample firm can
be matched with Firm B again!
Tricky stuff...
Thanks
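A minimal sketch of one way to enforce this, assuming one match per sample firm and a greedy rule: sample firms are processed in file order, each takes its closest eligible candidate (same year, size within 0.8 to 1.2 times its own, smallest absolute difference in -rat-), and that candidate is then dropped from the pool before the next sample firm is processed. The data are hypothetical, adapted from the fake data in the reply quoted below, with a sample firm E added so that A and E compete for the same candidate. Because the rule is greedy, the result depends on the order of the sample file and is not guaranteed to minimize the total difference across all pairs.

```stata
* Sketch: greedy 1:1 matching without replacement (hypothetical data)
clear
set more off

* Sample firms; E competes with A for the same candidate
input str1 company year size rat
A 2012 140 0.2
C 2010 300 0.2
E 2012 150 0.21
end
local nsamp = _N
tempfile samp
save "`samp'"

* Candidate matching firms (the pool)
clear
input str4 company year size rat
X  2012 150 0.19
XX 2012 150 0.20
Z  2012  50 0.01
ZZ 2010 300 0.2
U  2010 300 0.10
end
tempfile pool result
save "`pool'"

local first = 1
forvalues i = 1/`nsamp' {
    use in `i' using "`samp'", clear
    rename (company year size rat) =0
    cross using "`pool'"       // pair this sample firm with every pooled firm
    keep if year0 == year
    keep if inrange(size, 0.8*size0, 1.2*size0)
    if _N == 0 continue        // nothing eligible left for this sample firm
    gen ratdif = abs(rat0 - rat)
    isid ratdif company, sort  // ties broken alphabetically by company
    keep in 1                  // keep only the closest candidate
    local matched = company[1]
    keep company0 company ratdif
    if `first' {
        save "`result'"
        local first = 0
    }
    else {
        append using "`result'"
        save "`result'", replace
    }
    * Remove the matched firm so no later sample firm can take it
    use "`pool'", clear
    drop if company == "`matched'"
    save "`pool'", replace
}

use "`result'", clear
sort company0
isid company                   // check: each matching firm used at most once
list, noobs
```

With these data A takes XX, C takes ZZ, and E falls back to X. Without the removal step E would also have taken XX, since |0.21 - 0.20| is smaller than |0.21 - 0.19|.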
2014-01-27 Roberto Ferrer <[email protected]>:
> Please follow Statalist policy and provide cross-references when
> posting in other forums:
> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>
> The following is one way of doing what you want. You could avoid the
> -forvalues- loop if your database is not too big, but I assume it is
> large. I didn't test speed with a big data set, but I hope it gets you
> started.
>
> * ----------------------- begin code -----------------------
>
> clear all
> set more off
>
> * Input fake databases (including -dum- variable)
> input str1 company year size rat
> A 2012 140 0.2
> B 2011 200 0.4
> C 2010 300 0.2
> D 2010 160 0.5
> end
>
> gen dum = 1
>
> tempfile samp
> save "`samp'"
>
> clear all
> input str4 company year size rat
> X 2012 150 0.19
> XX 2012 150 0.20
> XXX 2012 150 0.22
> XXXX 2012 150 0.195
> Y 2010 280 0.9
> YY 2010 280 0.9
> Z 2012 50 0.01
> ZZ 2010 300 0.2
> T 2011 200 0.95
> U 2010 300 0.10
> end
>
> gen dum = 1
>
> tempfile pop
> save "`pop'"
>
>
> * Main process
> tempfile result
> local lowlimit .8
> local highlimit 1.2
>
> quietly {
> describe using "`samp'"
> forvalues i = 1/`r(N)' { // loop over all observations in the sample file
> use "`samp'" in `i', clear
> rename (company year size rat) =0
> joinby dum using "`pop'"
> drop dum
>
> keep if year0 == year // compare companies with same year only
> keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>
> gen ratdif = abs(rat0 - rat)
> * Ties in -ratdif- are broken alphabetically by -company- name
> isid ratdif company, sort
> capture keep in 1/3
>
> if (`i' == 1) save "`result'"
> else {
> append using "`result'"
> save "`result'", replace
> }
>
> }
>
> }
>
> * Check and reshape
> use "`result'", clear
> isid company0 ratdif company, sort
> list, sepby(company0)
>
> keep company*
> list, sepby(company0)
>
> by company0: gen id = _n
> reshape wide company, i(company0) j(id)
> list, separator(0)
>
> *------------------------- end code ------------------------
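Roberto notes that the -forvalues- loop can be avoided when the data are small enough. A minimal sketch of that loop-free variant follows, assuming every sample-by-candidate pair fits in memory (the paired data have as many rows as the product of the two file sizes, which is why the loop is preferable for big files). It swaps in Stata's -cross- command to form all pairs at once instead of -joinby- on a dummy, keeps the same 0.8 to 1.2 size window, and keeps the three closest candidates per sample firm. The data are a shortened, hypothetical copy of the fake data above.

```stata
* Sketch: loop-free variant; all sample-candidate pairs must fit in memory
clear
input str1 company year size rat
A 2012 140 0.2
C 2010 300 0.2
end
rename (company year size rat) =0
tempfile samp
save "`samp'"

clear
input str4 company year size rat
X    2012 150 0.19
XX   2012 150 0.20
XXX  2012 150 0.22
XXXX 2012 150 0.195
ZZ   2010 300 0.2
U    2010 300 0.10
end

* Every candidate paired with every sample firm in one step
cross using "`samp'"
keep if year0 == year
keep if inrange(size, 0.8*size0, 1.2*size0)
gen ratdif = abs(rat0 - rat)

* Three closest candidates per sample firm; ties broken by company name
bysort company0 (ratdif company): keep if _n <= 3
list company0 company ratdif, sepby(company0) noobs
```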
>
> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
> <[email protected]> wrote:
>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>
>> 2014-01-26 daniel klein <[email protected]>:
>>> This is a triple post (with slight variations) that has already
>>> generated two answers here:
>>>
>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>
>>> http://www.stata-forum.de/post2400.html#p2400
>>>
>>>
>>> Please see the FAQ concerning cross-postings
>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>
>>>
>>> Best
>>> Daniel
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/