Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Two datasets: Look for similar observations in the second dataset
From: Torsten Häberle <[email protected]>
To: [email protected]
Subject: Re: st: Two datasets: Look for similar observations in the second dataset
Date: Mon, 27 Jan 2014 09:16:14 +0100
Thank you so much, Roberto. This is definitely what I need.
There is one more thing: in some cases no ratio is available, either
for just one of the two firms (sample or matching) or for both. In that
case, I want Stata to match on the closest size instead. Again, there
should be three matches: the best, second-best, and third-best match
based on size. Lastly, either or both of the size variables might also
be missing. Then Stata should indicate that no match is possible.
How could I account for this?
Thanks again. I would be doomed without this forum.
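
One possible way to handle these cases is sketched here as a variation on Roberto's loop. It has not been run on real data and rests on assumptions that go beyond the post: candidates with a usable ratio difference rank ahead of candidates without one, the latter are ranked by absolute size difference, and the 80%-120% size window still applies. The variables -sizedif-, -noratio-, and -nomatch- and the small input data are hypothetical.

```stata
clear all
set more off

* Hypothetical sample firms: C has no ratio, E has no size
input str1 company year size rat
A 2012 140 0.2
C 2010 300 .
E 2010 .   0.3
end
gen dum = 1
local nsamp = _N
tempfile samp
save "`samp'"

* Hypothetical candidate firms: YY has no ratio, U has no size
clear
input str4 company year size rat
X  2012 150 0.19
XX 2012 150 0.22
Y  2010 280 0.9
YY 2010 290 .
ZZ 2010 300 0.2
U  2010 .   0.1
end
gen dum = 1
tempfile pop
save "`pop'"

tempfile result wide
local lowlimit .8
local highlimit 1.2
local first 1

quietly {
    forvalues i = 1/`nsamp' {
        use "`samp'" in `i', clear
        rename (company year size rat) (company0 year0 size0 rat0)
        joinby dum using "`pop'"
        drop dum

        keep if year0 == year
        * Drop pairs with a missing size explicitly. As far as I recall,
        * -inrange()- treats a missing bound as open-ended, so a missing
        * -size0- could otherwise let every candidate through.
        keep if !missing(size0, size)
        keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)

        * No usable candidate: skip; the firm is flagged after the loop
        if (_N == 0) continue

        * -ratdif- is missing whenever either ratio is missing
        gen ratdif  = abs(rat0 - rat)
        gen sizedif = abs(size0 - size)
        gen byte noratio = missing(ratdif)

        * Assumed ranking: ratio-based matches first, then closest size;
        * remaining ties are broken alphabetically by -company-
        sort noratio ratdif sizedif company
        keep if _n <= 3
        gen byte id = _n

        if (`first') {
            save "`result'"
            local first 0
        }
        else {
            append using "`result'"
            save "`result'", replace
        }
    }
}

* One row per sample firm with up to three matches
use "`result'", clear
keep company0 company id
reshape wide company, i(company0) j(id)
save "`wide'"

* Merge back onto the sample so firms without any match are kept
use "`samp'", clear
rename company company0
merge 1:1 company0 using "`wide'"
gen byte nomatch = (_merge == 1)
drop dum _merge
list, separator(0)
```

Merging the reshaped matches back onto the sample file is what keeps a firm such as E in the output: it never enters the result file, so it comes back with empty match columns and nomatch equal to 1. If ratio-based and size-based matches should never be mixed for the same sample firm, the sort key would need to change.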
2014-01-27 Roberto Ferrer <[email protected]>:
> Please follow Statalist policy and provide cross-references when
> posting in other forums:
> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>
> The following is one way of doing what you want. You could avoid the
> -forvalues- loop if your database is not too big, but I assume it is.
> I didn't test speed with a big data set but I hope it gets you
> started.
>
> * ----------------------- begin code -----------------------
>
> clear all
> set more off
>
> * Input fake databases (including -dum- variable)
> input str1 company year size rat
> A 2012 140 0.2
> B 2011 200 0.4
> C 2010 300 0.2
> D 2010 160 0.5
> end
>
> gen dum = 1
>
> tempfile samp
> save "`samp'"
>
> clear all
> input str4 company year size rat
> X 2012 150 0.19
> XX 2012 150 0.20
> XXX 2012 150 0.22
> XXXX 2012 150 0.195
> Y 2010 280 0.9
> YY 2010 280 0.9
> Z 2012 50 0.01
> ZZ 2010 300 0.2
> T 2011 200 0.95
> U 2010 300 0.10
> end
>
> gen dum = 1
>
> tempfile pop
> save "`pop'"
>
>
> * Main process
> tempfile result
> local lowlimit .8
> local highlimit 1.2
>
> quietly {
>     forvalues i = 1/4 { // 4 is # observations in sample file
>         use "`samp'" in `i', clear
>         rename (company year size rat) =0
>         joinby dum using "`pop'"
>         drop dum
>
>         keep if year0 == year // compare companies with same year only
>         keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>
>         gen ratdif = abs(rat0 - rat)
>         * Ties in -ratdif- are broken alphabetically by -company- name
>         isid ratdif company, sort
>         capture keep in 1/3
>
>         if (`i' == 1) save "`result'"
>         else {
>             append using "`result'"
>             save "`result'", replace
>         }
>
>     }
>
> }
>
> * Check and reshape
> use "`result'", clear
> isid company0 ratdif company, sort
> list, sepby(company0)
>
> keep company*
> list, sepby(company0)
>
> by company0: gen id = _n
> reshape wide company, i(company0) j(id)
> list, separator(0)
>
> *------------------------- end code ------------------------
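
Roberto mentions that the -forvalues- loop could be avoided when the data are not too big. A minimal sketch of that loop-free variant, assuming the same fake data and the same 0.8/1.2 size limits, might look as follows. Forming all pairs with a single -joinby- and ranking with -bysort- is my reading of the remark, not code from the thread.

```stata
clear all
set more off

* Sample firms, renamed up front so names do not clash after -joinby-
input str1 company year size rat
A 2012 140 0.2
B 2011 200 0.4
C 2010 300 0.2
D 2010 160 0.5
end
rename (company year size rat) (company0 year0 size0 rat0)
gen dum = 1
tempfile samp
save "`samp'"

* Candidate firms
clear
input str4 company year size rat
X 2012 150 0.19
XX 2012 150 0.20
XXX 2012 150 0.22
XXXX 2012 150 0.195
Y 2010 280 0.9
YY 2010 280 0.9
Z 2012 50 0.01
ZZ 2010 300 0.2
T 2011 200 0.95
U 2010 300 0.10
end
gen dum = 1

* A single -joinby- on the constant forms every sample/candidate pair,
* so memory use grows with (sample size) x (candidate size)
joinby dum using "`samp'"
drop dum

keep if year0 == year
keep if inrange(size, .8*size0, 1.2*size0)

gen ratdif = abs(rat0 - rat)

* Rank within each sample firm; ties are broken by -company- name
bysort company0 (ratdif company): gen id = _n
keep if id <= 3

list company0 company ratdif id, sepby(company0)
```

As in the looped version, a sample firm with no candidate inside the size window (D in this data) simply drops out of the result.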
>
> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
> <[email protected]> wrote:
>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>
>> 2014-01-26 daniel klein <[email protected]>:
>>> This is a triple post (with slight variations) that has already
>>> generated two answers here
>>>
>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>
>>> http://www.stata-forum.de/post2400.html#p2400
>>>
>>>
>>> Please see the FAQ concerning cross-postings
>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>
>>>
>>> Best
>>> Daniel
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>