
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Two datasets: Look for similar observations in the second dataset


From   Roberto Ferrer <[email protected]>
To   Stata Help <[email protected]>
Subject   Re: st: Two datasets: Look for similar observations in the second dataset
Date   Tue, 28 Jan 2014 18:32:28 -0430

I'm afraid I can't help you with your modified problem, but maybe some
other user will.

Let me comment briefly that your problem reminds me of one posted on
this list some weeks ago:

http://www.stata.com/statalist/archive/2014-01/msg00190.html

As stated, your problem has multiple solutions because the order in
which you match the firms affects which matches remain possible for
the firms processed later. If you cannot justify that order, your
results will rest on an arbitrary choice.
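
To make the order dependence concrete, here is a minimal sketch of a
greedy match with the "three years before and three years after"
lockout described in the quoted message below. It is only an
illustration under stated assumptions: the data, the locals (firms,
years, sizes, rats) and the variables avail and matchedto are all made
up, the matching population is assumed to have one row per firm-year,
and only the single closest candidate is kept (the earlier code kept
up to three).

```stata
clear all
set more off

* Hypothetical sample firms, processed in the order listed here
local firms  B    C    D
local years  2000 2002 2007
local sizes  100  100  100
local rats   .30  .30  .30

* Hypothetical matching population: one row per firm-year
input str1 company year size rat
A 2000 100 0.30
A 2002 100 0.30
A 2007 100 0.30
E 2002 110 0.35
end

gen byte avail = 1          // 1 = firm-year can still be used as a match
gen str1 matchedto = ""     // sample firm this row was matched to

forvalues i = 1/3 {
    local f : word `i' of `firms'
    local y : word `i' of `years'
    local s : word `i' of `sizes'
    local r : word `i' of `rats'

    * Candidates: not locked, same year, size within +/- 20%
    gen byte cand = avail & year == `y' & inrange(size, .8*`s', 1.2*`s')
    gen double ratdif = abs(rat - `r') if cand

    * Candidates first, closest ratio first, ties broken by name
    gsort -cand ratdif company

    if cand[1] == 1 {
        local m = company[1]
        replace matchedto = "`f'" in 1
        * Lock the matched firm for the years y-3 through y+3
        replace avail = 0 if company == "`m'" & inrange(year, `y' - 3, `y' + 3)
    }
    drop cand ratdif
}

sort company year
list company year matchedto avail, sepby(company)
```

With the order B, C, D, firm A goes to B in 2000 and is locked for
1997-2003, so C (2002) falls back to E, and A is free again for D in
2007. Listing the sample firms as C, B, D instead gives A to C in
2002, which locks A's 2000 row and leaves B with no match at all.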


On Mon, Jan 27, 2014 at 4:35 PM, Torsten Häberle
<[email protected]> wrote:
> Sorry, I have to answer again. I kind of solved the problem with the
> missing ratios. I found a way with the if/else command to match based
> on the closest size if the ratios are missing.
>
> However, I couldn't figure out a solution to problem (2), namely:
> different sample firms can be matched to the same matching firm. To
> make my matching perfect, it would be great if the loop could be
> extended in the following way.
>
> - If a sample firm B is matched to a matching firm A in year X (2000),
> then drop matching firm A from the universe of all matching
> firms for the years X (2000), X+1 (2001), X+2 (2002), X+3 (2003), X-1
> (1999), X-2 (1998), X-3 (1997).
> - Basically, this means that matching firm A could be matched again
> with another sample firm, but only in OTHER years than those outlined
> above in the example.
> - For example, if there is another sample firm in 2007, then this
> sample firm could be matched again with our matching firm A in year
> 2007. However, if there were a sample firm in 2002, matching firm
> A could NOT be the matching firm again, because it was already matched
> to sample firm B in 2000.
> - In summary, if a matching firm was matched with a sample firm, it
> cannot be a match again in the three years before and the three years
> after it was matched the first time. But it can be another match in
> all other years. If there were a second match, this second
> "7-year period" would be locked as well.
>
> Sorry, this is an even more complex extension.
>
> Thanks again so much.
>
> 2014-01-27 Roberto Ferrer <[email protected]>:
>> Please follow Statalist policy and provide cross-references when
>> posting in other forums:
>> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>>
>> The following is one way of doing what you want. You could avoid the
>> -forvalues- loop if your dataset is not too big, but I assume it is.
>> I didn't test speed with a big dataset, but I hope it gets you
>> started.
>>
>> * ----------------------- begin code -----------------------
>>
>> clear all
>> set more off
>>
>> * Input fake databases (including -dum- variable)
>> input str1 company year size rat
>> A     2012    140    0.2
>> B     2011    200    0.4
>> C     2010    300    0.2
>> D     2010    160    0.5
>> end
>>
>> gen dum = 1
>>
>> tempfile samp
>> save "`samp'"
>>
>> clear all
>> input str4 company year size rat
>> X     2012    150    0.19
>> XX    2012    150    0.20
>> XXX   2012    150    0.22
>> XXXX  2012    150    0.195
>> Y     2010    280    0.9
>> YY    2010    280    0.9
>> Z     2012     50    0.01
>> ZZ    2010    300    0.2
>> T     2011    200    0.95
>> U     2010    300    0.10
>> end
>>
>> gen dum = 1
>>
>> tempfile pop
>> save "`pop'"
>>
>>
>> * Main process
>> tempfile result
>> local lowlimit .8
>> local highlimit 1.2
>>
>> quietly {
>>     forvalues i = 1/4 { // 4 is # observations in sample file
>>       use "`samp'" in `i', clear
>>       rename (company year size rat) =0
>>       joinby dum using "`pop'"
>>       drop dum
>>
>>       keep if year0 == year // compare companies with same year only
>>       keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>>
>>       gen ratdif = abs(rat0 - rat)
>>       * Ties in -ratdif- are broken alphabetically by -company- name
>>       isid ratdif company, sort
>>       keep if _n <= 3 // at most the 3 closest matches; fewer is fine
>>
>>       if (`i' == 1) save "`result'"
>>       else {
>>         append using "`result'"
>>         save "`result'", replace
>>       }
>>
>>     }
>>
>> }
>>
>> * Check and reshape
>> use "`result'", clear
>> isid company0 ratdif company, sort
>> list, sepby(company0)
>>
>> keep company*
>> list, sepby(company0)
>>
>> by company0: gen id = _n
>> reshape wide company, i(company0) j(id)
>> list, separator(0)
>>
>> *------------------------- end code ------------------------
>>
>> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
>> <[email protected]> wrote:
>>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>>
>>> 2014-01-26 daniel klein <[email protected]>:
>>>> This is a triple post (with slight variations) that has already
>>>> generated two answers here
>>>>
>>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>>
>>>> http://www.stata-forum.de/post2400.html#p2400
>>>>
>>>>
>>>> Please see the FAQ concerning cross-postings
>>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>>
>>>>
>>>> Best
>>>> Daniel
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/

