
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Two datasets: Look for similar observations in the second dataset


From   Torsten Häberle <[email protected]>
To   [email protected]
Subject   Re: st: Two datasets: Look for similar observations in the second dataset
Date   Mon, 27 Jan 2014 22:05:37 +0100

Sorry to reply again. I have more or less solved the problem with the
missing ratios: using if/else, I now match on the closest size whenever
the ratios are missing.
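
The if/else code itself was not posted, so the following is only a minimal sketch of one way such a fallback could look. It assumes the candidate pairs have already been formed as in the quoted code below (sample-firm variables carry the suffix 0); the toy data and the variable names -sizedif- and -dist- are hypothetical.

```stata
clear all

* Hypothetical candidate pairs: sample firm (suffix 0) next to each candidate
input str4 company0 size0 rat0 str4 company size rat
A 140 .   X  150 0.19
A 140 .   XX 135 0.20
B 200 0.4 T  210 0.95
B 200 0.4 TT 190 0.45
end

gen ratdif  = abs(rat0 - rat)      // missing when either ratio is missing
gen sizedif = abs(size0 - size)

* Rank on ratio distance when the sample firm has a ratio, else on size distance
gen dist = cond(missing(rat0), sizedif, ratdif)

* Closest candidate per sample firm; ties broken by company name
bysort company0 (dist company): keep if _n == 1
list company0 company dist, separator(0)
```

In this sketch a candidate whose own ratio is missing gets a missing -dist- and therefore sorts last, so it is chosen only if nothing else is available.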

However, I couldn't figure out a solution to problem (2), namely:
different sample firms can be matched to the same matching firm. To
make my matching perfect, it would be great if the loop could be
extended in the following way.

- If sample firm B is matched to matching firm A in year X (2000),
then drop matching firm A from the universe of all matching firms for
the years X-3 through X+3 (1997 through 2003).
- Basically, this means that matching firm A can be matched again
with another sample firm, but only in years OTHER than those listed
above.
- For example, a sample firm in 2007 could again be matched with
matching firm A in 2007. A sample firm in 2002, however, could NOT be
matched with firm A, because A was already matched to sample firm B
in 2000.
- In summary, once a matching firm has been matched with a sample
firm, it cannot be a match again in the three years before or the
three years after the year of that match, but it can be a match in
all other years. A second match would in turn lock its own "7-year
period".
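
One possible way to sketch the locking rule above is to keep the universe of matching firms in a file and, after each match, drop the matched firm's observations for the years X-3 through X+3. This is only an illustration under assumptions: the toy data, the tempfile name -pool-, and the choice of a single best match per sample firm are hypothetical, and the matching is greedy, so the result depends on the order in which sample firms are processed.

```stata
clear all
set more off

* Hypothetical sample firms to be matched (toy data, not from the thread)
input str4 company year size rat
B 2000 200 0.40
C 2002 210 0.38
D 2007 190 0.41
end
tempfile samp
save "`samp'"

* Hypothetical universe of matching firms; -pool- shrinks as firms get locked
clear
input str4 company year size rat
A 2000 205 0.41
A 2002 205 0.39
A 2007 195 0.40
E 2002 220 0.30
end
tempfile pool
save "`pool'"

tempfile result
local nmatched = 0

forvalues i = 1/3 {   // 3 = number of observations in the sample file
    use in `i' using "`samp'", clear
    rename (company year size rat) (company0 year0 size0 rat0)
    cross using "`pool'"

    * Same year and size within +/-20%, as in the quoted code
    keep if year == year0 & inrange(size, 0.8*size0, 1.2*size0)
    if (_N == 0) continue   // no eligible matching firm is left

    * Keep the single closest ratio; ties broken by company name
    gen ratdif = abs(rat0 - rat)
    sort ratdif company
    keep in 1
    local firm = company[1]
    local yr   = year[1]

    if (`nmatched' > 0) append using "`result'"
    save "`result'", replace
    local ++nmatched

    * Lock the matched firm for the years X-3 through X+3
    use "`pool'", clear
    drop if company == "`firm'" & inrange(year, `yr' - 3, `yr' + 3)
    save "`pool'", replace
}

use "`result'", clear
sort company0
list company0 year0 company ratdif, separator(0)
```

With this toy data, B takes A in 2000, which locks A for 1997 through 2003, so C in 2002 falls back to E, while D in 2007 can use A again.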

Sorry, this is an even more complex extension.

Thanks again so much.

2014-01-27 Roberto Ferrer <[email protected]>:
> Please follow Statalist policy and provide cross-references when
> posting in other forums:
> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>
> The following is one way of doing what you want. You could avoid the
> -forvalues- loop if your database is not too big, but I assume it is.
> I didn't test speed with a big data set but I hope it gets you
> started.
>
> * ----------------------- begin code -----------------------
>
> clear all
> set more off
>
> * Input fake databases (including -dum- variable)
> input str1 company year size rat
> A    2012    140    0.2
> B    2011    200    0.4
> C    2010    300    0.2
> D    2010    160    0.5
> end
>
> gen dum = 1
>
> tempfile samp
> save "`samp'"
>
> clear all
> input str4 company year size rat
> X       2012    150    0.19
> XX      2012    150    0.20
> XXX     2012    150    0.22
> XXXX    2012    150    0.195
> Y       2010    280    0.9
> YY      2010    280    0.9
> Z       2012    50     0.01
> ZZ      2010    300    0.2
> T       2011    200    0.95
> U       2010    300    0.10
> end
>
> gen dum = 1
>
> tempfile pop
> save "`pop'"
>
>
> * Main process
> tempfile result
> local lowlimit .8
> local highlimit 1.2
>
> quietly {
>     forvalues i = 1/4 { // 4 is # observations in sample file
>       use "`samp'" in `i', clear
>       rename (company year size rat) =0
>       joinby dum using "`pop'"
>       drop dum
>
>       keep if year0 == year // compare companies with same year only
>       keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>
>       gen ratdif = abs(rat0 - rat)
>       * Ties in -ratdif- are broken alphabetically by -company- name
>       isid ratdif company, sort
>       capture keep in 1/3
>
>       if (`i' == 1) save "`result'"
>       else {
>         append using "`result'"
>         save "`result'", replace
>       }
>
>     }
>
> }
>
> * Check and reshape
> use "`result'", clear
> isid company0 ratdif company, sort
> list, sepby(company0)
>
> keep company*
> list, sepby(company0)
>
> by company0: gen id = _n
> reshape wide company, i(company0) j(id)
> list, separator(0)
>
> *------------------------- end code ------------------------
>
> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
> <[email protected]> wrote:
>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>
>> 2014-01-26 daniel klein <[email protected]>:
>>> This is a triple post (with slight variations) that has already
>>> generated two answers here
>>>
>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>
>>> http://www.stata-forum.de/post2400.html#p2400
>>>
>>>
>>> Please see the FAQ concerning cross-postings
>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>
>>>
>>> Best
>>> Daniel
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>


