
Re: st: Two datasets: Look for similar observations in the second dataset

From	Torsten Häberle <[email protected]>
To	[email protected]
Subject	Re: st: Two datasets: Look for similar observations in the second dataset
Date	Mon, 27 Jan 2014 09:16:14 +0100

Thank you so much, Roberto. This is definitely what I need.

There is one more thing: in some cases no ratio is available, either
for just one of the two firms (sample or matching firm) or for both.
In that case, I want Stata to match on the closest size instead.
Again, there should be three matches: the best, second-best, and
third-best match based on size. Lastly, either or both of the size
variables may also be missing. Then Stata should indicate that no
match is possible.

How could I take these cases into account?

Thanks again. I would be doomed without this forum.
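
The fallback described above can be sketched as follows. This is only a minimal sketch, not tested on real data, and it rests on several assumptions that are not in the thread: the firms and values are made up; all pairs are formed with a single -joinby- rather than the -forvalues- loop, which is only reasonable for small files; pairs with both ratios available are ranked ahead of pairs that fall back to size; and the variable names usable, method, dist, rank, and nomatch are purely illustrative.

```stata
clear all
set more off

* Hypothetical sample firms: B has no ratio, C has no size
input str1 company0 year0 size0 rat0
A 2012 140 0.2
B 2011 200 .
C 2010 .   0.2
end
gen dum = 1
tempfile samp
save "`samp'"

* Hypothetical candidate firms: X and TT have no ratio
clear
input str4 company year size rat
X    2012 150 .
XX   2012 150 0.20
XXX  2012 160 0.22
T    2011 210 0.95
TT   2011 190 .
U    2010 300 0.10
end
gen dum = 1
tempfile pop
save "`pop'"

* Form all sample-candidate pairs (assumption: data small enough for this)
use "`samp'", clear
joinby dum using "`pop'"
drop dum
keep if year0 == year

* A pair is usable only if both sizes are known and the candidate lies
* within the same 80%-120% size band as in the quoted code
gen byte usable = !missing(size0, size)
replace usable = 0 if usable & !inrange(size, 0.8*size0, 1.2*size0)

* method 1: both ratios known, rank by ratio distance
* method 2: at least one ratio missing, rank by size distance
gen byte method = cond(!missing(rat0, rat), 1, 2) if usable
gen double dist = cond(method == 1, abs(rat0 - rat), abs(size0 - size)) if usable

* Unusable pairs have missing -method- and therefore sort last;
* ties in -dist- are broken alphabetically by -company-
bysort company0 (method dist company): gen rank = _n if usable
by company0: egen nusable = total(usable)
by company0: gen byte first = (_n == 1)

* Flag sample firms for which no match is possible
gen byte nomatch = (nusable == 0)

* Keep up to three matches, or one flagged row when there is none
keep if (usable & rank <= 3) | (nomatch & first)
replace company = "" if nomatch

list company0 company method dist rank nomatch, sepby(company0) noobs
```

With these made-up rows, A is matched on ratio to XX and XXX and then on size to X, B (no ratio) is matched on size only, and C (no size) is kept as a single row with nomatch equal to 1. The explicit missing() check on the sizes is deliberate: as I read the documentation of inrange(), a missing bound is treated as unbounded, so a sample firm with a missing size would otherwise pass the size band for every candidate. Note also that a sample firm with no candidate in the same year drops out at the -keep if year0 == year- step and would need separate handling.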

2014-01-27 Roberto Ferrer <[email protected]>:
> Please follow Statalist policy and provide cross-references when
> posting in other forums:
> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>
> The following is one way of doing what you want. You could avoid the
> -forvalues- loop if your dataset is not too big, but I assume it is.
> I have not tested the speed on a large dataset, but I hope it gets
> you started.
>
> * ----------------------- begin code -----------------------
>
> clear all
> set more off
>
> * Input fake databases (including -dum- variable)
> input str1 company year size rat
> A    2012    140    0.2
> B    2011    200    0.4
> C    2010    300    0.2
> D    2010    160    0.5
> end
>
> gen dum = 1
>
> tempfile samp
> save "`samp'"
>
> clear all
> input str4 company year size rat
> X       2012    150    0.19
> XX      2012    150    0.20
> XXX     2012    150    0.22
> XXXX    2012    150    0.195
> Y       2010    280    0.9
> YY      2010    280    0.9
> Z       2012     50    0.01
> ZZ      2010    300    0.2
> T       2011    200    0.95
> U       2010    300    0.10
> end
>
> gen dum = 1
>
> tempfile pop
> save "`pop'"
>
>
> * Main process
> tempfile result
> local lowlimit .8
> local highlimit 1.2
>
> quietly {
>     forvalues i = 1/4 { // 4 is # observations in sample file
>       use "`samp'" in `i', clear
>       rename (company year size rat) =0
>       joinby dum using "`pop'"
>       drop dum
>
>       keep if year0 == year // compare companies with same year only
>       keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>
>       gen ratdif = abs(rat0 - rat)
>       * Ties in -ratdif- are broken alphabetically by -company- name
>       isid ratdif company, sort
>       capture keep in 1/3
>
>       if (`i' == 1) save "`result'"
>       else {
>         append using "`result'"
>         save "`result'", replace
>       }
>
>     }
>
> }
>
> * Check and reshape
> use "`result'", clear
> isid company0 ratdif company, sort
> list, sepby(company0)
>
> keep company*
> list, sepby(company0)
>
> by company0: gen id = _n
> reshape wide company, i(company0) j(id)
> list, separator(0)
>
> *------------------------- end code ------------------------
>
> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
> <[email protected]> wrote:
>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>
>> 2014-01-26 daniel klein <[email protected]>:
>>> This is a triple post (with slight variations) that has already
>>> generated two answers here
>>>
>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>
>>> http://www.stata-forum.de/post2400.html#p2400
>>>
>>>
>>> Please see the FAQ concerning cross-postings
>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>
>>>
>>> Best
>>> Daniel
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
