Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Torsten Häberle <[email protected]>
To: [email protected]
Subject: Re: st: Two datasets: Look for similar observations in the second dataset
Date: Mon, 27 Jan 2014 22:05:37 +0100

Sorry, I have to answer again. I have more or less solved the problem with the missing ratios: using if/else, I now match on the closest size when the ratios are missing. However, I couldn't figure out a solution to problem (2), namely that different sample firms can be matched to the same matching firm.

To make my matching perfect, it would be great if the loop could be extended in the following way:

- If a sample firm B is matched to a matching firm A in year X (2000), then drop matching firm A from the universe of all matching firms for the years X (2000), X+1 (2001), X+2 (2002), X+3 (2003), X-1 (1999), X-2 (1998), and X-3 (1997).
- Basically, this means that matching firm A could be matched again with another sample firm, but only in years OTHER than those listed above.
- For example, if there is another sample firm in 2007, it could be matched with our matching firm A in 2007. However, if there were a sample firm in 2002, matching firm A could NOT be its match, because A was already matched to sample firm B in 2000.
- In summary, once a matching firm has been matched with a sample firm, it cannot be a match again in the three years before or the three years after the year of that match, but it can be matched again in all other years. If there is a second match, that second 7-year period is locked as well.

Sorry, this is an even more complex extension.

Thanks again so much.

2014-01-27 Roberto Ferrer <[email protected]>:
> Please follow Statalist policy and provide cross-references when
> posting in other forums:
> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>
> The following is one way of doing what you want. You could avoid the
> -forvalues- loop if your database is not too big, but I assume it is.
> I didn't test speed with a big data set but I hope it gets you
> started.
>
> * ----------------------- begin code -----------------------
>
> clear all
> set more off
>
> * Input fake databases (including -dum- variable)
> input str1 company year size rat
> A 2012 140 0.2
> B 2011 200 0.4
> C 2010 300 0.2
> D 2010 160 0.5
> end
>
> gen dum = 1
>
> tempfile samp
> save "`samp'"
>
> clear all
> input str4 company year size rat
> X 2012 150 0.19
> XX 2012 150 0.20
> XXX 2012 150 0.22
> XXXX 2012 150 0.195
> Y 2010 280 0.9
> YY 2010 280 0.9
> Z 2012 50 0.01
> ZZ 2010 300 0.2
> T 2011 200 0.95
> U 2010 300 0.10
> end
>
> gen dum = 1
>
> tempfile pop
> save "`pop'"
>
>
> * Main process
> tempfile result
> local lowlimit .8
> local highlimit 1.2
>
> quietly {
>     forvalues i = 1/4 { // 4 is # observations in sample file
>         use "`samp'" in `i', clear
>         rename (company year size rat) =0
>         joinby dum using "`pop'"
>         drop dum
>
>         keep if year0 == year // compare companies with same year only
>         keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>
>         gen ratdif = abs(rat0 - rat)
>         * Ties in -ratdif- are broken alphabetically by -company- name
>         isid ratdif company, sort
>         capture keep in 1/3
>
>         if (`i' == 1) save "`result'"
>         else {
>             append using "`result'"
>             save "`result'", replace
>         }
>
>     }
>
> }
>
> * Check and reshape
> use "`result'", clear
> isid company0 ratdif company, sort
> list, sepby(company0)
>
> keep company*
> list, sepby(company0)
>
> by company0: gen id = _n
> reshape wide company, i(company0) j(id)
> list, separator(0)
>
> *------------------------- end code ------------------------
>
> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
> <[email protected]> wrote:
>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>
>> 2014-01-26 daniel klein <[email protected]>:
>>> This is a triple post (with slight variations) that has already
>>> generated two answers here
>>>
>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>
>>> http://www.stata-forum.de/post2400.html#p2400
>>>
>>>
>>> Please see the FAQ concerning cross-postings
>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>
>>>
>>> Best
>>> Daniel

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
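The seven-year lock requested at the top of this message could perhaps be approached with a greedy pass over candidate pairs. What follows is only a minimal sketch under stated assumptions, not a tested solution for the real data: it assumes a dataset with one row per (sample firm, candidate matching firm) pair, and the firm names, years, and -ratdif- values below are made up for illustration. The variables -chosen- and -skip- are hypothetical helpers that do not appear in the code quoted above. Rows are visited by year and then by -ratdif-; once a pair is chosen, the sample firm is finished and the matching firm becomes unavailable to any sample firm whose year lies within three years of the match.

```stata
* ----------------------- begin code -----------------------
clear all
set more off

* Hypothetical candidate pairs (made-up values): sample firm company0 in
* year0, candidate matching firm company, and ratio distance ratdif
input str1 company0 year0 str4 company ratdif
B 2000 A 0.01
B 2000 K 0.05
C 2002 A 0.00
C 2002 K 0.02
D 2007 A 0.03
D 2007 K 0.04
end

* Visit sample firms by year, best candidate first
sort year0 company0 ratdif company

gen byte chosen = 0   // 1 = pair selected as a match
gen byte skip   = 0   // 1 = row no longer eligible

forvalues i = 1/`=_N' {
    if (skip[`i']) continue

    quietly replace chosen = 1 in `i'
    local s = company0[`i']
    local m = company[`i']
    local y = year0[`i']

    * The sample firm is done, and the matching firm is locked for
    * sample firms in years y-3 through y+3
    quietly replace skip = 1 if company0 == "`s'" | ///
        (company == "`m'" & abs(year0 - `y') <= 3)
}

list company0 year0 company ratdif if chosen, noobs
*------------------------- end code ------------------------
```

With these made-up rows, B (2000) takes A, C (2002) has to fall back to K because A is locked from 1997 through 2003, and D (2007) can use A again. Note that a greedy pass depends on the order in which sample firms are visited, so it does not guarantee the smallest total -ratdif- over all matches.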

**Follow-Ups**:

- **Re: st: Two datasets: Look for similar observations in the second dataset** *From:* Roberto Ferrer <[email protected]>
- **Re: st: Two datasets: Look for similar observations in the second dataset** *From:* Amadou DIALLO <[email protected]>

**References**:

- **Re: st: Two datasets: Look for similar observations in the second dataset** *From:* daniel klein <[email protected]>
- **Re: st: Two datasets: Look for similar observations in the second dataset** *From:* Torsten Häberle <[email protected]>
- **Re: st: Two datasets: Look for similar observations in the second dataset** *From:* Roberto Ferrer <[email protected]>
