Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Re: Better merge match-algorithm?


From   "Michael Blasnik" <michael.blasnik@verizon.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: Better merge match-algorithm?
Date   Fri, 03 Aug 2007 08:54:11 -0400

...

It looks like you don't need any looping across observations at all. Perhaps you don't understand the -merge- or -joinby- commands? I don't see any variables listed in your -merge- command even though you want two of the variables to match exactly between the datasets.. You can have -merge- (or -joinby-) match on those two variables and then use standard Stata commands to identify the best match according to your criteria.

I don't really follow your code fully in terms of selecting the best match (is it just the first observation to meet the criteria?),so you might have to tweak the details, but here's a start using -joinby- to do most of the work (-joinby- will form all combinations in a many-to-many merge):

joinby underlying_isin underlying_trading_date using underlyings_temp

bysort underlying_isin underlying_trading_date (underlying_trading_time option_trading_time): gen byte match= underlying_trading_time[_n+1] >
option_trading_time

(This is a total of two commands, the second line mostly likely will wrap in email)

Michael Blasnik

.
----- Original Message ----- From: "Tobias Pfaff" <tobiaspfaff@uni-muenster.de>
To: <statalist@hsphsun2.harvard.edu>
Sent: Friday, August 03, 2007 3:54 AM
Subject: st: Better merge match-algorithm?



Dear Statalisters,

I encountered the following problem in a finance research project: two
tables, one with option prices, the other with (underlying) stock prices.
The task is to match the appropriate stock price to each option price
observation. My current solution works, but seems to be inefficient due to
tremendous processing time (> 4h).

My current solution is the following (I refer to the following numbers in
the code below):
1) Fetch number of observations from underlying table.
2) Fetch number of obs. from option table
3) Merge the underlying prices to the option prices (one-to-one merge)
4) Using two nested forvalues loops, I iterate over the option observations
to find an appropriate underlying price again iterating over the underlying
prices in the second forvalues loop. [The matching criteria are an identical
ISIN number, identical trading_date, and that the trading time of the
subsequent underlying is bigger than the option trading time, i.e. looking
for the most recent underlying price.]

Before writing down my code, I would have the following questions:

A) IS THERE A MORE EFFICIENT WAY TO CARRY OUT THE CONDITIONAL MATCHING
WITHOUT HAVING TO ITERATE OVER EACH AND EVERY OBSERVATION ?

B) IF NOT, IS IT POSSIBLE TO 'OUTSOURCE' THE TASK TO A MATA PROGRAM, SUCH
THAT THE COMPILATION OF THE LOOP-CODE IS DONE ONCE INSTEAD OF A MILLION
TIMES ?

I thought about the Mata possibility when I read in a presentation by Kit
Baum:
"Your ado-files may perform some complicated tasks which involve many
invocations of the same commands. Stata's ado-file language is easy to read
and write, but it is interpreted: Stata must evaluate each statement and
translate it into machine code. The new Mata programming language (help
mata) creates compiled code which can run much faster than ado-file code."


Thanks a lot,
Tobias

University of Muenster,
Germany


My code:
-------------------------------------------------


preserve

use 01_Original/dta_files/_prepared_und_`year'`month'`day', clear
sort underlying_isin underlying_trading_date underlying_trading_time
// cf. 1) Fetch number of observations from underlying table.
local obs_underlyings = _N
save underlyings_temp.dta, replace

restore
// cf. 2) Fetch number of obs. from option table
local obs_options = _N

sort option_underlying_isin option_trading_date option_trading_time
// cf. 3) Merge the underlying prices to the option prices (one-to-one
merge)
merge using underlyings_temp
drop _merge
erase underlyings_temp.dta

gen double option_matched_underlying_price = .
gen option_matched_underlying_time = ""

// cf. 4) Using two nested forvalues loops, I iterate over the option
observations to find an appropriate
// underlying price again iterating over the underlying prices in the second
forvalues loop.
forvalues num = 1/`obs_options' {

forvalues num2 = 1/`obs_underlyings' {
if (underlying_isin[`num2'] == option_underlying_isin[`num']
& underlying_trading_date[`num2'] == option_trading_date[`num'] & ///
underlying_trading_time[`num2'+1] >
option_trading_time[`num']) {

quietly replace option_matched_underlying_price =
underlying_trading_price[`num2'] in `num'
quietly replace option_matched_underlying_time =
underlying_trading_time[`num2'] in `num'

continue, break //exit loop once a matching
trade is found
}
}
}

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index