Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Re: Re: Better merge match-algorithm?


From   "Tobias Pfaff" <[email protected]>
To   <[email protected]>
Subject   st: Re: Re: Better merge match-algorithm?
Date   Mon, 6 Aug 2007 18:03:58 +0200

Due to your hints I was able to solve the problem and the matching of the
underlying prices is now done in no time.

Michael Blasnik: The approach with -joinby- seemed to take quite a bit of
processing time as well. But I now have more insight into the differences
between -joinby- and -merge-.

Eric Zbinden: Thanks for the hint to look for -vmatch-. Interesting command,
but I didn't really see how I could use it for my problem. A link to your
frontend would be great.

Kit Baum: Although I finally did not use the nested cond()-calls, the idea
behind your code was crucial to find the solution. Excellent!

Below, I will post my final code, now without any loops.
Thanks again for your efforts. This will save us tremendous processing time
and quite a bit of hassle in our project.

Best regards,
Tobias Pfaff


CODE:
-----

local obs_ante = _N

sort product_underlying_isin product_trading_date product_trading_time

merge product_underlying_isin product_trading_date product_trading_time
using temp_underlyings.dta
drop _merge
  	
// Sorting is important!
sort product_underlying_isin product_trading_date product_trading_time	
				
replace underlying_trading_price = underlying_trading_price[_n-1] if
(underlying_trading_price == . & ///
product_underlying_isin == product_underlying_isin[_n-1] &
product_trading_date == product_trading_date[_n-1])
gen product_matched_underlying_time = product_trading_time if _merge == 2
replace product_matched_underlying_time =
product_matched_underlying_time[_n-1] if (product_matched_underlying_time ==
"" & ///
product_underlying_isin == product_underlying_isin[_n-1] &
product_trading_date == product_trading_date[_n-1])

// Drop obs from underlying dataset				
drop if _merge == 2  	
			
// check if no. of products remains the same
local obs_post = _N
assert `obs_ante' == `obs_post'
				
// Drop observations where no previous underlying price can be found
drop if underlying_trading_price == .
							


-----Urspr�ngliche Nachricht-----
Von: [email protected]
[mailto:[email protected]] Im Auftrag von Michael Blasnik
Gesendet: Freitag, 3. August 2007 14:54
An: [email protected]
Betreff: st: Re: Better merge match-algorithm?

...

It looks like you don't need any looping across observations at all.
Perhaps you don't understand the -merge- or -joinby- commands?  I don't see
any variables listed in your -merge- command even though you want two of the
variables to match exactly between the datasets..  You can have -merge- (or
-joinby-) match on those two variables and then use standard Stata commands
to identify the best match according to your criteria.

I don't really follow your code fully in terms of selecting the best match
(is it just the first observation to meet the criteria?),so you might have
to tweak the details, but here's a start using -joinby- to do most of the
work (-joinby- will form all combinations in a many-to-many merge):

joinby underlying_isin underlying_trading_date using underlyings_temp

bysort underlying_isin underlying_trading_date (underlying_trading_time
option_trading_time): gen byte match= underlying_trading_time[_n+1] >
option_trading_time

(This is a total of two commands, the second line mostly likely will wrap in
email)

Michael Blasnik

.
----- Original Message -----
From: "Tobias Pfaff" <[email protected]>
To: <[email protected]>
Sent: Friday, August 03, 2007 3:54 AM
Subject: st: Better merge match-algorithm?


> Dear Statalisters,
>
> I encountered the following problem in a finance research project: two
> tables, one with option prices, the other with (underlying) stock prices.
> The task is to match the appropriate stock price to each option price
> observation. My current solution works, but seems to be inefficient due to
> tremendous processing time (> 4h).
>
> My current solution is the following (I refer to the following numbers in
> the code below):
> 1) Fetch number of observations from underlying table.
> 2) Fetch number of obs. from option table
> 3) Merge the underlying prices to the option prices (one-to-one merge)
> 4) Using two nested forvalues loops, I iterate over the option
observations
> to find an appropriate underlying price again iterating over the
underlying
> prices in the second forvalues loop. [The matching criteria are an
identical
> ISIN number, identical trading_date, and that the trading time of the
> subsequent underlying is bigger than the option trading time, i.e. looking
> for the most recent underlying price.]
>
> Before writing down my code, I would have the following questions:
>
> A) IS THERE A MORE EFFICIENT WAY TO CARRY OUT THE CONDITIONAL MATCHING
> WITHOUT HAVING TO ITERATE OVER EACH AND EVERY OBSERVATION ?
>
> B) IF NOT, IS IT POSSIBLE TO 'OUTSOURCE' THE TASK TO A MATA PROGRAM, SUCH
> THAT THE COMPILATION OF THE LOOP-CODE IS DONE ONCE INSTEAD OF A MILLION
> TIMES ?
>
> I thought about the Mata possibility when I read in a presentation by Kit
> Baum:
> "Your ado-files may perform some complicated tasks which involve many
> invocations of the same commands. Stata's ado-file language is easy to
read
> and write, but it is interpreted: Stata must evaluate each statement and
> translate it into machine code. The new Mata programming language (help
> mata) creates compiled code which can run much faster than ado-file code."
>
>
> Thanks a lot,
> Tobias
>
> University of Muenster,
> Germany
>
>
> My code:
> -------------------------------------------------
>
>
> preserve
>
> use 01_Original/dta_files/_prepared_und_`year'`month'`day', clear
> sort underlying_isin underlying_trading_date underlying_trading_time
> // cf. 1) Fetch number of observations from underlying table.
> local obs_underlyings = _N
> save underlyings_temp.dta, replace
>
> restore
> // cf. 2) Fetch number of obs. from option table
> local obs_options = _N
>
> sort option_underlying_isin option_trading_date option_trading_time
> // cf. 3) Merge the underlying prices to the option prices (one-to-one
> merge)
> merge using underlyings_temp
> drop _merge
> erase underlyings_temp.dta
>
> gen double option_matched_underlying_price = .
> gen option_matched_underlying_time = ""
>
> // cf. 4) Using two nested forvalues loops, I iterate over the option
> observations to find an appropriate
> // underlying price again iterating over the underlying prices in the
second
> forvalues loop.
> forvalues num = 1/`obs_options' {
>
> forvalues num2 = 1/`obs_underlyings' {
> if (underlying_isin[`num2'] == option_underlying_isin[`num']
> & underlying_trading_date[`num2'] == option_trading_date[`num'] & ///
> underlying_trading_time[`num2'+1] >
> option_trading_time[`num']) {
>
> quietly replace option_matched_underlying_price =
> underlying_trading_price[`num2'] in `num'
> quietly replace option_matched_underlying_time =
> underlying_trading_time[`num2'] in `num'
>
> continue, break //exit loop once a matching
> trade is found
> }
> }
> }
>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index