Hi Nick, I had the impression she wanted to calculate the difference between Q and T whenever both are available for a given combination of day and time. Q and T are both available for 03-Jan-00 93203.

But I agree she has to resolve the duplicates issue first. Are different goods sold? If so, the identifier would be day-time-good.
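
A minimal sketch of that calculation, not a tested solution. It assumes that -type- is a string variable holding "Q" or "T", that -midpoint- is nonmissing only for quotes and -price- only for trades, and that averaging duplicate quotes within the same date-time is acceptable. None of this is confirmed in the thread, and -qmid- is a hypothetical variable name.

```stata
* Assumption: type is a string ("Q"/"T"); midpoint is nonmissing only for quotes.
* Mean quote midpoint within each date-time, copied to every observation there.
bysort date time: egen qmid = mean(cond(type == "Q", midpoint, .))

* The difference is defined only for trades that share a date-time with a quote.
gen diff = price - qmid if type == "T"
```

Under those assumptions the trade at 03-Jan-00 93203 should get diff = 148.25 - 148.125 = 0.125, and trades with no quote at the same date-time stay missing.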

Sorry, you've lost me completely. There is no code here to show what you did. Perhaps it will be obvious to someone working with this kind of data who will answer. Otherwise, you may need to provide much more explanation.

In any case, this example shows that you have some duplicates with exactly the same date and time. You need to think through the implications of that for the calculations I suggested, particularly if there is instantaneous variation in prices.
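
One way to see how many such duplicates there are, as a rough sketch that assumes the variables are named -date-, -time-, -type-, -midpoint- and -price- as in the example listing (-ndup- is a hypothetical variable name):

```stata
* Count observations that share the same date, time and type.
duplicates report date time type

* Tag them for inspection: ndup is 0 for unique observations,
* 1 or more for observations that have duplicates.
duplicates tag date time type, generate(ndup)
list date time type midpoint price if ndup > 0, sepby(date time)
```

In the example data this would flag the two quotes at 93158, which carry different midpoints.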

Nick

On Sat, Jan 29, 2011 at 5:58 PM, Beatrice Crozza <beatrice.crozza@gmail.com> wrote:
> Dear Nick,
>
> thank you very much for your help.
> Your suggestion works perfectly!
>
> I have another question, concerning matching quotes (Q) with trades
> (T) and taking the difference (diff) between the price and the midpoint
> for trades and quotes that happen at the same time.
>
> Here is an example:
>
> date       time   type  midpoint  price   diff
> 03-Jan-00  93158  Q     148.1563          .
> 03-Jan-00  93158  Q     148.1719          .
> 03-Jan-00  93159  Q     148.1563          .
> 03-Jan-00  93200  T               148.25  .
> 03-Jan-00  93201  T               148.25  .
> 03-Jan-00  93202  T               148.25  .
> 03-Jan-00  93203  Q     148.125           .
> 03-Jan-00  93203  T               148.25  .
>
> Why is the result a missing value, even for the last observation? The
> result should be 0.125.
> What am I doing wrong?
>
> Thank you very much for your help.
>
> Bea
>
> 2011/1/29 Nick Cox <njcoxstata@gmail.com>:
>> I disagree.
>>
>> Time series operators are not needed here. Other machinery will
>> suffice. Indeed, time in the example data here is not regularly spaced,
>> so lag operators would make some things more difficult.
>>
>> Nor is any looping required.
>>
>> -by:-, including subscripts under -by:-, remains one of the most
>> underused tools in Stata.
>>
>> The previous price when different is found when the price changes
>>
>>     gen previous = price[_n-1] if price != price[_n-1]
>>
>> That works too for previous[1], which will be price[0], namely missing.
>>
>> Then the previous price will remain so until the next change:
>>
>>     replace previous = previous[_n-1] if missing(previous)
>>
>> This cascading to replace missings is an old Stata trick:
>>
>> FAQ: Replacing missing values (N. J. Cox, 2/03)
>> How can I replace missing values with previous or
>> following nonmissing values?
>> http://www.stata.com/support/faqs/data/missing.html
>>
>> That leaves a block of missings before the first price change (all
>> observations if the price never changes).
>>
>> If this is desired, it may well be desired in a panel context:
>>
>>     bysort id (time) : gen previous = price[_n-1] if price != price[_n-1]
>>     by id: replace previous = previous[_n-1] if missing(previous)
>>
>> If there are missings in -price-, they would be better filled in first
>> in a clone of -price- using the device explained in the FAQ.
>>
>> Difference from previous different price is then simply
>>
>>     gen diff = price - previous
>>
>> regardless of panel context.
>>
>> Nick
>>
>> On Fri, Jan 28, 2011 at 10:21 PM, Brad Wright <bradwright@unc.edu> wrote:
>>
>>> I don't know of a formula per se, but you should definitely take advantage
>>> of Stata's time-series operators for lagging variables.
>>>
>>> You must first use tsset or xtset to tell Stata what format your data are
>>> in, and then creating lags is very simply done by using "L.varname" for 1
>>> lag or "L2.varname" for 2 periods and so on.
>>>
>>> Then you can write a piece of code that will automate this for you so that
>>> you can loop through all the combinations, stopping when you find the first
>>> "non-matching" period.
>>
>> "Beatrice Crozza"
>>
>>>> I want to infer the trade direction with a tick test, comparing two
>>>> consecutive prices.
>>>> I have a problem constructing the lagged price in Stata.
>>>> When two consecutive prices are the same, I should go back one more
>>>> lag; however, often I need to compare my price with a price from three
>>>> or more periods earlier.
>>>>
>>>> Here is an example of my dataset:
>>>>
>>>> date       time   price
>>>> 03jan2000  93157  148.25
>>>> 03jan2000  93200  148.25
>>>> 03jan2000  93201  148.27
>>>> 03jan2000  93202  148.25
>>>> 03jan2000  93203  148.25
>>>> 03jan2000  93208  148.25
>>>> 03jan2000  93211  148.25
>>>> 03jan2000  93212  148.25
>>>> 03jan2000  93215  148.15625
>>>> 03jan2000  93225  148.25
>>>>
>>>> If I compare the price at 93212 with the previous one, it is the same,
>>>> and the same holds if I compare it with the price at 93208, so I should
>>>> go further back until I reach the price at 93201, which is different.
>>>>
>>>> If I want the difference between two consecutive prices I can write:
>>>>     price[_n]-price[_n-1]
>>>> a lag further:
>>>>     price[_n]-price[_n-2]
>>>> and so on.
>>>>
>>>> However, I would like to know if there is a formula to take the
>>>> difference between two prices, going back until I find a price which is
>>>> different from the one under consideration.