
Re: st: High-frequency time-series (stata 10)

From   "Eva Poen"
Subject   Re: st: High-frequency time-series (stata 10)
Date   Wed, 24 Sep 2008 16:55:44 +0100


So what you are saying is that you want to treat _simultaneous_
observations as if they occurred at different points in time? To me it
seems questionable whether this is valid. You have a clear
ordering of bids and/or asks _between_ timedates, but within? You will
have to give Stata an idea of how close together the duplicate
observations occurred, and in which order. For any time
series analysis the sequence of events is going to matter a lot
(that's the whole point of doing time series analysis).

Is it important for you to retain the original distance in your new time
variable? I.e. does it make a difference to you, conceptually, to know
that observation one was five minutes earlier than observation two,
and observation two was one minute earlier than observation three? Or
is it enough for you to know that observation one came first,
observation two came next, and observation three after that?

If you want to keep the original time distances intact, I don't see
much hope for you. You'd have to introduce time intervals shorter than
5 minutes for your duplicates, but that would mean that you end up
with a lot of gaps in your data (because the time unit for your
analysis would now be, say, one minute, while most observations are
five minutes apart).

The only way to retain all data _and_ at the same time avoid creating
gaps is to create a hypothetical new time variable which simply
reflects the sequence of bids and asks over time, in steps of one.
This will make all your observations one time unit apart from the
next, and therefore distort the original time dimension. I'm not sure
this is what you want. In Stata, this is easy to achieve: you simply
sort your data such that they are in order of the true sequence, and
then do something like

gen newtime = _n

Still, the problem persists that you have to know the true sequence,
unless all bids and asks at a specific timedate are identical.
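
As a minimal sketch, assuming a hypothetical variable seq that records
the order of the quotes within each timedate (your data may not contain
such a variable):

* put the data in their true order, then number the observations
sort timedate seq
gen long newtime = _n
tsset newtime

If the duplicates really are identical, something like the following
should remove them so that -tsset- works on timedate directly (bid and
ask are assumed variable names):

duplicates drop timedate bid ask, force
tsset timedate, delta(5 minute)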

If you don't have a clue about the true sequence, and your
(simultaneous) bids and asks are not identical, you could do random
draws such that only one of the duplicates ends up in the analysis. If
you do the -tsset- after the random draw, Stata should not complain
about repeated time values.
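
One way to sketch the random draw, assuming timedate is your %tc time
variable (u is just a helper name I made up, and the seed is arbitrary):

* draw a random number per observation; the seed makes the draw reproducible
set seed 12345
gen double u = runiform()
* within each timedate, keep the observation with the smallest draw
bysort timedate (u): keep if _n == 1
drop u
tsset timedate, delta(5 minute)

In older releases the random number function is called uniform()
rather than runiform(). Note that this discards the other duplicates,
so you may want to -preserve- the data or work on a copy.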

Hope this helps,

2008/9/24 Beatrice Crozza:
> Dear All,
> I know that with Stata 10 all time-series analysis commands now
> support data with frequencies as high as 1 millisecond.
> I would like to treat my dataset as a time-series. I have
> high-frequency data (5-minute intervals) and a variable (timedate) that
> is composed of the time and the date for each bid and ask. My problem
> is that I have duplicate data, because for each timedate there could
> be more than one bid and more than one ask, so Stata gave me this
> error message:
> . tsset timedate, delta(5 minute)
> repeated time values in sample
> r(451);
> I need to treat my data as a time-series to do an AR analysis for bids
> and asks through time.
> I am thinking of assigning a number (in sequential order) to each
> timedate and then using the command tsset with this new variable.
> Do you think that this procedure is right to achieve my goal?
> Do you have any idea on how to assign a sequential number to each bid
> and ask, without using Excel?
> Thank you very much in advance.
> Best,
> Bea