st: RE: Help with data management


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Help with data management
Date   Sun, 13 Jul 2008 17:15:03 +0100

Katia Bobulova asked three questions. My answers below. 

Nick 
n.j.cox@durham.ac.uk 

I have some problems in managing my dataset. I have a dataset with
high frequency data, with observations about time, date, price,
quantity and a code for the type of security.

Question 1
==========
I have the time in this format: 63000 and I want to construct a
"timedate" variable, which contains both the date and the time.

First of all I tried to modify the format of the time by typing:

g hours=int(time/10000)
g minutes=int((time-hours*10000)/100)
g seconds=int(time-hours*10000-minutes*100)
g newtime=hms(hours, minutes, seconds)
format newtime %tc

However, after typing hms() I receive this message:
Unknown function hms()
r(133);

The next step would be typing something like:

gen double timedate=date*24*60*60*1000+time
format timedate %tcNN/DD/CCYY_HH:MM:SS

Answer 1
========

-hms()- is an -egen- function written by Kit Baum and is included in the
-egenmore- package from SSC. It will _only_ work with -egen-, not with
-generate-. 

But you don't need it. You did almost all the work yourself. 

Assuming that, e.g., 63000 means 06:30:00, then given your variables
-hours-, -minutes- and -seconds-, the time in seconds since midnight is

gen long time_in_sec = 3600 * hours + 60 * minutes + seconds 
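
To go from there to the combined date-time variable asked for, one
possible sketch follows. It assumes that -date- is already a numeric
Stata daily date and that you have Stata 10 or later, where %tc
date-times (milliseconds since 01jan1960 00:00:00) are supported. The
name -timedate- is taken from the question.

```stata
* Sketch only: assumes -date- is a Stata daily date (days since 01jan1960)
* and -time_in_sec- is seconds since midnight, as generated above.
* %tc values are counts of milliseconds and must be stored as doubles.
gen double timedate = date * 24 * 60 * 60 * 1000 + time_in_sec * 1000
format timedate %tc
```

Adding the raw -time- value (e.g. 63000) instead, as in the question,
would not work, because that number is not a count of milliseconds.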

Question 2
==========

I have two types of prices: price1 and price2. I would like to
create a variable which takes the difference between the lowest value
of price1 and the highest value of price2 for each date and time.

First of all I sorted my dataset:

sort date time price1 price2

then I generated the new variable:

gen price3=(price1[_n]-price2[_N] & date==(26feb2008) & time==63000

But as a result I have all missing values. Furthermore, I have to do
this for each date and time, so I was wondering if there is a way to
instruct Stata to create this new variable for each time and date.

Answer 2
========

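I don't understand how you reached your solution, nor why it produces
missing values. I don't think your -sort- order guarantees what you
want, but it looks quite wrong anyway.

This is likely to be closer to where you want to be. It gives the
highest price2 minus the lowest price1 for each date and time; reverse
the subtraction if you want the opposite sign.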

bysort date time (price1) : gen diff = price1[1] 
bysort date time (price2) : replace diff = price2[_N] - diff 
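
If some prices are missing, the subscripting approach needs care:
Stata sorts missing values last, so price2[_N] would then be missing.
A possible alternative, sketched here with hypothetical variable names
(-min1-, -max2-, -diff2-), uses the -egen- functions min() and max(),
which ignore missing values:

```stata
* Sketch only: -min1-, -max2- and -diff2- are hypothetical names.
egen min1 = min(price1), by(date time)   // lowest price1 per date and time
egen max2 = max(price2), by(date time)   // highest price2 per date and time
gen diff2 = max2 - min1                  // same sign convention as -diff-
```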


Question 3
==========

I have to divide my variable time into equal time intervals. For
example, I have observations at 14:41:38 and 15:28:32 and I would like
to have observations at precise time intervals, for example every
5 minutes, i.e. at 14:40:00, 14:45:00 and so on. Any idea on how to do
this?

Answer 3
========

Create another dataset with regularly spaced observations and then
-merge-. 
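
A minimal sketch of that idea follows. It rests on several assumptions:
that -date- is a daily date, that a variable -time_in_sec- holds
seconds since midnight, that a grid covering the whole day is wanted,
and that the -merge m:1- syntax is available (Stata 11 or later;
earlier versions would need the older -merge varlist using- syntax).

```stata
* Sketch only: build a 5-minute grid for every date in the data.
preserve
keep date
duplicates drop                       // one observation per date
expand 288                            // 24 * 12 five-minute slots per day
bysort date : gen long time_in_sec = 300 * (_n - 1)
tempfile grid
save `grid'
restore

* Several trades may share a date and time, hence m:1 rather than 1:1.
merge m:1 date time_in_sec using `grid'
sort date time_in_sec
```

Grid times with no trade arrive with _merge == 2 and missing prices;
what to put there (e.g. the last observed price) is a separate
decision.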

