



RE: st: converting high frequency data to low frequency


From   David Kantor <[email protected]>
To   [email protected]
Subject   RE: st: converting high frequency data to low frequency
Date   Fri, 05 Nov 2010 10:25:37 -0400

Thank you to Nick for the correction and for bringing me up-to-date.
--David

At 07:59 AM 11/5/2010, you wrote:
David's suggestion strikes me as right in principle, but I think he's still thinking in terms of the bad old days before Stata 10, when people had to work out their own awkward ways of handling times of day. No such workaround is needed here.

As always, the _format_ of these data is a matter of how they are to be displayed, and not a matter of how they are stored. (An article on the most common misunderstandings of Stata would surely include this one.)

Dimitry's data look exactly like standard Stata date-times, supported from Stata 10 on, meaning that underneath the cosmetic format they are times in milliseconds (ms), counted from the start of 1 January 1960. Therefore, to get 5-minute bins he wants to round in units of 1000 * 60 * 5 = 300000 ms.

Here is a concrete example which covers everything needed to understand this problem.

Using a %tc format for a -clock()- conversion of 11:31:00 today gives us back, not surprisingly, the same information:

. di %tc  clock("5 Nov 2010 11:31:00", "DMYhms")
05nov2010 11:31:00

But underneath all that, the precise date-time _really_ is just an integer with units ms.

. di %20.0f  clock("5 Nov 2010 11:31:00", "DMYhms")
       1604575860000

(The "20" in the format is much more than I need but causes no problem here.)

You can round down or round up; which way you go is a matter of taste or convention. I almost never round using -int()-. I almost always round using -floor()- or -ceil()-, because then I know immediately that I am rounding down (-floor()-) or up (-ceil()-; think ceiling). I also don't get bitten around 0: the way -int()- works with negative numbers is not what I usually want, and I might forget that or not foresee that it could happen with my data.
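
As a quick illustration of the difference (the value -2.5 is arbitrary and not taken from Dimitry's data):

. di int(-2.5)
-2

. di floor(-2.5)
-3

. di ceil(-2.5)
-2

-int()- truncates towards zero, so for negative values it behaves like -ceil()-, not -floor()-. Date-times before 1960 are stored as negative numbers, so this can matter in practice.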

Now rounding down, for example, in units of 5 minutes is rounding down in units of 300000 ms. There are three steps, except that they can be combined in one line:

1. Divide by 300000.

2. Round down to the nearest integer at or below.

3. Multiply by 300000.

So, the result is another large integer,

. di %20.0f  300000 * floor(clock("5 Nov 2010 11:31:00", "DMYhms")/300000)
       1604575800000

But we should check that we did it right:

. di %tc  300000 * floor(clock("5 Nov 2010 11:31:00", "DMYhms")/300000)
05nov2010 11:30:00
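
Rounding up follows the same recipe with -ceil()- in place of -floor()-. As a sketch with the same date-time, which should then move to the end of its 5-minute bin:

. di %tc  300000 * ceil(clock("5 Nov 2010 11:31:00", "DMYhms")/300000)
05nov2010 11:35:00

A time already on a 5-minute boundary is left unchanged by either function.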

With a variable it's going to be

gen double binnedtime = 300000 * floor(ordertime/300000)
format binnedtime %tc

Never forget the -double-. Then you can -collapse- (or better -contract-) in terms of the new variable. (If it's really just time of day you care about, you must get there first by subtraction.)
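
A minimal sketch of those follow-up steps, assuming the date-time variable is called ordertime as above. The names timeofday, binnedtod and norders are made up for illustration.

If only time of day matters, subtract midnight of the same day before binning:

* ms since midnight: the date-time minus the start of its own day
gen double timeofday = ordertime - cofd(dofc(ordertime))
gen double binnedtod = 300000 * floor(timeofday/300000)
format timeofday binnedtod %tcHH:MM:SS

To count observations per 5-minute bin (note that -contract- replaces the data in memory, so do this last or after -preserve-):

* one observation per bin, with the frequency in norders
contract binnedtime, freq(norders)

The -double- matters because adjacent -float- values near 1.6e12 are 131072 ms (roughly 2 minutes) apart, so a -float- cannot hold these times exactly.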

(I suggested generalising -floor()- and -ceil()- some years ago to StataCorp so that with two arguments -floor(ordertime, 300000)-, say, would do what is above, but the suggestion is still lurking in their files. A good argument against would be that the long-winded way to do it, as above, is easy enough.)

See also if desired

SJ-3-4  dm0002  . . . . . . . . Stata tip 2: Building with floors and ceilings
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox Q4/03 SJ 3(4):446--447 (no commands)
        tips for using floor() and ceil()


Nick
[email protected]
[...]


