Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Nick Cox <n.j.cox@durham.ac.uk> |

To |
"'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |

Subject |
RE: st: converting high frequency data to low frequency |

Date |
Fri, 5 Nov 2010 11:59:13 +0000 |

David's suggestion strikes me as right in principle, but I think he's still thinking in terms of the bad old days before Stata 10 when people had to work out their own awkward ways of handling times of day. That's a misunderstanding here. As always, the _format_ of these data is a matter of how they are to be displayed, and not a matter of how they are stored. (An article on the most common misunderstandings of Stata would surely include this one.) Dimitry's data look exactly like standard Stata date-times, allowed in Stata 10 up, meaning that underneath the cosmetic format they are times in milliseconds (ms). Therefore, he wants to round in units of 1000 * 60 * 5 = 300000. Here is a concrete example which covers everything needed to understand this problem. Using a %tc format for a -clock()- conversion of 11:31:00 today gives us back, not surprisingly, the same information: . di %tc clock("5 Nov 2010 11:31:00", "DMYhms") 05nov2010 11:31:00 But underneath all that, the precise date-time _really_ is just an integer with units ms. . di %20.0f clock("5 Nov 2010 11:31:00", "DMYhms") 1604575860000 (The "20" in the format is much more than I need but causes no problem here.) You can round down or round up; which way you go is a matter of taste or convention. I almost never round using -int()-. I almost always round using -floor()- or -ceil()- because then I know immediately that I am rounding down (-floor()-) or up (-ceil()-; think ceiling) and I don't get bit around 0 because the way -int()- works with negative numbers is not what I usually want, except that I might forget that or not foresee it might happen with my data. Now rounding down, for example, in units of 5 minutes is rounding down in units of 300000 ms. There are three steps, except that they can be combined in one line: 1. Divide by 300000. 2. Round down to the next integer below. 3. Multiply by 300000. So, the result is another large integer, . di %20.0f 300000 * floor(clock("5 Nov 2010 11:31:00", "DMYhms")/300000) 1604575800000 But we should check that we did it right: . di %tc 300000 * floor(clock("5 Nov 2010 11:31:00", "DMYhms")/300000) 05nov2010 11:30:00 With a variable it's going to be gen double binnedtime = 300000 * floor(ordertime/300000) format binnedtime %tc Never forget the -double-. Then you can -collapse- (or better -contract-) in terms of the new variable. (If it's really just time of day you care about, you must get there first by subtraction.) (I suggested generalising -floor()- and -ceil()- some years ago to StataCorp so that with two arguments -floor(ordertime, 300000), say, would do what is above, but the suggestion is still lurking in their files. A good argument against would be that the long-winded way to do it, as above, is easy enough.) See also if desired SJ-3-4 dm0002 . . . . . . . . Stata tip 2: Building with floors and ceilings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox Q4/03 SJ 3(4):446--447 (no commands) tips for using floor() and ceil() Nick n.j.cox@durham.ac.uk David Kantor ============ I believe you need to do just what you stated. In particular, if h, m , and s represent the hours, minutes and seconds, then you want gen long s2 = 3600 * h + 60 * m + s then take int(s2/300) (You may want to s2 to include date*24*3600 -- if you don't use the date as a separate component of the identifier.) There may be other methods available that I'm not aware of. Dimitry Masterov ================ >I have data on orders that looks like this: > >store_id ordertime >1901 10oct2010 15:48:18 >1901 10oct2010 15:50:36 >1901 10oct2010 16:41:35 >1901 10oct2010 16:53:37 >1901 10oct2010 18:38:24 >1901 10oct2010 19:09:22 >1901 10oct2010 21:14:22 >..... >1906 10oct2010 17:54:59 >1906 12oct2010 19:21:27 >1906 17oct2010 16:10:01 >1906 17oct2010 19:34:39 >1906 17oct2010 21:11:00 >1906 17oct2010 22:18:55 >1906 17oct2010 22:55:13 >1906 19oct2010 22:23:10 >1906 20oct2010 01:19:24 > >Ordertime is in %tc format. I would like to count the number of orders >in 5 minute buckets by store_id. I would like to use > >collapse (count) ordertime, by(store_id bucket) > >but I have no idea how to group the ordertime into 5 minute intervals >that are nicely labeled. I imagine I could calculate the elapsed time >since midnight for each day, divide that into 5 minute intervals (or >6000 unit intervals), and find some way to label, but I have not >gotten this to work yet. Is there's an easier way? I tried googling >and findit-ing, but I am not sure I know what this type of >transformation is called. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: converting high frequency data to low frequency***From:*"Dimitriy V. Masterov" <dvmaster@gmail.com>

**Re: st: converting high frequency data to low frequency***From:*David Kantor <kantor.d@att.net>

- Prev by Date:
**st: merge, append or which one?** - Next by Date:
**st: Updating IDs in a foreach loop** - Previous by thread:
**Re: st: converting high frequency data to low frequency** - Next by thread:
**RE: st: converting high frequency data to low frequency** - Index(es):