
From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Time without date
Date: Wed, 24 Oct 2007 10:50:44 -0500

Mai Mai <mai7777@gmail.com> writes,

> I have a variable that is %tc, eg: (02jan2005 10:13:43) how can make
> this variable composed of time only without the date. I don't want it
> to include the standard date of 1960. I am doing this because I need
> to average across time of many days second by second.

You *DO* want it to include the standard date of 1960 because, the way %tc values work, that is the same as not having a date. %tc values record the number of milliseconds from 01jan1960 00:00:00.000. If you have just pure times, think of the %tc convention as being the number of milliseconds from the beginning of the day and ignore the fact that the particular day turns out to be 01jan1960.

Consider averaging the times 8:00 a.m., 1:00 p.m., and 2:00 p.m. Before trying the Stata solution, let's just think about the problem logically. We could average 8, 13, and 14 to get 11.666667, and then we could turn that back into a clock time of 11:40 because .666667 hours is 40 minutes.

Let's see if Stata, with its milliseconds-from-1960 logic, gets the same answer:

    . clear all

    . input str40 time

                                             time
      1. "8:00 am"
      2. "1:00 pm"
      3. "2:00 pm"
      4. end

    . gen double t = clock(time, "hm")

    . format t %tc

    . list

         +------------------------------+
         |    time                    t |
         |------------------------------|
      1. | 8:00 am   01jan1960 08:00:00 |
      2. | 1:00 pm   01jan1960 13:00:00 |
      3. | 2:00 pm   01jan1960 14:00:00 |
         +------------------------------+

    . gen double avg = sum(t)/sum(t<.)

    . format avg %tc

    . list avg in l

         +--------------------+
         |                avg |
         |--------------------|
      3. | 01jan1960 11:40:00 |
         +--------------------+

We got the right answer if we just ignore the 01jan1960 part. Well, the 01jan1960 part is not really there. In this case, t records the number of milliseconds from the start of the day and it is a matter of INTERPRETATION by Stata that the particular day is 01jan1960.

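The same check can be made on the raw numbers, without building a dataset. This is a minimal sketch, not part of the session above; it assumes Stata 10 or later, where msofhours() converts a number of hours to milliseconds:

```stata
* Pure times are just milliseconds since the start of the day.
* 8:00 am, 1:00 pm, and 2:00 pm as raw numbers:
display %12.0f msofhours(8)
display %12.0f msofhours(13)
display %12.0f msofhours(14)

* Average the three values and view the result as a clock time.
display %tcHH:MM:SS (msofhours(8) + msofhours(13) + msofhours(14))/3
```

The first three lines should show 28800000, 46800000, and 50400000. Their mean is 42000000 milliseconds, which is 11 hours 40 minutes into the day, so the last line should display 11:40:00.
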
In fact, the times I entered might be from data that said an admission happened at 8:00 am on 20oct2007, at 1:00 pm on 21oct2007, and at 2:00 pm on 22oct2007. Let's make precisely that assumption and start all over again, and deal with the problem of the times falling on different days:

    . clear

    . input str40 time

                                             time
      1. "20oct2007 8:00 am"
      2. "21oct2007 1:00 pm"
      3. "22oct2007 2:00 pm"
      4. end

    . gen double t = clock(time, "DMY hm")

    . format t %tc

    . list

         +----------------------------------------+
         |              time                    t |
         |----------------------------------------|
      1. | 20oct2007 8:00 am   20oct2007 08:00:00 |
      2. | 21oct2007 1:00 pm   21oct2007 13:00:00 |
      3. | 22oct2007 2:00 pm   22oct2007 14:00:00 |
         +----------------------------------------+

Okay, that's my data. Notice t contains the same times as previously, but on different days. Let's make a pure time from it. I'll do that by creating a new %tc variable containing the hours and minutes from t, but using 01jan1960 as the particular day:

    . gen double puret = mdyhms(1,1,1960, hh(t), mm(t), 0)

    . format puret %tc

    . list puret

         +--------------------+
         |              puret |
         |--------------------|
      1. | 01jan1960 08:00:00 |
      2. | 01jan1960 13:00:00 |
      3. | 01jan1960 14:00:00 |
         +--------------------+

Wait, you say, I see what you did, but that's a time on 01jan1960. No, I reply, that's a pure time, but you are right, by the Stata convention, there is no difference between a pure time and a time on 01jan1960.

Anyway, let's format puret so that we no longer see the Stata convention:

    . format puret %tcHH:MM

    . list puret

         +-------+
         | puret |
         |-------|
      1. | 08:00 |
      2. | 13:00 |
      3. | 14:00 |
         +-------+

Now, let's get our average:

    . gen double avg = sum(puret)/sum(puret<.)

    . format avg %tcHH:MM:SS

    . list avg in l

         +----------+
         |      avg |
         |----------|
      3. | 11:40:00 |
         +----------+

The fact is that I am making too much of the Stata convention because, if you go back and look at this final problem, you will realize that there is nothing magic about 01jan1960 and in fact I could have used any day as my base date, such as 01jan1970, and then ignored that. I would have gotten the same results.

Using 01jan1960, however, is a little better because then the numeric values stored in puret are the number of milliseconds since the beginning of the day rather than the number of milliseconds from the beginning of the day, plus a constant. You might want to use the milliseconds-into-day variable directly.

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
