Statalist



Re: st: Time without date


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Time without date
Date   Wed, 24 Oct 2007 10:50:44 -0500

Mai Mai <mai7777@gmail.com> writes, 

> I have a variable that is %tc, e.g., (02jan2005 10:13:43). How can I make
> this variable composed of time only without the date. I don't want it
> to include the standard date of 1960. I am doing this because I need
> to average across time of many days second by second.

You *DO* want it to include the standard date of 1960 because, the way 
%tc values work, that is the same as not having a date.

%tc values record the number of milliseconds from 01jan1960 00:00:00.000.  If
you have just pure times, think of the %tc convention as being the number of
milliseconds from the beginning of the day and ignore the fact that the
particular day turns out to be 01jan1960.
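To make the arithmetic concrete outside Stata, here is a small Python sketch. Python is used purely as an illustration and says nothing about how Stata stores values internally. The helper name tc_value is hypothetical. It counts milliseconds from 01jan1960 00:00:00.000 the way %tc does, so a clock time placed on 01jan1960 comes out as milliseconds since midnight.

```python
from datetime import datetime, timedelta

# Stata's %tc origin: 01jan1960 00:00:00.000
EPOCH = datetime(1960, 1, 1)


def tc_value(dt: datetime) -> int:
    """Hypothetical helper: milliseconds from 01jan1960, the way %tc counts."""
    # Dividing one timedelta by another with // gives an exact integer count
    return (dt - EPOCH) // timedelta(milliseconds=1)


# A pure time of 8:00 am, placed on 01jan1960 ...
eight_am = tc_value(datetime(1960, 1, 1, 8, 0))
# ... is simply the number of milliseconds from midnight to 8:00 am
print(eight_am)                          # 28800000
print(eight_am == 8 * 60 * 60 * 1000)    # True
```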

Consider averaging the times 8:00 a.m., 1:00 p.m., and 2:00 p.m.  Before
trying the Stata solution, let's just think about the problem logically.  We
could average 8, 13, and 14 to get 11.666667, and then we could turn that back
into a clock time of 11:40 because 0.666667 hours is 40 minutes.
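The same hand calculation, written out as a short Python sketch: average the hours, then turn the fractional hour back into minutes.

```python
# 8:00 a.m., 1:00 p.m., and 2:00 p.m. on a 24-hour clock
hours = [8, 13, 14]
mean_hours = sum(hours) / len(hours)     # 35/3 = 11.666...

# Turn the fractional part of the hour into minutes
whole_hours = int(mean_hours)
minutes = round((mean_hours - whole_hours) * 60)

print(f"{mean_hours:.6f}")                  # 11.666667
print(f"{whole_hours:02d}:{minutes:02d}")   # 11:40
```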

Let's see if Stata, with its milliseconds-from-1960 logic, gets the same
answer:

        . clear all

        . input str40 time 

                                         time
          1. "8:00 am"
          2. "1:00 pm"
          3. "2:00 pm"
          4. end

        . gen double t = clock(time, "hm")

        . format t %tc

        . list

             +------------------------------+
             |    time                    t |
             |------------------------------|
          1. | 8:00 am   01jan1960 08:00:00 |
          2. | 1:00 pm   01jan1960 13:00:00 |
          3. | 2:00 pm   01jan1960 14:00:00 |
             +------------------------------+

        . gen double avg = sum(t)/sum(t<.)

        . format avg %tc

        . list avg in l 

             +--------------------+
             |                avg |
             |--------------------|
          3. | 01jan1960 11:40:00 |
             +--------------------+
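A note on the sum(t)/sum(t<.) idiom used above. Stata's sum() is a running sum that treats missing values as 0, and t<. is 1 when t is nonmissing and 0 otherwise. The ratio in observation i is therefore the mean of the nonmissing values up to i. The last observation holds the overall mean, which is why the listing uses "in l". A rough Python emulation follows. The function name running_mean is hypothetical, and None stands in for a Stata missing value, which is an assumption of this sketch.

```python
from typing import List, Optional


def running_mean(values: List[Optional[float]]) -> List[float]:
    """Emulate Stata's  gen avg = sum(x)/sum(x<.)  one observation at a time.

    None stands in for a Stata missing value: it adds nothing to the
    running total and nothing to the running count of nonmissing values.
    """
    out = []
    total = 0.0
    count = 0
    for v in values:
        if v is not None:
            total += v
            count += 1
        # Stata gives missing for 0/0; NaN plays that role here
        out.append(total / count if count else float("nan"))
    return out


MS_PER_HOUR = 60 * 60 * 1000
t = [8 * MS_PER_HOUR, 13 * MS_PER_HOUR, 14 * MS_PER_HOUR]
last = running_mean(t)[-1]          # the value "list avg in l" shows
print(last == (11 * 60 + 40) * 60 * 1000)   # True: 11:40 as ms into the day
```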

We got the right answer if we just ignore the 01jan1960 part.  Well, the
01jan1960 part is not really there.  In this case, t records the number of
milliseconds from the start of the day and it is a matter of INTERPRETATION by
Stata that the particular day is 01jan1960.  In fact, the times I entered
might be from data that said an admission happened at 8:00 am on 20oct2007, at
1:00 pm on 21oct2007, and at 2:00 pm on 22oct2007.

In fact, let's make precisely that assumption and start all over again, 
and deal with the problem of the times starting on different days:

        . clear 

        . input str40 time 

                                         time
          1. "20oct2007 8:00 am"
          2. "21oct2007 1:00 pm"
          3. "22oct2007 2:00 pm"
          4. end

        . gen double t = clock(time, "DMY hm")

        . format t %tc

        . list

             +----------------------------------------+
             |              time                    t |
             |----------------------------------------|
          1. | 20oct2007 8:00 am   20oct2007 08:00:00 |
          2. | 21oct2007 1:00 pm   21oct2007 13:00:00 |
          3. | 22oct2007 2:00 pm   22oct2007 14:00:00 |
             +----------------------------------------+

Okay, that's my data.  Notice t contains the same times as previously, but on
different days.  Let's make a pure time from it.  I'll do that by creating 
a new %tc variable containing the hours and minutes from t, but using 
01jan1960 as the particular day:

        . gen double puret = mdyhms(1,1,1960, hh(t), mm(t), 0)

        . format puret %tc

        . list puret

             +--------------------+
             |              puret |
             |--------------------|
          1. | 01jan1960 08:00:00 |
          2. | 01jan1960 13:00:00 |
          3. | 01jan1960 14:00:00 |
             +--------------------+

Wait, you say, I see what you did, but that's a time on 01jan1960.
No, I reply, that's a pure time.  But you are right that, by the Stata
convention, there is no difference between a pure time and a time on
01jan1960.
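Here is the same extraction step as a Python sketch, with hypothetical helper names. pure_time_hm keeps only hours and minutes, as mdyhms(1,1,1960, hh(t), mm(t), 0) does. Passing 0 as the seconds argument drops seconds, and that matters if, as in the original question, you need to average second by second. Passing ss(t), Stata's seconds counterpart of hh() and mm(), instead of 0 should keep them. pure_time_mod instead takes the remainder after dividing by the 86,400,000 milliseconds in a day. That works because %tc does not count leap seconds. In Stata, mod(t, 24*60*60*1000) should do the same, though I have not shown it above.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1960, 1, 1)            # %tc origin
MS_PER_DAY = 24 * 60 * 60 * 1000        # 86,400,000; %tc has no leap seconds


def tc_value(dt: datetime) -> int:
    """Milliseconds from 01jan1960, the way %tc counts."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def pure_time_hm(tc: int) -> int:
    """Like mdyhms(1,1,1960, hh(t), mm(t), 0): hours and minutes only."""
    hours, rest = divmod(tc % MS_PER_DAY, 60 * 60 * 1000)
    minutes = rest // (60 * 1000)
    return (hours * 60 + minutes) * 60 * 1000


def pure_time_mod(tc: int) -> int:
    """Milliseconds since midnight, keeping seconds and milliseconds.

    Python's % with a positive divisor never returns a negative number,
    so times before 1960 are handled too.
    """
    return tc % MS_PER_DAY


t = [tc_value(datetime(2007, 10, 20, 8, 0)),
     tc_value(datetime(2007, 10, 21, 13, 0)),
     tc_value(datetime(2007, 10, 22, 14, 0))]
print([pure_time_hm(v) for v in t])      # [28800000, 46800000, 50400000]

# With seconds present, the two approaches differ (example from the question)
asked = tc_value(datetime(2005, 1, 2, 10, 13, 43))
print(pure_time_hm(asked), pure_time_mod(asked))   # 36780000 36823000
```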

Anyway, let's format puret so that we no longer see the Stata convention:

        . format puret %tcHH:MM

        . list puret 

             +-------+
             | puret |
             |-------|
          1. | 08:00 |
          2. | 13:00 |
          3. | 14:00 |
             +-------+

Now, let's get our average:

        . gen double avg = sum(puret)/sum(puret<.)

        . format avg %tcHH:MM:SS

        . list avg in l 

             +----------+
             |      avg |
             |----------|
          3. | 11:40:00 |
             +----------+
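A Python sketch of this last step: average the milliseconds-into-day values, then display the result the way %tcHH:MM:SS would. format_hms is a hypothetical name and only a rough analogue of the Stata format, since it simply truncates fractional seconds. The second example uses made-up times to show that averaging keeps second-by-second resolution.

```python
def format_hms(ms_into_day: float) -> str:
    """Rough analogue of %tcHH:MM:SS for a pure time (truncates to seconds)."""
    total_seconds = int(ms_into_day // 1000)
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


MS_PER_HOUR = 60 * 60 * 1000
puret = [8 * MS_PER_HOUR, 13 * MS_PER_HOUR, 14 * MS_PER_HOUR]
print(format_hms(sum(puret) / len(puret)))    # 11:40:00

# Second-by-second resolution survives averaging: 10:13:43 and 10:13:45
pair = [36_823_000, 36_825_000]
print(format_hms(sum(pair) / len(pair)))      # 10:13:44
```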

The fact is that I am making too much of the Stata convention because, 
if you go back and look at this final problem, you will realize that 
there is nothing magic about 01jan1960 and in fact I could have used
any day as my base date, such as 01jan1970, and then ignored that.
I would have gotten the same results.  Using 01jan1960, however, is a little
better because then the numeric values stored in puret are the number
of milliseconds since the beginning of the day rather than the number 
of milliseconds since the beginning of the day plus a constant.
You might want to use the milliseconds-into-day variable directly.
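The base-date argument can be checked numerically with a Python sketch (ms_from_1960 is a hypothetical helper). Placing the pure times on 01jan1970 instead of 01jan1960 adds the same constant to every value. That constant is the 3,653 days between the two dates, in milliseconds. The mean is then shifted by exactly that constant, so the time of day comes out the same.

```python
from datetime import datetime, timedelta

MS_PER_DAY = 24 * 60 * 60 * 1000


def ms_from_1960(dt: datetime) -> int:
    """Milliseconds from 01jan1960, the way %tc counts."""
    return (dt - datetime(1960, 1, 1)) // timedelta(milliseconds=1)


clock_times = [(8, 0), (13, 0), (14, 0)]

# Pure times placed on 01jan1960 versus on 01jan1970
on_1960 = [ms_from_1960(datetime(1960, 1, 1, h, m)) for h, m in clock_times]
on_1970 = [ms_from_1960(datetime(1970, 1, 1, h, m)) for h, m in clock_times]

offset = ms_from_1960(datetime(1970, 1, 1))   # the constant: 3,653 days
mean_1960 = sum(on_1960) / len(on_1960)
mean_1970 = sum(on_1970) / len(on_1970)

print(mean_1960)                        # 42000000.0, i.e. 11:40 into the day
print(mean_1970 - offset == mean_1960)  # True: same answer, shifted by offset
```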

-- Bill
wgould@stata.com



© Copyright 1996–2014 StataCorp LP