Re: st: descriptive stats on %tc formatted variables


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: descriptive stats on %tc formatted variables
Date   Thu, 28 Jun 2012 01:28:16 +0100

Format is a matter of how numbers are displayed, but the complaint
about format is undeserved:

1. The format is not biting here, so much as the magnitudes you have,
given the units being used.

2. The relationship between a variable's display format and how
numbers are displayed by -summarize- is at best indirect. In this case
the display format %tc is not being used at all by -summarize-. If it
were, the mean would be displayed quite differently, as is shown by

. di %tc 889268.9
01jan1960 00:14:49

which is not what you are seeing (and would be even less help).

#1 is the main point. A clock time is expressed in milliseconds, so
the numbers are right: for example, 3 hours is 10,800,000 ms

. di 3 * 60 * 60 * 1000
10800000

and your maximum of 1.80e+07 ms is the 5 hours you mention.

You don't say how you would prefer the numbers to be displayed, but
suppose that minutes are what you want. Then

gen double q27_min = q27 / 60000

Now try -summarize q27_min- to see results in minutes. If you want
hours, you need a different divisor.
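
As a minimal sketch of that change of units, assuming only that the
variable holds milliseconds: the four toy values below (1 minute, 5
minutes, 30 minutes, 5 hours) are made up for illustration, and
q27_hr is just a name I have chosen. On your real data you would skip
the -input- step.

```stata
* Toy data standing in for q27: waiting times stored in milliseconds
clear
input double q27
60000
300000
1800000
18000000
end
format q27 %tc

* Change the units: 60,000 ms per minute, 3,600,000 ms per hour
gen double q27_min = q27 / 60000
gen double q27_hr  = q27 / 3600000

* Mean, median, range and other percentiles, now in minutes and hours
summarize q27_min q27_hr, detail

* Quintile cut points in minutes
centile q27_min, centile(20 40 60 80)
```

The divisor is the only thing that changes between minutes and hours;
-summarize, detail- and -centile- then work on the new variables like
any other numeric variables.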

Your idea of a different format won't help much or at all here, as #1
and #2 imply. Also, it is best not to think of assigning a different
-format- as converting a variable, as the values stored remain the
same: all you change is how they are displayed, but even that is not
directly relevant in this case.
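
A small sketch of that point, using a single made-up observation equal
to your reported mean (the variable name t is hypothetical). The
display format is changed twice, yet the stored value and what
-summarize- reports stay the same.

```stata
* One observation holding the reported mean, in milliseconds
clear
set obs 1
gen double t = 889268.9

* Show it as a full date-time, then as a clock time only
format t %tc
list t
format t %tcHH:MM:SS
list t

* The stored value is untouched by either format
display t[1]
summarize t
```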

To sum up: As far as Stata is concerned, you are getting what you
asked for, results in milliseconds. But all you need to do is change
the units. However, that is nothing to do with -format- in Stata's
sense.

Nick

On Thu, Jun 28, 2012 at 12:00 AM, Kerry MacQuarrie
<kerry10@u.washington.edu> wrote:

> I am struggling to run the most basic summary statistics on selected
> variables in my dataset because they are formatted as %tc (aka clock) data.
> For example, a certain variable for waiting time to see a provider is in the
> format HH:MM:SS, with a range of 1 minute to 5 hours.  The seconds are
> always zero (i.e. always ending in :00) as the times were reported in
> minutes with much heaping at :05, :10, :30, and :00 minutes as one might
> expect in self-reported data.
>
> I simply want to run some summary statistics such as the mean/median, range,
> quintiles, etc.  But I’m tripped up by the formatting.  A straightforward
> command like sum varname returns this non-intuitive output:
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          q27 |       766    889268.9     1644010          0   1.80e+07
>
> Do I need to convert the variable into a different format?  Are there
> commands to produce the types of summary statistics I’m looking for that are
> specific to %tc variables?
