



Re: st: descriptive stats on %tc formatted variables


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: descriptive stats on %tc formatted variables
Date   Thu, 28 Jun 2012 01:40:07 +0100

Also, your sample results imply a very skewed distribution, so mean,
SD, min and max need to be supplemented with more summary statistics.
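
A minimal sketch of what that could look like, assuming q27 is the
clock variable from the quoted post and q27_min is the minutes version
suggested below (skip the first line if q27_min already exists):

```stata
* q27 holds milliseconds; 1 minute = 60,000 ms
gen double q27_min = q27 / 60000

* Adds median, other percentiles, skewness and kurtosis to the basic summary
summarize q27_min, detail

* Quintile boundaries (20th, 40th, 60th and 80th percentiles)
centile q27_min, centile(20 40 60 80)
```

With heavily heaped, skewed data like these, the median and other
percentiles are likely to be more informative than the mean and SD.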

On Thu, Jun 28, 2012 at 1:28 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> Format is a matter of how numbers are displayed, but the complaint
> about format is undeserved:
>
> 1. It is not so much the format that is biting here as the magnitudes
> you have, given the units being used.
>
> 2. The relationship between a variable's display format and how
> numbers are displayed by -summarize- is at best indirect. In this case
> the display format %tc is not being used at all by -summarize-. If it
> were, the mean would be displayed differently, as shown by
>
> . di %tc 889268.9
> 01jan1960 00:14:49
>
> which is not what you are seeing (and would be even less help).
>
> #1 is the main point. A clock time is expressed in milliseconds, so
> the numbers are right, as for example 3 hours is 10,800,000 ms.
>
> . di 3 * 60 * 60 * 1000
> 10800000
>
> You don't say how you would prefer the numbers to be displayed, but
> suppose that minutes are what you want. Then
>
> gen double q27_min = q27 / 60000
>
> Now try -summarize q27_min- to see results in minutes. If you want
> hours, you need a different divisor.
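>
> A minimal sketch for hours, assuming the same variable q27; the name
> q27_hr is made up for illustration:
>
> ```stata
> * 1 hour = 60 * 60 * 1000 = 3,600,000 ms
> gen double q27_hr = q27 / 3600000
>
> * Equivalent, using the built-in duration function: hours(ms) = ms / 3,600,000
> * gen double q27_hr = hours(q27)
>
> summarize q27_hr
> ```
>
> On this scale the reported maximum of 1.80e+07 ms shows as 5 hours.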
>
> Your idea of a different format won't help much or at all here, as #1
> and #2 imply. Also, it is best not to think of assigning a different
> -format- as converting a variable, as the values stored remain the
> same: all you change is how they are displayed, but even that is not
> directly relevant in this case.
>
> To sum up: As far as Stata is concerned, you are getting what you
> asked for, results in milliseconds. All you need to do is change
> the units. That, however, has nothing to do with -format- in Stata's
> sense.
>
> Nick
>
> On Thu, Jun 28, 2012 at 12:00 AM, Kerry MacQuarrie
> <kerry10@u.washington.edu> wrote:
>
>> I am struggling to run the most basic summary statistics on selected
>> variables in my dataset because they are formatted as %tc (aka clock) data.
>> For example, a certain variable for waiting time to see a provider is in the
>> format HH:MM:SS, with a range of 1 minute to 5 hours.  The seconds are
>> always zero (i.e. always ending in :00) as the times were reported in
>> minutes with much heaping at :05, :10, :30, and :00 minutes as one might
>> expect in self-reported data.
>>
>> I simply want to run some summary statistics such as the mean/median, range,
>> quintiles, etc.  But I’m tripped up by the formatting.  A straightforward
>> command like sum varname returns this non-intuitive output:
>>
>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>          q27 |       766    889268.9     1644010          0   1.80e+07
>>
>> Do I need to convert the variable into a different format?  Are there
>> commands to produce the types of summary statistics I’m looking for that are
>> specific to %tc variables?
