Nick Cox <njcoxstata@gmail.com>

statalist@hsphsun2.harvard.edu

Subject |
Re: st: descriptive stats on %tc formatted variables

Date |
Thu, 28 Jun 2012 01:28:16 +0100

Format is a matter of how numbers are displayed, but the complaint about format is undeserved:

1. The format is not biting here, so much as the magnitudes you have, given the units being used.

2. The relationship between a variable's display format and how numbers are displayed by -summarize- is at best indirect.

In this case the display format %tc is not being used at all by -summarize-. If it were, the mean would be displayed differently, as is shown by

. di %tc 889268.9
01jan1960 00:14:49

which is not what you are seeing (and would be even less help).

#1 is the main point. A clock time is expressed in milliseconds (counted from 01jan1960 00:00:00), so the numbers are right: for example, 3 hours is 10,800,000 ms.

. di 3 * 60 * 60 * 1000
10800000

You don't say how you would prefer the numbers to be displayed, but suppose that minutes are what you want. Then

gen double q27_min = q27 / 60000

Now try -summarize q27_min- to see results in minutes. If you want hours, you need a different divisor (3600000).

Your idea of a different format won't help much or at all here, as #1 and #2 imply. Also, it is best not to think of assigning a different -format- as converting a variable, as the values stored remain the same: all you change is how they are displayed, but even that is not directly relevant in this case.

To sum up: As far as Stata is concerned, you are getting what you asked for, results in milliseconds. All you need to do is change the units. However, that has nothing to do with -format- in Stata's sense.

Nick

On Thu, Jun 28, 2012 at 12:00 AM, Kerry MacQuarrie <kerry10@u.washington.edu> wrote:

> I am struggling to run the most basic summary statistics on selected
> variables in my dataset because they are formatted as %tc (aka clock) data.
> For example, a certain variable for waiting time to see a provider is in the
> format HH:MM:SS, with a range of 1 minute to 5 hours. The seconds are
> always zero (i.e. always ending in :00) as the times were reported in
> minutes with much heaping at :05, :10, :30, and :00 minutes as one might
> expect in self-reported data.
>
> I simply want to run some summary statistics such as the mean/median, range,
> quintiles, etc. But I’m tripped up by the formatting. A straightforward
> command like sum varname returns this non-intuitive output:
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          q27 |       766    889268.9     1644010          0   1.80e+07
>
> Do I need to convert the variable into a different format? Are there
> commands to produce the types of summary statistics I’m looking for that are
> specific to %tc variables?
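
The change of units suggested in the reply could be sketched in a do-file roughly as follows. This is only an illustration under stated assumptions: q27 is taken to be the %tc variable from the quoted output, q27_min is the name used in the reply, and q27_hr is a hypothetical name introduced here for the hours version.

```stata
* Sketch for a do-file. Assumes q27 holds %tc clock values (milliseconds),
* as in the question; q27_hr is a hypothetical name used only here.
generate double q27_min = q27 / 60000      // 60,000 ms in a minute
generate double q27_hr  = q27 / 3600000    // 3,600,000 ms in an hour

* Mean, median (the 50% row), further percentiles, minimum and maximum.
summarize q27_min, detail

* Quintile boundaries: the 20th, 40th, 60th and 80th percentiles.
centile q27_min, centile(20 40 60 80)

* A compact table of selected statistics in both units.
tabstat q27_min q27_hr, statistics(mean median min max)
```

Going by the figures quoted in the question, the mean of 889268.9 ms corresponds to roughly 14.8 minutes and the maximum of 1.80e+07 ms to 300 minutes, which matches the stated upper limit of 5 hours.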

