Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: Dates, strings, formats & labels: tab / tabstat / table


From   "Allan Reese (Cefas)" <[email protected]>
To   <[email protected]>
Subject   st: Dates, strings, formats & labels: tab / tabstat / table
Date   Mon, 13 Sep 2010 12:30:18 +0100

This is the type of behaviour that puts people off computers and causes
them to think folk who promote Stata are nerds - maybe I am for thinking
it worth writing up ;-)

The objective: summarize a data table as monthly statistics - in time
order.  Usually the data comes as a spreadsheet with the date as a
string. I always recommend writing the month as a name to avoid
ambiguity of day and month.

Today's detonator: having converted the "month year" string to a date,
the following commands gave irritatingly inconsistent interpretations.  

* Here's the input string variable, which sorts alphabetically
. tab month

      month |      Freq.     Percent        Cum.
------------+-----------------------------------
     APR 10 |         38        6.92        6.92
     AUG 09 |         37        6.74       13.66
     DEC 09 |         38        6.92       20.58
     FEB 10 |         38        6.92       27.50
     JAN 10 |         37        6.74       34.24
     JUL 09 |         38        6.92       41.17
     JUL 10 |         37        6.74       47.91
     JUN 09 |         37        6.74       54.64
     JUN 10 |         34        6.19       60.84
     MAR 10 |         39        7.10       67.94
     MAY 09 |         32        5.83       73.77
     MAY 10 |         38        6.92       80.69
     NOV 09 |         30        5.46       86.16
     OCT 09 |         37        6.74       92.90
     SEP 09 |         39        7.10      100.00
------------+-----------------------------------
      Total |        549      100.00

* -gen- a date and format it %tdMY

. desc date

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
-----------------------------------------------
date            float  %tdMY                  

* -tab- insists it was the 1st of each month
. tab date

       date |      Freq.     Percent        Cum.
------------+-----------------------------------
  01may2009 |         32        5.83        5.83
  01jun2009 |         37        6.74       12.57
  01jul2009 |         38        6.92       19.49
  01aug2009 |         37        6.74       26.23
  01sep2009 |         39        7.10       33.33
  01oct2009 |         37        6.74       40.07
  01nov2009 |         30        5.46       45.54
  01dec2009 |         38        6.92       52.46
  01jan2010 |         37        6.74       59.20
  01feb2010 |         38        6.92       66.12
  01mar2010 |         39        7.10       73.22
  01apr2010 |         38        6.92       80.15
  01may2010 |         38        6.92       87.07
  01jun2010 |         34        6.19       93.26
  01jul2010 |         37        6.74      100.00
------------+-----------------------------------
      Total |        549      100.00

* though -table- can handle the concept ...
. table date

------------------------
       date |      Freq.
------------+-----------
      May09 |         32
     June09 |         37
     July09 |         38
   August09 |         37
September09 |         39
  October09 |         37
 November09 |         30
 December09 |         38
  January10 |         37
 February10 |         38
    March10 |         39
    April10 |         38
      May10 |         38
     June10 |         34
     July10 |         37
------------------------ 

* but -table- will handle only five statistics at a time, and I wanted
six.  So tried -tabstat- and got a shock
. tabstat row row ,  stat(mean) by(date)

Summary statistics: mean
  by categories of: date 

     date |       row       row
----------+--------------------
    18018 |      16.5      16.5
    18049 |        54        54
    18079 |      94.5      94.5
    18110 |       135       135
    18141 |       176       176
    18171 |       217       217
    18202 |     253.5     253.5
    18232 |     290.5     290.5
    18263 |       331       331
    18294 |     371.5     371.5
    18322 |       413       413
    18353 |     454.5     454.5
    18383 |     494.5     494.5
    18414 |     532.5     532.5
    18444 |       570       570
----------+--------------------
    Total |  295.7687  295.7687
-------------------------------

* Why have I asked for the same variable in two columns?  Because when I
simplified the table to send the report ...
. tabstat row ,  stat(mean) by(date)

Summary for variables: row
     by categories of: date 

     date |      mean
----------+----------
      May09 |      16.5
     June09 |        54
     July09 |      94.5
   August09 |       135
September09 |       176
  October09 |       217
 November09 |     253.5
 December09 |     290.5
  January10 |       331
 February10 |     371.5
    March10 |       413
    April10 |     454.5
      May10 |     494.5
     June10 |     532.5
     July10 |       570
----------+----------
    Total |  295.7687
---------------------

With one statistic the -by- was output formatted, but I could not get
the header and total lines to align even using -labelwidth- and
-longstub-.

Having found -tabstat- deformatted the stub, another approach was to
-egen- the group(date) values to give monthindex numbers, but they need
the month-year labels.  The only way to generate labels from an existing
variable appears to be -encode-, but that sorts the values
alphabetically.  

I typed in the labels -label define month 1 "May 09" etc- and attached
those to the index.

Like the bank says, "There must be a better way".


R Allan Reese
Senior statistician, Cefas
The Nothe, Weymouth DT4 8UB 

Tel: +44 (0)1305 206614 -direct
Fax: +44 (0)1305 206601 

www.cefas.co.uk 




***********************************************************************************
This email and any attachments are intended for the named recipient only.  Its unauthorised use, distribution, disclosure, storage or copying is not permitted.  If you have received it in error, please destroy all copies and notify the sender.  In messages of a non-business nature, the views and opinions expressed are the author's own and do not necessarily reflect those of the organisation from which it is sent.  All emails may be subject to monitoring.
***********************************************************************************


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index