
st: Towards publication quality output


From   Marcello Pagano <Pagano@hsph.harvard.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Towards publication quality output
Date   Fri, 22 Aug 2003 08:51:50 -0400

For Fred, m.p.
____________________


During the reign of Stata 7, the most common complaint was about the absence of publication quality graphics. With Stata 8, there are often complaints about publication quality tables and regression outputs, with some people suggesting SPSS as a Stata alternative for tables.

I want to suggest to the Stata community extensions to the way Stata handles variable labels, as I think such additions can lead to better-looking tables and other output.

I want to suggest 3 new labels as options and additions to Stata's -variable label-. Many of these comments may reflect my personal usage, but I suspect that there is a generalizability here that may be useful to all.

-label- may have many uses, only one of which may be the production of publication-quality labels. In the 2 examples below, the variable names are kept for historical compatibility reasons, and the labels provide different types of information:

variable name   type    format   variable label
-------------------------------------------------
fatigue_        float   %9.0g    Sx-fatigue
haq_disa        float   %9.0g    Disability Index

Label 1 tells me that the fatigue variable came from the symptoms section, and label 2 that the variable is one of many disability indexes. I can't use either of these labels for publication, however. So, if I copy a table using either varname or label, I have to reformat it in my word processor.

What I think is needed is a publication or "table label." This can be done, for example, by using variable characteristics:

. char list haq[tlabel]
haq_disa[tlabel]: HAQ (0-3)

. char list fatigue[tlabel]
fatigue_[tlabel]: Fatigue (0-10)
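
For readers who have not used characteristics, here is a minimal sketch of how such a table label could be stored and read back. The toy data and the variable names haq and fatigue are placeholders chosen so the lines run on their own; tlabel is just the characteristic name proposed in this post, not something Stata itself recognizes.

```stata
* Toy data so the sketch is self-contained (values are arbitrary)
clear
set obs 3
generate haq = _n - 1
generate fatigue = 2 * _n

* Store a table label in a characteristic named tlabel
char haq[tlabel] "HAQ (0-3)"
char fatigue[tlabel] "Fatigue (0-10)"

* Show what was stored
char list haq[tlabel]
char list fatigue[tlabel]

* Read a characteristic into a local macro, as a program would
local rowlab : char haq[tlabel]
display "`rowlab'"
```

Characteristics are saved with the dataset, so a label defined once this way is available in later sessions.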

I used this in a program (on SSC) called -fsum-:

Variable        |      N    Mean      SD
----------------+---------------------------
Fatigue (0-10)  |   6309    4.33    2.88
HAQ (0-3)       |   6270    1.08    0.72

The output is publication ready.

These table labels, however, are not of much use as column labels in tables, as they are much too long. In Stata, the -list- command has an option to display as a column label the text in char varname[varname]. With the help of Nick Cox, I wrote a program called -corrtab- that will be placed on the SSC Archives on Kit Baum's return next week. This program, which displays correlations, is an example of the use of tlabels (table labels) and clabels (column labels) together. Here are some examples.


No labels
. corrtab haq pain glb age sleep,v(3)

Pearson correlations

+-------------------------------------------+
| Variable   haq_disa   pain_sca   glb_seve |
|-------------------------------------------|
| haq_disa      1.000      0.609      0.591 |
| pain_sca      0.609      1.000      0.664 |
| glb_seve      0.591      0.664      1.000 |
| age           0.123     -0.042      0.028 |
| sleep_sc      0.411      0.505      0.507 |
+-------------------------------------------+

clabels in columns
. corrtab haq pain glb age sleep,v(3) c

Pearson correlations

+-------------------------------------+
| Variable      HAQ     Pain   Global |
|-------------------------------------|
| haq_disa    1.000    0.609    0.591 |
| pain_sca    0.609    1.000    0.664 |
| glb_seve    0.591    0.664    1.000 |
| age         0.123   -0.042    0.028 |
| sleep_sc    0.411    0.505    0.507 |
+-------------------------------------+

clabels in columns and rows
. corrtab haq pain glb age sleep,v(3) all

Pearson correlations

+-------------------------------------+
| Variable      HAQ     Pain   Global |
|-------------------------------------|
| HAQ         1.000    0.609    0.591 |
| Pain        0.609    1.000    0.664 |
| Global      0.591    0.664    1.000 |
| Age         0.123   -0.042    0.028 |
| Sleep       0.411    0.505    0.507 |
+-------------------------------------+

tlabels in rows, clabels in columns
. corrtab haq pain glb age sleep,v(3) t c

Pearson correlations

+-----------------------------------------------------+
| Variable                      HAQ     Pain   Global |
|-----------------------------------------------------|
| HAQ (0-3)                   1.000    0.609    0.591 |
| Pain (0-10)                 0.609    1.000    0.664 |
| Global severity (0-10)      0.591    0.664    1.000 |
| Age (years)                 0.123   -0.042    0.028 |
| Sleep disturbance (0-10)    0.411    0.505    0.507 |
+-----------------------------------------------------+

Notice that the clabel is short and serves as an identifier rather than being very informative.
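
The -list- behaviour mentioned earlier can be tried without -corrtab-. The following is only a sketch under assumptions: the data are made up, the variable names are placeholders, and it shows the built-in -list- option rather than anything -corrtab- does internally.

```stata
* Toy data so the sketch is self-contained (values are arbitrary)
clear
set obs 3
generate haq_disa = _n - 1
generate pain_scale = 3 * _n

* -list- looks for its substitute column header in char varname[varname]
char haq_disa[varname] "HAQ"
char pain_scale[varname] "Pain"

* Default headers: the variable names
list haq_disa pain_scale

* With subvarname, the characteristic text replaces the variable name
list haq_disa pain_scale, subvarname
```

Variables that have no [varname] characteristic simply keep their names in the header.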

A third type of label is a graphics label. It usually differs from other labels for a variety of reasons.

So, I suggest the following label extensions, each stored as a variable characteristic:

tlabel
clabel
glabel

The labels suggested are useful for people who repeatedly work with the same set of variables. In addition, they give control over the output. They would not replace Stata variable labels, but would be extensions.
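
To illustrate how a program might treat these characteristics as extensions rather than replacements, here is a hypothetical helper. The program name getlab and its type() option are invented for this sketch; it returns the requested characteristic if one is defined and otherwise falls back to the ordinary variable label, then to the variable name.

```stata
capture program drop getlab
program define getlab, rclass
    * type() names the characteristic to look up: tlabel, clabel or glabel
    syntax varname, Type(string)
    local lab : char `varlist'[`type']
    * Fall back to the ordinary variable label, then to the variable name
    if `"`lab'"' == "" local lab : variable label `varlist'
    if `"`lab'"' == "" local lab "`varlist'"
    return local label `"`lab'"'
end

* Usage with a dataset shipped with Stata
sysuse auto, clear
char mpg[glabel] "Fuel economy (miles per gallon)"

* A glabel is defined for mpg, so it is returned
getlab mpg, type(glabel)
display "`r(label)'"

* No glabel is defined for weight, so the variable label is returned
getlab weight, type(glabel)
display "`r(label)'"
```

A graph command could then pass `r(label)' to an axis title option, and a table command could request tlabel or clabel in the same way.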

If Stata were to adopt extensions like these, it might be an additional step toward better output throughout its many programs. I could see them being used in various regression commands.

I don't know if Statalisters think this is a good idea, but if they do, it might be useful to develop a consensus regarding what kind of extensions there should be and into which char they should be placed.

Perhaps these comments might stimulate discussion on the issues of publication quality output and how it might be accomplished.

Fred Wolfe


Fred Wolfe
National Data Bank for Rheumatic Diseases
Wichita, Kansas
Tel (316) 263-2125
Fax (316) 263-0761
fwolfe@arthritis-research.org
