
st: Questions on the rules guiding the -table- command

From: Sergiy Radyakin
To: statalist@hsphsun2.harvard.edu
Subject: st: Questions on the rules guiding the -table- command
Date: Tue, 28 Aug 2012 18:59:02 -0400

Dear All,

I am looking for the formal rules that govern whether empty categories are included in the tables that Stata produces, and I have a number of [boring] questions.

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
-  -  -  -  -  -
First, consider the following example:

version 12.0
clear all
webuse nlsw88

// does not include the "mining" category
table industry union if union==1, c(mean wage) row

// does include the "mining" category
generate unionwage=wage if union==1
table industry union, c(mean unionwage) row

// does not include the "mining" category
tabulate industry union if union==1

In the results you can see that some of the tables include the "mining" category and others do not. [...] background if possible. Also, how common is it to use the second approach, generating a variable with missing values for inapplicable cases, rather than restricting the sample with an if-condition?

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
-  -  -  -  -  -

The second question concerns the interpretation of the "Total" line. Consider the following example:

version 12.0
clear
webuse nlsw88
table industry, c(mean wage) row
table industry if industry!=1, c(mean wage) row
replace industry=. if industry==1
table industry, c(mean wage) row

There are [at least] two possible interpretations of the "Total" line:

1) The Total line reflects the mean among all the valid observations (for which both the row variable and the outcome variable are not missing).

2) The Total line reflects the mean among all the observations where the outcome variable is not missing, regardless of whether the row variable is missing or not.

Stata seems to use the first definition. Was this a deliberate choice, and if so, why?
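To make the two candidate definitions concrete, here is a minimal sketch that computes both totals directly with -summarize-, using the same modified data as the example above (industry 1 recoded to missing). Comparing the two printed means with the Total row of -table- shows which definition -table- applies. This only restates the question in code; it makes no claim about how -table- is implemented internally.

```stata
version 12.0
clear
webuse nlsw88
replace industry = . if industry == 1

* Interpretation 1: mean wage among observations where both the
* row variable (industry) and the outcome (wage) are nonmissing
* (summarize already drops observations with missing wage)
summarize wage if !missing(industry), meanonly
display "Interpretation 1: N = " r(N) ", mean = " r(mean)

* Interpretation 2: mean wage among all observations with nonmissing
* wage, regardless of whether industry is missing
summarize wage, meanonly
display "Interpretation 2: N = " r(N) ", mean = " r(mean)

* Compare both numbers with the Total row reported by -table-
table industry, c(mean wage) row
```

The two definitions can differ only when some observations have a nonmissing outcome but a missing row variable, which is exactly what the -replace- line creates.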

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
-  -  -  -  -  -

Why is the -concise- option asymmetric (it works for rows but not for columns)?
Compare the output of:

table industry union, c(mean unionwage) row concise
and
table union industry, c(mean unionwage) row concise

(I know it works only for rows according to the documentation, but why?)

Also, if there is any convention for reporting results in any of the cases above, I would appreciate some references as well. I fully realize that "in some cases A is preferable, in others B is preferable", but there might be studies of how people interpret these situations without help or guidance, etc.

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
-  -  -  -  -  -

Finally, I would like to restate the 2009 question regarding the labels in -table-:

http://www.stata.com/statalist/archive/2009-08/msg00505.html

To that question I would add: why can't I force -table- to show missing categories of the row and column variables?

Thank you,