Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Questions on the rules guiding the -table- command

From	Sergiy Radyakin <[email protected]>
To	[email protected]
Subject	st: Questions on the rules guiding the -table- command
Date	Tue, 28 Aug 2012 18:59:02 -0400

Dear All,

I am looking for the formal rules governing the inclusion of empty categories in the tables that Stata produces, and I have a number of [boring] questions.

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -

First, consider the following example:

version 12.0
clear all
webuse nlsw88

// does not include the "mining" category
table industry union if union==1, c(mean wage) row

// does include the "mining" category
generate unionwage=wage if union==1
table industry union, c(mean unionwage) row

// does not include the "mining" category
tabulate industry union if union==1

In the results you can see that some of the tables include the "mining" category and some do not. I would like to learn the rules of inclusion and, if possible, the reasoning behind them. Also, how common is the second approach, generating a variable that is missing for inapplicable cases, rather than restricting the sample with an if-condition?
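To see which categories are empty within the union subsample, the full cross-tabulation can be inspected. This is only a diagnostic sketch on the same nlsw88 data; it assumes, as the example above suggests, that the dropped rows are industries with no union members.

```stata
version 12.0
clear all
webuse nlsw88

// Same helper variable as in the example: wage only for union members
generate unionwage = wage if union == 1

// Full cross-tabulation including missing union status; an industry with
// a zero count in the union==1 column has no observations in the
// -if union==1- subsample
tabulate industry union, missing
```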

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -

The second question concerns the interpretation of the "Total" line. Consider the following example:

version 12.0
clear
webuse nlsw88
table industry, c(mean wage) row
table industry if industry!=1, c(mean wage) row
replace industry=. if industry==1
table industry, c(mean wage) row

There are [at least] two possible interpretations of the "Total" line:

1) The Total line reflects the mean among all valid observations, i.e. those for which both the row variable and the outcome variable are nonmissing.

2) The Total line reflects the mean among all observations where the outcome variable is nonmissing, regardless of whether the row variable is missing.

Stata seems to use the first definition. Was this a conscious choice, and if so, why?
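The two candidate definitions can be computed directly with -summarize- and compared with the Total line that -table- reports. A minimal sketch on the same nlsw88 data, reusing the recoding from the example; if the Total line matches the first -summarize- mean, -table- is using definition 1.

```stata
version 12.0
clear
webuse nlsw88
replace industry = . if industry == 1

// Definition 1: mean wage where both the row variable (industry)
// and the outcome (wage) are nonmissing
summarize wage if !missing(industry, wage)

// Definition 2: mean wage wherever the outcome is nonmissing,
// whether or not industry is missing
summarize wage if !missing(wage)

// Total line as reported by -table-, for comparison
table industry, c(mean wage) row
```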

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -

Why is the -concise- option asymmetric, i.e. why does it work only for rows and not for columns? Compare the output of:

table industry union, c(mean unionwage) row concise
   and
table union industry, c(mean unionwage) row concise

(I know that, according to the documentation, it works only for rows, but why?)

Also, if there is any convention for reporting results in any of the cases above, I would appreciate some references. I fully realize that "in some cases A is preferable, in others B is preferable", but there might be studies of how people interpret these situations without help or guidance.

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -

Finally, I would like to restate the 2009 question regarding the labels in the -table- command (see link):

http://www.stata.com/statalist/archive/2009-08/msg00505.html

In addition to that question: why can't I force -table- to show the missing categories of the row and column variables?
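One common workaround, sketched here as an assumption rather than as a documented -table- feature, is to copy the row variable and recode its missing values to a labeled nonmissing code. The variable name industry_m and the code 99 are hypothetical choices; 99 only needs to be unused by the existing industry codes.

```stata
version 12.0
clear
webuse nlsw88

// Hypothetical helper variable: a copy of industry (clonevar also copies
// the value-label attachment) with missing values given code 99
clonevar industry_m = industry
replace industry_m = 99 if missing(industry_m)

// Label the new code in whichever value label industry_m uses
local lbl : value label industry_m
label define `lbl' 99 "(missing)", modify

// -table- now shows a row for the formerly missing category
table industry_m, c(mean wage) row
```

Note that the Total line of this table then also covers the formerly missing observations, which corresponds to the second interpretation of the Total line.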


Thank you,
     Sergiy Radyakin
