[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: RE: combining frequency tables

From   "Nick Cox" <>
To   <>
Subject   RE: st: RE: combining frequency tables
Date   Wed, 25 Feb 2009 11:16:23 -0000

I wasn't paying attention to the implications of the fact that you were
doing this under -by:-, but a very little experimentation and analysis
is enough to clarify what is happening. 

Your speculation is the wrong way round. It's not a question of what
-matrix- will "accept"; if there were a problem with that there would be
an error message. It's a question of what is left behind earlier that
-matrix- can get hold of. 

Forget Rose Ann Medeiros' -mdesc- and my hack -mmdesc- and focus on
something accessible to all. 

. sysuse auto
(1978 Automobile Data)

. bysort foreign : su mpg

-> foreign = Domestic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
         mpg |        52    19.82692    4.743297         12         34

-> foreign = Foreign

    Variable |       Obs        Mean    Std. Dev.       Min        Max
         mpg |        22    24.77273    6.611187         14         41

. ret li

                  r(N) =  22
              r(sum_w) =  22
               r(mean) =  24.77272727272727
                r(Var) =  43.70779220779221
                 r(sd) =  6.611186898567625
                r(min) =  14
                r(max) =  41
                r(sum) =  545

This is typical (although not universal) of how -by:- works with r-class
results. What happens is that 

1. -summarize- does the work for Domestic cars, showing the results and
leaving them in r-class results, such as r(N), r(mean), etc. 
2. -summarize- does the work for Foreign cars, showing the results and
leaving them in r-class results, such as r(N), r(mean), etc. 

What's left behind -- and what you can access therefore -- is just the
_last_ set of r-class results. As it goes through the subsets, Stata
overwrites each set of r-class results with the next. This is generic of
byable commands (in Stata's jargon) written as -recall- commands not
-onecall- commands. 

Look at the code for -mdesc- and -mmdesc- and you will see that they are
-recall- commands. 

What's to be done? Here are three solutions; there are probably others

1. Rewrite -mdesc- or -mmdesc- to be -onecall- commands. That's a job
for a Stata programmer. 

2. Use -statsby-, say with -mdesc-. (If you don't know about -statsby-,
find out anyway.) 

3. Call -mmdesc- in a loop and accumulate results. There's more on
technique at 

FAQ     . . . . . . . . . . Making foreach go through all values of a
        8/05    Is there a way to tell Stata to try all values of a
                particular variable in a foreach statement without
                specifying them?

This could be something like 

egen which = group(treatment timpoint), label 
su which, meanonly 
local ng = r(max) 

forval i = 1/`ng' { 
	di "`: label (group) `i''" 
	mmdesc use1 use2  
	matrix all = nullmat(all) \ r(table) 

I am not paying attention to your extra -if- conditions which impart
another level of complexity. You would need to call -mmdesc- for each
distinct set of conditions. After each call to -mmdesc- you would need
to copy the resulting matrix before it was overwritten. 


Shehzad Ali

Thanks, Nick. 

For some reason, I am having no joy using the -mat- option. This may be 
because I am using -by- option with -mmdes- and -mat- is not willing to 
accept (at least in my case) multiple rows that represented different
categories. I am still getting a single row matrix even for the first
before I join two matrices.

I think I will do this in Excel.

On Feb 23 2009, Nick Cox wrote:

>-findit mdesc- reveals that it is
>a program by Rose Anne Medeiros, who once wrote a program called -njc-,
>for which I am not responsible. Here is a matrix version of her
>for which she is not responsible. 
>*! mmdesc 1.0 NJC 23 February 2009 
>* mdesc 1.0 RAM 18 JULY 2008
>* Returns a table with number missing, total, and missing/total
>program mmdesc, rclass byable(recall)
>version 8 
>syntax [varlist] [if] [in]
>tempvar touse
>mark `touse' `if' `in'
>local nvars : word count `varlist' 
>tempname matrix 
>matrix `matrix' = J(`nvars', 3, .) 
>local i = 1 
>quietly foreach var of local varlist {
>	count if missing(`var') & `touse' 
>	matrix `matrix'[`i', 1] = r(N) 
>	count if `touse'
>	matrix `matrix'[`i', 2] = r(N) 
>	matrix `matrix'[`i', 3] = `matrix'[`i',1] / `matrix'[`i',2] 
>	local ++i  
>matrix rownames `matrix' = `varlist'                     
>matrix colnames `matrix' = Missing Total Missing/Total 
>matrix list `matrix', noheader 
>return matrix table = `matrix' 
>Once you run this, you can save its results by e.g. 
>mat A = r(table) 
>mat B = r(table) 
>mat C = A + B 
>Shehzad Ali
>Another (perhaps) simple question: is it possible to add different 
>frequency tables together to form a combined single table in Excel?
>I have two group variables (say, var1: type of treatment; and var2:
>point of observation) and 5 use variables. So an observation would be
>treatment A at time point 1. I want to know the number of missing
>for each use variable for each treatment group and each time point.
>for each use variable there are 'if' conditions, so I can't do the 
>frequency count in one go. Here is what I am doing using a user written

>programme -mdesc- or -missing-:
>by treatment timpoint: mdesc use1 if (condition1 condition2)
>by treatment timpoint: mdesc use2 if (condition3 condition4)
>and so on. 
>Now I want to combine all these counts for all timepoints and treatment

>groups to get a single table that I can export to Excel. Is there any
>way I 
>could do that?

*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index