Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: RE: combining frequency tables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: combining frequency tables
Date   Wed, 25 Feb 2009 11:16:23 -0000

I wasn't paying attention to the implications of the fact that you were
doing this under -by:-, but a very little experimentation and analysis
is enough to clarify what is happening. 

Your speculation is the wrong way round. It's not a question of what
-matrix- will "accept"; if there were a problem with that there would be
an error message. It's a question of what is left behind earlier that
-matrix- can get hold of. 

Forget Rose Ann Medeiros' -mdesc- and my hack -mmdesc- and focus on
something accessible to all. 

. sysuse auto
(1978 Automobile Data)

. bysort foreign : su mpg

------------------------------------------------------------------------
---------
-> foreign = Domestic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        52    19.82692    4.743297         12         34

------------------------------------------------------------------------
-------
-> foreign = Foreign

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        22    24.77273    6.611187         14         41


. ret li

scalars:
                  r(N) =  22
              r(sum_w) =  22
               r(mean) =  24.77272727272727
                r(Var) =  43.70779220779221
                 r(sd) =  6.611186898567625
                r(min) =  14
                r(max) =  41
                r(sum) =  545

This is typical (although not universal) of how -by:- works with r-class
results. What happens is that 

1. -summarize- does the work for Domestic cars, showing the results and
leaving them in r-class results, such as r(N), r(mean), etc. 
 
2. -summarize- does the work for Foreign cars, showing the results and
leaving them in r-class results, such as r(N), r(mean), etc. 

What's left behind -- and what you can access therefore -- is just the
_last_ set of r-class results. As it goes through the subsets, Stata
overwrites each set of r-class results with the next. This is generic of
byable commands (in Stata's jargon) written as -recall- commands not
-onecall- commands. 

Look at the code for -mdesc- and -mmdesc- and you will see that they are
-recall- commands. 

What's to be done? Here are three solutions; there are probably others
too. 

1. Rewrite -mdesc- or -mmdesc- to be -onecall- commands. That's a job
for a Stata programmer. 

2. Use -statsby-, say with -mdesc-. (If you don't know about -statsby-,
find out anyway.) 

3. Call -mmdesc- in a loop and accumulate results. There's more on
technique at 

FAQ     . . . . . . . . . . Making foreach go through all values of a
variable
        8/05    Is there a way to tell Stata to try all values of a
                particular variable in a foreach statement without
                specifying them?
                http://www.stata.com/support/faqs/data/foreach.html

This could be something like 

egen which = group(treatment timpoint), label 
su which, meanonly 
local ng = r(max) 

forval i = 1/`ng' { 
	di "`: label (group) `i''" 
	mmdesc use1 use2  
	matrix all = nullmat(all) \ r(table) 
} 

I am not paying attention to your extra -if- conditions which impart
another level of complexity. You would need to call -mmdesc- for each
distinct set of conditions. After each call to -mmdesc- you would need
to copy the resulting matrix before it was overwritten. 

Nick 
n.j.cox@durham.ac.uk 

Shehzad Ali

Thanks, Nick. 

For some reason, I am having no joy using the -mat- option. This may be 
because I am using -by- option with -mmdes- and -mat- is not willing to 
accept (at least in my case) multiple rows that represented different
-by- 
categories. I am still getting a single row matrix even for the first
step 
before I join two matrices.

I think I will do this in Excel.

On Feb 23 2009, Nick Cox wrote:

>-findit mdesc- reveals that it is
>a program by Rose Anne Medeiros, who once wrote a program called -njc-,
>for which I am not responsible. Here is a matrix version of her
program,
>for which she is not responsible. 
>
>*! mmdesc 1.0 NJC 23 February 2009 
>* mdesc 1.0 RAM 18 JULY 2008
>* Returns a table with number missing, total, and missing/total
>program mmdesc, rclass byable(recall)
>version 8 
>syntax [varlist] [if] [in]
>tempvar touse
>mark `touse' `if' `in'
>local nvars : word count `varlist' 
>tempname matrix 
>matrix `matrix' = J(`nvars', 3, .) 
>local i = 1 
>quietly foreach var of local varlist {
>	count if missing(`var') & `touse' 
>	matrix `matrix'[`i', 1] = r(N) 
>	count if `touse'
>	matrix `matrix'[`i', 2] = r(N) 
>	matrix `matrix'[`i', 3] = `matrix'[`i',1] / `matrix'[`i',2] 
>	local ++i  
>}
>matrix rownames `matrix' = `varlist'                     
>matrix colnames `matrix' = Missing Total Missing/Total 
>matrix list `matrix', noheader 
>return matrix table = `matrix' 
>end
>
>Once you run this, you can save its results by e.g. 
>
>mat A = r(table) 
>
>... 
>
>mat B = r(table) 
>
>...
>
>mat C = A + B 
>
>
>Nick 
>n.j.cox@durham.ac.uk 
>
>Shehzad Ali
>
>Another (perhaps) simple question: is it possible to add different 
>frequency tables together to form a combined single table in Excel?
>
>I have two group variables (say, var1: type of treatment; and var2:
time
>
>point of observation) and 5 use variables. So an observation would be
>for 
>treatment A at time point 1. I want to know the number of missing
values
>
>for each use variable for each treatment group and each time point.
Also
>
>for each use variable there are 'if' conditions, so I can't do the 
>frequency count in one go. Here is what I am doing using a user written

>programme -mdesc- or -missing-:
>
>by treatment timpoint: mdesc use1 if (condition1 condition2)
>
>by treatment timpoint: mdesc use2 if (condition3 condition4)
>
>and so on. 
>
> 
>Now I want to combine all these counts for all timepoints and treatment

>groups to get a single table that I can export to Excel. Is there any
>way I 
>could do that?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index