



RE: st: Overriding a loop if 0 observations using tabstat


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Overriding a loop if 0 observations using tabstat
Date   Tue, 27 Apr 2010 20:12:10 +0200

<>

t=100.28; t=207.58; t=241.55. :-)


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Tuesday, 27 April 2010 19:08
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Overriding a loop if 0 observations using tabstat

Good question. I decided to do some timings to support, or rebut, my
feeling that -count-, which just counts, should be faster than
-summarize, meanonly-, which does other stuff too, and that this in turn
should be faster than -summarize-, which does still more. The timings do
come out in that order, but they are closer than I guessed. Still, doing
anything the quickest way does no harm and may give a valuable speed-up
for large problems. 

Here is one test script. Compare your experiences: 

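clear
set obs 100000
set seed 2803
gen y = runiform()
* report the elapsed time after each command
set rmsg on

* 1. -count- only counts
qui forval i = 1/10000 {
	count if y > 0.5
}

* 2. -summarize, meanonly- also gets the sum, mean, minimum and maximum
qui forval i = 1/10000 {
	su y if y > 0.5, meanonly
}

* 3. -summarize- also gets the variance and standard deviation
qui forval i = 1/10000 {
	su y if y > 0.5
}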

My timings were t=187.49; 254.49; 313.38, which no doubt shows up the
Mesolithic age of my machine. 

Nick 
n.j.cox@durham.ac.uk 
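For comparison, Stata's -timer- command can keep the three totals in separate numbered slots and list them together at the end, rather than reading each one off the -set rmsg on- message. A minimal sketch on the same simulated data; it uses 1,000 rather than 10,000 repetitions purely to keep the run short, so its times are not comparable with the figures quoted in this thread.

```stata
clear
set obs 100000
set seed 2803
generate y = runiform()

* start from empty timers; slots 1-3 hold one total each
timer clear

* 1. -count-
timer on 1
quietly forvalues i = 1/1000 {
    count if y > 0.5
}
timer off 1

* 2. -summarize, meanonly-
timer on 2
quietly forvalues i = 1/1000 {
    summarize y if y > 0.5, meanonly
}
timer off 2

* 3. plain -summarize-
timer on 3
quietly forvalues i = 1/1000 {
    summarize y if y > 0.5
}
timer off 3

* show the three elapsed times side by side
timer list
```

Because each loop has its own slot, the ratios between the three methods can be read straight from one listing.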

Martin Weiss

" As a small detail of efficiency, I would always recommend -count- rather
than -summarize- for the purpose here."

My earlier code did use -count-. What makes it more efficient, though? Both
are built-in, so they probably enjoy a big advantage over everything else
anyway. So I guess the reason for your preference is that -count-
calculates fewer results than -su, mean-?
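One way to see how much extra work each command does is to look at what it leaves behind in r(). A small illustration using the auto data shipped with Stata (the choice of -rep78- and -mpg- is arbitrary). It shows which results each command stores, not a measured breakdown of where the time goes.

```stata
sysuse auto, clear

* -count- stores a single result, r(N)
count if rep78 == 3
return list

* -summarize, meanonly- also stores r(sum_w), r(sum), r(mean), r(min), r(max)
summarize mpg if rep78 == 3, meanonly
return list

* plain -summarize- adds r(Var) and r(sd) and prints a table as well
summarize mpg if rep78 == 3
return list
```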

Nick Cox

A secondary theme here is that this kind of code gets very difficult to
read, which makes it difficult to maintain and debug. 

I note that the condition 

intab1 == 1 & admit_ic == 1 & bwtg < . 

is common to all the -summarize- and -tabstat- commands. That being so, you
could get that out of the way like this: 

preserve 
keep if intab1 == 1 & admit_ic == 1 & bwtg < .
<stuff> 
restore 

Your -tabstat- options that are constant can be put in a little bag: 

local opts stat(n mean median p25 p75 min max) col(stat) f(%9.0g) notot nosep

Now <stuff> can be rewritten: 

forv i = 0/5 {
	foreach y in male singlet {
		forv s = 0/1 {
			di "myga==`i' & `y'==`s'"
			qui su bwtg if myga==`i' & `y'
			if r(N) != 0 {	
				tabstat bwtg if myga==`i', `opts' by(`y') 
			}
		}
	}
}

Now it is easier to see what is going on. I added some cosmetic changes too,
which this horrible mailer may well reverse. 

One puzzle: Did you mean to add the condition "& `y'" to the -summarize-? It
means the same as 

& `y' != 0 

-- which may or may not be what you want. 
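The distinction matters because Stata treats missing values as nonzero, so a bare variable used as a condition also selects the observations where it is missing. A quick check with the auto data, where -rep78- takes the values 1 to 5 and is missing for 5 of the 74 cars:

```stata
sysuse auto, clear

* a bare variable as a condition means "not equal to zero"
count if rep78                      // 74: the 5 missing values count as nonzero
count if rep78 != 0                 // 74: the same condition spelled out
count if rep78 != 0 & rep78 < .     // 69: missing values excluded
```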

As a small detail of efficiency, I would always recommend -count- rather
than -summarize- for the purpose here. 

Nick 
n.j.cox@durham.ac.uk 
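Putting the suggestions in this message together (drop the common condition once, keep the constant options in a local, and test for observations with -count-) might look like the following sketch. It runs on the auto data as a stand-in for the poster's dataset: -rep78- plays the role of -myga-, -mpg- of -bwtg-, and -foreign- plus a made-up indicator -heavy- the roles of -male- and -singlet-. The -tabstat- options are those above with their names spelled out in full. The extra condition on `y' is left out, since whether it was intended is exactly the open question.

```stata
sysuse auto, clear

* hypothetical second 0/1 variable, invented to mirror -male- and -singlet-
generate heavy = weight > 3000

* the constant -tabstat- options, kept in one local macro
local opts statistics(n mean median p25 p75 min max) columns(statistics) format(%9.0g) nototal noseparator

preserve
* drop once, up front, the observations every command would exclude
keep if mpg < .

forvalues i = 0/5 {
    * -count- is all that is needed to see whether any observations qualify
    quietly count if rep78 == `i'
    if r(N) > 0 {
        foreach y in foreign heavy {
            display as text _n "rep78 == `i', by `y'"
            tabstat mpg if rep78 == `i', `opts' by(`y')
        }
    }
}

restore
```

Since the count does not depend on `y', it is done once per value of `i' rather than inside the inner loop. In these data rep78 is never 0, so the first pass is skipped silently.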

sara khan

Many thanks, Maarten, for your advice. I managed to resolve it with the
following code:

forv i = 0/5 {
	foreach y in male singlet {
		forv s = 0/1 {
			di "myga==`i' & `y'==`s'"
			qui su bwtg if myga==`i' & intab1==1 & admit_ic==1 & bwtg<. & `y'
			if r(N) != 0 {
				tabstat bwtg if myga==`i' & intab1==1 & admit_ic==1 & bwtg<., ///
					stat(n mean median p25 p75 min max) by(`y') col(stat) f(%9.0g) notot nosep
			}
		}
	}
}


On Tue, Apr 27, 2010 at 12:56 PM, Maarten buis <maartenbuis@yahoo.co.uk>
wrote:
>
> --- On Tue, 27/4/10, sara khan wrote:
>> I just tried this but the output only shows the display
>> results and nothing from tabstat.
> <snip>
>
> -capture- works for me:
>
> *----------------- begin example ---------------------
> sysuse auto, clear
> forvalues i = 0/5 {
>        capture noisily tabstat mpg if rep78== `i', ///
>                s(n mean) by(foreign)
> }
> *-------------------- end example -------------------
>
> In order to debug your loop I would build it step by step:
> step 1: no looping, no locals, no -if-, just a single -tabstat- command
> step 2: add -capture noisily-
> step 3: add some -if- conditions
> step 4: build a single loop (e.g. over i but not over y)
> etc. etc.
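The debugging steps just listed can be made concrete with the auto data. A sketch of steps 1 to 4, with the variables serving only as stand-ins. -capture noisily- lets the loop carry on past values of rep78 that have no observations (rep78 is never 0 in these data) while still printing the error message.

```stata
sysuse auto, clear

* step 1: a single command, no loop, no locals, no -if-
tabstat mpg, statistics(n mean) by(foreign)

* step 2: add -capture noisily-
capture noisily tabstat mpg, statistics(n mean) by(foreign)

* step 3: add an -if- condition
capture noisily tabstat mpg if rep78 == 3, statistics(n mean) by(foreign)

* step 4: a single loop, over i only
forvalues i = 0/5 {
    capture noisily tabstat mpg if rep78 == `i', statistics(n mean) by(foreign)
}
```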

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



