
st: Processing speed for ttest


From   "Wallace, John" <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: Processing speed for ttest
Date   Wed, 8 Oct 2003 17:48:27 -0700

Dear Statalisters

I've just finished writing my first .do file for a truly enormous data
processing task.  It's now running, and I'm underwhelmed by the pace it's
going at.  I'll describe the dataset, the task, and the .do file;  please
comment on my approach and whether there is a more efficient way to run it.

I have a set of ~1100000 records, consisting of 3 supergroups of 6
replicates.  Each replicate has ~61000 analytes.  Each analyte is tested
across a pair of supergroups in an unpaired t-test, with 6 replicates.

Incidentally, if I'd had my way, we'd be using a one-way ANOVA with a
Bonferroni correction for significance, but the person requesting the
analysis wanted t-tests. I'm not sure that this would improve the speed of
the processing, though (I imagine I'll find out later, since I'll eventually
get my way with the analysis approach).
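
For concreteness, the ANOVA version for a single analyte would presumably
look something like the line below. This is only a sketch: the analyte code
1 is a placeholder, and the variables are the ones described further down.

    oneway det numsgroup if numanalyte == 1, bonferroni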

I'm using the following variables:
analyte = analyte name, one of ~61000 distinct values (string)
numanalyte = -encode-d analyte
q = counter for the pair of supergroups being compared in the t-test
i = counter for the analyte being tested within that pair of supergroups
p`q' = name of the variable in the dataset that records the calculated
p-value of the test
numsgroup = -encode-d supergroup (1, 2, or 3)
det = float number being tested

.do-file:

set more off

* Assign each analyte an integer code so the loop can index it
encode analyte, gen(numanalyte)
summarize numanalyte
local min = r(min)
local max = r(max)

* q indexes the pair of supergroups being compared
forvalues q = 1(1)3 {
	display "ttest " `q'
	generate p`q' = .
	* i indexes the analyte
	forvalues i = `min'(1)`max' {
		display `i'

		* leave out one supergroup so that by() sees exactly two groups
		if `q' == 1 {
			quietly ttest det if numanalyte == `i' & numsgroup != 3, by(numsgroup) unpaired
		}
		else if `q' == 2 {
			quietly ttest det if numanalyte == `i' & numsgroup != 2, by(numsgroup) unpaired
		}
		else {
			quietly ttest det if numanalyte == `i' & numsgroup != 1, by(numsgroup) unpaired
		}

		* store the p-value on every record for this analyte
		capture replace p`q' = r(p) if numanalyte == `i'
	}
}
set more on
exit
end

I'm monitoring the progress of the analysis by -display-ing `q' and `i'.
I'm getting a new `i' displayed about once every 3.6 seconds.  This leads me
to think the entire analysis is going to take a few days!  I've got a Dell
Xeon workstation with dual 1.4GHz processors and 0.5GB memory, and more than
sufficient hard drive space.  I've allocated 200M to Stata, and I'm running
Stata 8, fully updated (9/30).
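
As a rough check on that estimate, assuming all ~61000 analytes are tested
for each of the 3 supergroup pairs and the 3.6-second pace holds throughout:

    * tests x seconds per test, converted to days (86400 seconds per day)
    display 3 * 61000 * 3.6 / 86400

That works out to about 7.6 days.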

Incidentally, I pre-sorted the dataset by analyte and supergroup in the hope
that "making them close together" would speed processing.

60 mins in, 600 tests done... it seems to be slowing down (uh-oh)
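
A possible alternative, sketched here for the supergroup 1 vs. 2 comparison
only: get per-analyte group means, standard deviations, and counts in one
pass with -collapse-, then form the pooled-variance t statistic directly
instead of calling -ttest- ~61000 times.  The names m, s, n, df, sp2, and t
are made up for illustration.  This assumes the equal-variance test that
-ttest- with by() reports by default, that every analyte has records in both
supergroups, and that it is acceptable to replace the data in memory for the
duration (hence -preserve-).

    preserve
    * keep only the two supergroups being compared
    keep if numsgroup != 3
    * one row per analyte and supergroup: mean, sd, and count of det
    collapse (mean) m=det (sd) s=det (count) n=det, by(numanalyte numsgroup)
    * one row per analyte: m1 m2 s1 s2 n1 n2
    reshape wide m s n, i(numanalyte) j(numsgroup)
    generate df  = n1 + n2 - 2
    generate sp2 = ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / df
    generate t   = (m1 - m2) / sqrt(sp2*(1/n1 + 1/n2))
    * two-sided p-value
    generate p1  = 2*ttail(df, abs(t))
    * save the p-values before the original data are restored

The other two comparisons would follow the same pattern with a different
supergroup dropped and the variable suffixes adjusted to match.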

John Wallace
Research Associate
Affymetrix, Inc
[email protected]
