Re: st: Is Stata 9 really faster than stata 7?

From	[email protected] (Jeff Pitblado, StataCorp LP)
To	[email protected]
Subject	Re: st: Is Stata 9 really faster than stata 7?
Date	Thu, 02 Nov 2006 13:04:01 -0600

Shourun Guo <[email protected]> noticed some speed differences between Stata 7 and
Stata 9.  Nick Cox <[email protected]> replied to some of the issues that Guo
mentioned.  I'll address the substantial differences related to Guo's do-file
example:

> When I ran the following ado file on the above dataset in STATA 9 and
> STATA 7, STATA 9 is always much slower. The dataset has about 700,000 obs.
> There is a category variable called 'group', which is continuous from 1 to
> 6250. Within each group, there are 80-127 observations. (Different groups
> may have different number of observations). For each group, I need to run a
> regression and record the estimation coefficients. I use a loop to do the
> job. In the loop, I avoided to use -if group=`i'- because it seems -if- cost
> more time than -in- to identify the desired observations from my experience
> in STATA 7 when dealing with large dataset. Basically, I first determine the
> beginning obs and ending obs for each group and then run the regression in
> the loop using -in- condition.
> 
> I did some experiments. If I keep 1000 groups, STATA 7 used 17 seconds to
> finish while STATA 9 used 54 seconds. With 3000 groups, STATA 7 used 144
> seconds while STATA 9 used 471 seconds. With all 6250 groups, STATA 7 used
> about 18 minutes, while STATA 9 used about 110 minutes.  All the experiments
> are done on the same computer and without other program running. The results
> don't make sense to me. The speed shouldn't be so slow for Version 9. It
> seems that I need to optimize my program for STATA 9. Any thoughts or
> suggestions?
> 
> 
> set more off
> set mem 100m
> use ./temp3, clear
> sort group
> by group: gen obsnum=_N
> by group: keep if _n==1
> keep group obsnum
> sum group
> local max=r(max)
> 
> forval i=1/`max' {
> 	local n`i'=obsnum[`i']
> 	}
> 
> use ./temp3, clear
> sort group
> tempname result1
> postfile `result1' id alpha beta using .\rep_beta_anndate, replace
> local base=0
> 
> forval i=1/`max' {
> 	local first=`base'+1
> 	local last=`base'+`n`i''
> 	quietly regress ret vwretd in `first'/`last'
> 	post `result1' (`i') (_b[_cons]) (_b[vwretd])
> 	local base=`base'+`n`i''
> 	}
> postclose `result1'

We looked into why, in this case, -regress- is so much slower in Stata 9
compared to earlier Stata releases.

The short answer:

It turns out that -regress- in Stata 9 performs two unnecessary sortpreserves
on every call.  We will fix this in the next ado-file update, but in the
meantime Guo can use the undocumented -_regress- command (which is the renamed
version of the originally internal -regress- command).
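
In Guo's do-file, that workaround is a one-line change inside the second loop.
A minimal sketch, assuming -_regress- accepts the same varlist and -in- range
and leaves the same _b[] results behind as -regress- does (it is undocumented,
so check this on your own installation):

	forval i=1/`max' {
		local first=`base'+1
		local last=`base'+`n`i''
		* only change from Guo's loop: call -_regress- directly
		quietly _regress ret vwretd in `first'/`last'
		post `result1' (`i') (_b[_cons]) (_b[vwretd])
		local base=`base'+`n`i''
	}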

The long answer:

In Stata 9, the -vce(bootstrap)- and -vce(jackknife)- options were added to a
large number of Stata's estimation commands.  To facilitate this for
-regress-, the internal -regress- command was renamed to -_regress- so that an
ado-file could handle the -vce()- option (among other new features) and
call through to -_regress-.  The -sortpreserve- option was used in the program
definition for -regress- in regress.ado, but it is unnecessary since -sort- is
never directly called by -regress-.  A second unnecessary -sortpreserve-
occurs when -regress- calls the "undocumented" routine that parses the
-vce(bootstrap)- and -vce(jackknife)- options.
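
To make the mechanism concrete, here is a sketch of two hypothetical wrapper
programs that differ only in the -sortpreserve- option.  The names reg_sp and
reg_plain are made up for illustration, and neither is the code in
regress.ado.  As we understand it, a program declared with -sortpreserve- has
to keep track of the order of the whole dataset on every call, even when the
command itself only touches a short -in- range:

	* hypothetical wrappers, NOT the shipped regress.ado
	capture program drop reg_sp
	program define reg_sp, sortpreserve
		version 9
		* pass everything the user typed through to -_regress-
		_regress `0'
	end

	capture program drop reg_plain
	program define reg_plain
		version 9
		_regress `0'
	end

Calling each wrapper repeatedly on a large dataset, for example
-reg_sp ret vwretd in 1/112- versus -reg_plain ret vwretd in 1/112-, is one
way to see how much of the time goes to -sortpreserve- rather than to the
regression itself.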

Using a simulated dataset similar to the one Guo describes above, we have
determined that the fixed -regress- command in Stata 9 will be nearly as fast
as in the previous Stata releases (there is a minuscule amount of overhead due
to -regress- being an ado-file).
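
For anyone who wants to try a comparison like this, here is a rough sketch.
It is not the dataset or the script we used.  It assumes 6250 groups of
exactly 112 observations each (Guo's groups have 80-127), arbitrary
coefficients, and the Stata 9 functions -uniform()- and -invnorm()-; only the
first 1000 groups are timed to keep the run short.

	clear
	set mem 100m
	set obs 700000
	set seed 20061102
	* 700000/112 = 6250 equal-sized groups (an assumption)
	gen group = ceil(_n/112)
	gen vwretd = invnorm(uniform())
	gen ret = 0.01 + 1.2*vwretd + invnorm(uniform())

	* -set rmsg on- makes Stata report elapsed time
	set rmsg on

	* loop 1: the ado-file -regress-
	forval i = 1/1000 {
		local first = (`i'-1)*112 + 1
		local last = `i'*112
		quietly regress ret vwretd in `first'/`last'
	}

	* loop 2: the built-in -_regress-
	forval i = 1/1000 {
		local first = (`i'-1)*112 + 1
		local last = `i'*112
		quietly _regress ret vwretd in `first'/`last'
	}
	set rmsg off

The two loops do identical work apart from the command called, so the
difference in their reported times is the overhead of the ado-file layer.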

-----------------------------------------------------------------------------

Note that Guo's code is equivalent to the following in Stata 9
(-statsby- uses the -in- restriction too)

	. use temp3, clear
	. statsby alpha=_b[_cons] beta=_b[vwretd],
		by(group) save(rep_beta_anndate, replace) :
		regress ret vwretd

(the call to -statsby- is a single line, but was broken up for aesthetics)

For faster results (while waiting for the next ado-file update) Guo can use
the following in Stata 9 

	. statsby alpha=_b[_cons] beta=_b[vwretd],
		by(group) save(rep_beta_anndate, replace) :
		_regress ret vwretd

--Jeff					--Vince
[email protected]			[email protected]