Statalist The Stata Listserver



Re: st: Is Stata 9 really faster than stata 7?


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Is Stata 9 really faster than stata 7?
Date   Wed, 01 Nov 2006 17:53:23 +0000

The short answer is "Yes" and "No" and "tell us more".

Some of these issues have surfaced on the list before, shortly
after the releases of Stata 8 and Stata 9 respectively.

Specifically, -list- was rewritten for Stata 8. One side-effect
is that it can be slower to start on moderately large datasets,
because it cannot show much until it has examined all the data
to be -list-ed. Check out the documented -fast- option.
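A minimal sketch of the comparison. It uses the auto data shipped with Stata rather than the 700,000-observation dataset, so the timing difference will be negligible here. The variable choice is arbitrary, and the exact behaviour of -fast- should be checked in -help list- for your version.

```stata
* Sketch only: auto.dta stands in for the large dataset in the question
set more off
sysuse auto, clear
* Default: -list- examines all the data to be listed before printing
list make price mpg
* -fast- (documented option) lets output begin without that up-front pass
list make price mpg, fast
```

On the original dataset, the difference should show up as the delay before the first line appears.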

Also, graphics are slower in Stata 8/9, for good reasons,
but you don't mention graphics, and that's another story.

Otherwise, the idea of a test script is naturally the way
to assess this, but

1. Your test script is not portable: it cannot be run on
other machines without a copy of the dataset, so no one
else can compare timings.

2. Your computer may be the same for both runs, but
precisely what that means for the two versions is hard
to discuss without more information about your computer.
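One way round point 1 is a benchmark that builds its own data, so that anyone can run it in both versions. The sketch below is hypothetical. It assumes 1000 groups of exactly 100 observations each and simulated returns. It sticks to -invnorm(uniform())- and the system macro $S_TIME, which, to our knowledge, work in both Stata 7 and Stata 9. $S_TIME has only one-second resolution.

```stata
* Hypothetical portable benchmark: simulated data replace temp3.dta
* Group count (G) and group size (n) are assumptions, not from the post
clear
set mem 100m
set seed 12345
local G = 1000
local n = 100
local N = `G'*`n'
set obs `N'
* Consecutive groups 1..G, each with exactly n observations
gen long group = int((_n-1)/`n') + 1
gen double vwretd = invnorm(uniform())
gen double ret = 0.001 + 0.9*vwretd + invnorm(uniform())
tempname h
tempfile betas
postfile `h' id alpha beta using `betas', replace
display "start: $S_TIME"
* Same -in- approach as the original loop, with offsets known in advance
forval i = 1/`G' {
    local first = (`i'-1)*`n' + 1
    local last = `i'*`n'
    quietly regress ret vwretd in `first'/`last'
    post `h' (`i') (_b[_cons]) (_b[vwretd])
}
display "end:   $S_TIME"
postclose `h'
```

Running the same file under each version on the same machine gives directly comparable numbers.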

StataCorp will probably want to give a definitive answer.

Nick
n.j.cox@durham.ac.uk

"Shourun Guo" <guosa@bc.edu>

I recently upgraded from Stata 7 to Stata 9. When I reran some old programs
and played around with Stata 9, I found that Stata 9 is sometimes much slower than Stata 7.

1. I have a dataset with about 700,000 obs. and 4 variables. When I
type -list-, Stata 9 takes several seconds to start listing, while
Stata 7 begins listing instantly.

2. When I ran the following do-file on the above dataset in Stata 9 and
Stata 7, Stata 9 was always much slower. The dataset has about 700,000 obs.
There is a category variable called 'group', which takes consecutive values
from 1 to 6250. Within each group there are 80-127 observations (different
groups may have different numbers of observations). For each group, I need
to run a regression and record the estimated coefficients. I use a loop to do
the job. In the loop, I avoided using -if group==`i'- because, in my
experience with Stata 7 on large datasets, -if- takes more time than -in- to
identify the desired observations. Basically, I first determine the first
and last observation of each group and then run the regression in the loop
using an -in- condition.

I did some experiments. With 1000 groups, Stata 7 took 17 seconds to
finish while Stata 9 took 54 seconds. With 3000 groups, Stata 7 took 144
seconds while Stata 9 took 471 seconds. With all 6250 groups, Stata 7 took
about 18 minutes, while Stata 9 took about 110 minutes. All the experiments
were done on the same computer with no other programs running. The results
don't make sense to me: version 9 shouldn't be so slow. It seems that I need
to optimize my program for Stata 9. Any thoughts or suggestions?


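set more off
set mem 100m

* Pass 1: count the observations in each group
use ./temp3, clear
sort group
by group: gen obsnum=_N
by group: keep if _n==1
keep group obsnum
sum group
local max=r(max)

* Store each group's size in locals n1, n2, ...
* (relies on groups being numbered 1, 2, ..., max with no gaps)
forval i=1/`max' {
    local n`i'=obsnum[`i']
}

* Pass 2: reload the data and run one regression per group using -in-
use ./temp3, clear
sort group
tempname result1
postfile `result1' id alpha beta using ./rep_beta_anndate, replace
local base=0

forval i=1/`max' {
    * Group i occupies observations base+1 to base+n_i
    local first=`base'+1
    local last=`base'+`n`i''
    quietly regress ret vwretd in `first'/`last'
    post `result1' (`i') (_b[_cons]) (_b[vwretd])
    local base=`base'+`n`i''
}
postclose `result1'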
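If the aim is simply one regression per group, Stata 9's -statsby- prefix may handle the bookkeeping. This is a swapped-in technique, not the poster's method. Whether it beats the hand-written -in- loop on this dataset is untested here, so time it like the original. The output file name rep_beta_statsby is hypothetical.

```stata
* Sketch assuming Stata 9 -statsby- prefix syntax; not tested on temp3.dta
* Saves one observation per group with variables group, alpha and beta
use ./temp3, clear
statsby alpha=(_b[_cons]) beta=(_b[vwretd]), by(group) ///
    saving(rep_beta_statsby, replace): regress ret vwretd
```

Here the saved -group- variable plays the role of -id- in the original -postfile-.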




