Statalist



Re: st: Replicating a sas loop in stata results in very slow computation time....


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   "Statalist" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Replicating a sas loop in stata results in very slow computation time....
Date   Fri, 11 Apr 2008 21:59:34 +0900

Scott Merryman wrote (excerpted):

Also, the default technique in SAS is IRLS. Stata uses Newton-Raphson. At
least in a couple of examples, changing the technique to NR did increase the
time.

--------------------------------------------------------------------------------

In light of that, what I showed as mimicking the SAS code should read as
Scott shows, i.e., with the -irls- option.  It should then run as fast as
the corresponding SAS code.

Another suggestion to the original poster:  if your age variable is age in
years, then it's an integer.  (Even if it has been centered and so is no
longer an integer, it still takes only a limited number of distinct values.)
If that's the case, and if you haven't done so already, look into reducing
your millions of rows to thousands of rows:

1.  defer generating the indicator variables

2.  keep the pertinent variables, i.e., year yr00 age ihwt educ agedum

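3.  -contract year yr00 age educ agedum [fweight = ihwt]- (the outcome yr00
must be in the varlist, because -contract- keeps only the listed variables;
ihwt must be integer-valued to serve as a frequency weight)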

4.  _freq is now your frequency weight; you can also name it yourself with
-contract . . ., freq(varname)-
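A small sketch to check steps 1 to 4 on a toy problem.  The data below are
simulated (the seed, sample size and coefficients are made up; only the
variable names echo the thread), and it assumes a Stata recent enough to
have -runiform()-.  Because the frequency-weighted log likelihood of the
contracted data equals the log likelihood of the full data, the two sets of
logit estimates should agree to within numerical tolerance, while the number
of rows falls to at most one per covariate pattern.

```stata
* Toy check with simulated data (not the original poster's data):
* a logit on contracted data with [fweight = _freq] should reproduce
* the logit on the full data.
clear
set seed 12345
set obs 20000
generate byte age  = 20 + floor(40 * runiform())   // integer ages 20-59
generate byte educ = 1 + floor(3 * runiform())     // 3 education levels
generate byte yr00 = runiform() < invlogit(-2 + 0.03 * age + 0.2 * educ)
logit yr00 age educ, nolog
matrix b_full = e(b)
* Keep the outcome in the varlist, or -contract- drops it
contract yr00 age educ
count                                   // at most 2 x 40 x 3 = 240 rows
logit yr00 age educ [fweight = _freq], nolog
matrix b_contracted = e(b)
display mreldif(b_full, b_contracted)   // relative difference, near 0
```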

Also, consider the following:

1.  If age wasn't centered, and if age2, age3 and age4 are the square, cube
and fourth power of age (a quartic polynomial), then center age before
regenerating the polynomial terms.  In my experience, logistic regression
fitted by Newton-Raphson is more sensitive to near collinearity among the
predictors than linear regression is.

2.  In lieu of polynomial regression (if that's what you're doing), look
into the alternatives available in Stata, e.g., -bspline- (user-written,
available from SSC) and -fracpoly-.  You can then avoid agecat as a
predictor, too.

3.  If you're doing the leave-one-out looping in order to jackknife
coefficient standard errors, then there is a better way to do that in Stata,
too.

4.  Use -adjust- if you're not saving anything.  It won't save time (it
calls -predict-), but it displays the odds by education and age-dummy
categories more conveniently.
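The first point, about centering, can be seen with a few lines using
made-up ages (half-year steps from 15.5 to 65; these are not the thread's
data).  Raw age and its square are almost perfectly correlated; after
centering, the correlation is essentially zero here because the made-up ages
are symmetric about their mean.  With real data it will usually be much
smaller than before centering, though not exactly zero.

```stata
* Toy illustration with hypothetical ages: centering removes most of the
* collinearity between age and age squared.
clear
set obs 100
generate double age  = 15 + _n / 2      // ages 15.5 to 65 in half-year steps
generate double age2 = age^2
correlate age age2                      // about 0.99
summarize age, meanonly
generate double cage  = age - r(mean)   // centered age
generate double cage2 = cage^2
correlate cage cage2                    // essentially 0 for these symmetric ages
```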

Joseph Coveney

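clear *
set more off
tempfile one
* Read the analysis data, excluding year 100
use if year != 100 using r2f3_glm
keep ihwt yr00 year educ age agedum
* Drop observations with a missing value on any retained variable
foreach var of varlist _all {
   drop if missing(`var')
}
replace yr00 = yr00 != 0 // If not already 0/1
compress
* One row per covariate pattern; yr00 must be listed or -contract- drops it
contract year yr00 age educ agedum [fweight = ihwt]
* Center age at its frequency-weighted mean, then rebuild the polynomial
summarize age [fweight = _freq], meanonly
replace age = age - r(mean)
generate float age2 = age * age
generate float age3 = age2 * age
generate float age4 = age3 * age
xi i.educ*age i.educ*age2 i.educ*age3 i.educ*age4 i.agedum
save `one'
* Leave out one year at a time and refit
forvalues i = 94/102 {
   use if year != `i' using `one', clear
   * List the polynomial terms explicitly: the wildcard age* would also
   * pick up agedum
   logit yr00 _I* age age2 age3 age4 [fweight = _freq], nolog
   local means
   foreach var of varlist age age2 age3 age4 {
       summarize `var' [fweight = _freq], meanonly
       * Store the value of r(mean), not the literal text "r(mean)"
       local means `means' `var' = `=r(mean)'
   }
   adjust `means', by(educ agedum) exp format(%6.1f)
}
exit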




