
From: "Joseph Coveney" <jcoveney@bigplanet.com>
To: "Statalist" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Replicating a sas loop in stata results in very slow computation time....
Date: Fri, 11 Apr 2008 21:59:34 +0900

Scott Merryman wrote (excerpted):

Also, the default technique in SAS is IRLS. Stata uses Newton-Raphson. At least in a couple of examples, changing the technique to NR did increase the time.

--------------------------------------------------------------------------------

In light of that, what I showed as mimicking the SAS code should read as Scott shows, i.e., with the -irls- option. It should then run as fast as the corresponding SAS code.

Another suggestion to the original poster: if your age variable is age in years, then it's an integer. (It will still take evenly spaced values even if it has been centered and is no longer an integer.) If that's the case, and if you haven't done so already, look into reducing your millions of rows to thousands of rows:

1. Defer generating the indicator variables.

2. Keep the pertinent variables, i.e., year yr00 age ihwt educ agedum.

3. -contract yr00 year age educ agedum [fweight = ihwt]- (-contract- keeps only the listed variables and the frequency variable, so the outcome yr00 has to be in the list).

4. _freq is now your frequency weight; you can use -contract . . ., freq(varname)-, too.

Also, consider the following:

1. If age wasn't centered, and if age2, age3 and age4 represent quartic-polynomial variables, then center age before regenerating the polynomial expansion. In my experience, logistic regression using Newton-Raphson is more sensitive to near collinearity in the predictors than linear regression is.

2. In lieu of polynomial regression (if that's what you're doing), look into the alternatives available in Stata, e.g., -bspline-, -fracpoly-. You can avoid agecat, too, as a predictor.

3. If you're doing the leave-one-out looping in order to jackknife coefficient standard errors, then there is a better way to do that in Stata, too.

4. Use -adjust-, if you're not saving anything. It won't save time (it calls -predict-), but it will display odds by education and age-dummy categories more conveniently.
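The centering advice in point 1 can be checked on a small example. The following is a minimal sketch, assuming the auto dataset that ships with Stata as a stand-in for the poster's data; mpg plays the role of age, and the names mpg2, cmpg and cmpg2 are made up for illustration.

```stata
* Sketch only: auto.dta stands in for the poster's data; mpg stands in for age.
sysuse auto, clear

* Uncentered predictor and its square
generate double mpg2 = mpg^2
correlate mpg mpg2

* Center on the mean first, then rebuild the square
summarize mpg, meanonly
generate double cmpg = mpg - r(mean)
generate double cmpg2 = cmpg^2
correlate cmpg cmpg2
```

The second correlation should be noticeably smaller in magnitude than the first. It will not necessarily be near zero, because any skewness in the predictor leaves some correlation between the centered variable and its square.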
Joseph Coveney

clear *
set more off
tempfile one
use if year != 100 using r2f3_glm
keep ihwt yr00 year educ age agedum
foreach var of varlist _all {
    drop if missing(`var')
}
replace yr00 = yr00 != 0 // If not already 0/1
compress
// yr00 must be listed: -contract- keeps only the listed variables and _freq
contract yr00 year age educ agedum [fweight = ihwt]
// Center age on its weighted mean before building the polynomial terms
summarize age [fweight = _freq], meanonly
replace age = age - r(mean)
generate float age2 = age * age
generate float age3 = age2 * age
generate float age4 = age3 * age
xi i.educ*age i.educ*age2 i.educ*age3 i.educ*age4 i.agedum
save `one'
// Leave one year out at a time
forvalues i = 94/102 {
    use if year != `i' using `one', clear
    // The age terms are listed explicitly because age* would also match agedum
    logit yr00 _I* age age2 age3 age4 [fweight = _freq], nolog
    local means
    foreach var of varlist age age2 age3 age4 {
        summarize `var' [fweight = _freq], meanonly
        local means `means' `var' = `r(mean)'
    }
    adjust `means', by(educ agedum) exp format(%6.1f)
}
exit

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
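The row-reduction idea and the -irls- option mentioned in the message can be tried out on a small scale. This is a minimal sketch, assuming the auto dataset that ships with Stata in place of the poster's file; foreign and rep78 stand in for the outcome and a predictor, and n is a hypothetical name for the frequency variable.

```stata
* Sketch only: auto.dta stands in for the poster's data; n is a made-up name.
sysuse auto, clear
keep foreign rep78
drop if missing(rep78)

* Fit on the one-row-per-observation data
logit foreign rep78, nolog

* Reduce to one row per distinct combination; n holds its frequency
contract foreign rep78, freq(n)

* Same model on the contracted data, weighted by frequency
logit foreign rep78 [fweight = n], nolog

* Same model fit by IRLS through -glm-
glm foreign rep78 [fweight = n], family(binomial) link(logit) irls nolog
```

The three sets of coefficients should agree to within convergence tolerance. The payoff of -contract- comes from how few distinct combinations remain relative to the original number of rows, so the gain on a dataset this small is negligible.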
