

From: Jeph Herrin <stata@spandrel.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: tricks to speed up -xtmelogit-
Date: Wed, 22 Dec 2010 08:33:15 -0500

There's not really a way to reduce the variables; I took several thousand medical diagnosis and procedure codes, classified them into 500 related groups (according to a published scheme) and then reduced the 500 to 100 by running 500 linear probability models and keeping those with the biggest abs(t-value). (100 was arbitrary, but it turns out that only 1 procedure group and 1 diagnostic group had P > 0.05 in the final model.) The other variables are categories of age, sex, length of stay, number of admissions, etc.

Anyway, if I drop any of the x's, I would have to re-estimate the "main" model, which means 3 weeks wasted :)

In the end, I have decided to look at only one additional model and to give it 3 weeks.

cheers,
Jeph

On 12/21/2010 3:28 PM, Sergiy Radyakin wrote:

Hi, Jeph,

very interesting problem. Are the 150 variables related? E.g. are these 150 a single group of dummies? Or are they all independent: height/age/gender?

With 6mln observations there is some chance you will have some duplicates, which may give you a possibility to reduce your sample a bit (just adjust the weights).

Given the rareness of your outcome, taking a simple subsample may yield just a few positives in the subsample. May I also suggest taking all positives and a random subsample of negatives, estimating the candidate model on that, and then running the full sample starting from that result?

Finally, this command is not in the MP report, but have you investigated how it performs as N(CPU) grows?

Best regards,
Sergiy

On Tue, Dec 21, 2010 at 2:15 PM, Jeph Herrin <stata@spandrel.net> wrote:

All,

I am trying to estimate a series of models using 6 million observations; the observations are nested within 3000 groups, and the dichotomous outcome is somewhat rare, occurring in about 0.5% of observations. There are about 150 independent variables, and so my basic model looks like this:

    . xtmelogit Y x1-x150 || group:

This took approximately 3 weeks to converge on a high end machine (3.2GHz, Intel Core i7, 24GB RAM). I saved the estimation result

    . est save main

but now would like to estimate some related models of the form

    . xtmelogit Y x1-x150 z1 z2 || group:

and would like to think I can shave some considerable time off the estimation using the prior information available. I tried

    . est use main
    . matrix b = e(b)
    . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))

but this gave me an error that the likelihood was flat and did not proceed.

So I've thought of some other approaches, but am not sure which I should expect to be most efficient, and would prefer not to spend weeks figuring it out. One idea was to use a sample, estimate the big model, and then use that as a starting point:

    . est use main
    . matrix b = e(b)
    . gen byte sample = (uniform()*1000)<1
    . xtmelogit Y x1-x150 z1 z2 if sample || group:, from(b)
    . matrix b = e(b)
    . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))

Another was to first use Laplace iteration, and start with that result:

    . est use main
    . matrix b = e(b)
    . xtmelogit Y x1-x150 z1 z2 if sample || group:, from(b) laplace
    . matrix b = e(b)
    . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))

I'd appreciate any insight into which of these approaches might shave a meaningful amount of time off of getting the final estimates, or if there is another that I could try.

thanks,
Jeph

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
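Sergiy's suggestion of keeping all positives and only a random subsample of negatives could be sketched roughly as follows. This is an untested illustration, not something either poster ran: the variable names follow the thread, while the indicator name `insample`, the 2% sampling fraction, and the seed are arbitrary assumptions.

```stata
* Sketch only. Y, x1-x150, z1, z2 and group are the names used in the thread;
* insample, the 2% fraction and the seed are assumptions made for illustration.
set seed 20101222

* Keep every positive outcome and roughly 2% of the negatives.
gen byte insample = (Y == 1) | (runiform() < 0.02)

* Stage 1: fit the extended model on the reduced sample.
xtmelogit Y x1-x150 z1 z2 if insample || group:

* Stage 2: pass the stage-1 estimates as starting values for the full sample.
matrix b = e(b)
xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))
```

Because the negatives are undersampled, the stage-1 intercept should sit above the full-sample intercept by about ln(1/0.02), roughly 3.9 on the logit scale, while the slopes should be approximately unaffected. Whether shifting the intercept in `b` before stage 2 saves further iterations is not something the thread establishes.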

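Sergiy's remark about duplicate observations could be explored along these lines. The sketch first counts exact duplicates and then, if there are enough to matter, collapses them into binomial form, which -xtmelogit- accepts through its binomial() option. This is an assumption-laden illustration rather than a tested recipe: the names `xs`, `events` and `trials` are hypothetical, and with 150 covariates the number of duplicates may turn out to be small.

```stata
* Sketch only. Y, x1-x150, z1, z2 and group follow the thread;
* xs, events and trials are hypothetical names introduced here.

* Expand the covariate range once so the same list is used before and after collapse.
unab xs : x1-x150

* How many observations share group, outcome, and all covariates?
duplicates report group Y `xs' z1 z2

* Collapse to one row per group-by-covariate pattern:
* events = number of positives, trials = number of observations in the pattern.
preserve
collapse (sum) events = Y (count) trials = Y, by(group `xs' z1 z2)

* Fit the same model on the grouped (binomial) data.
xtmelogit events `xs' z1 z2 || group:, binomial(trials)
restore
```

The grouped fit uses the same likelihood as the observation-level fit, so any gain comes only from evaluating fewer rows; if `duplicates report` shows few surplus observations, the collapse is probably not worth the effort.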

**References**:

- **st: tricks to speed up -xtmelogit-** *From:* Jeph Herrin <stata@spandrel.net>
- **Re: st: tricks to speed up -xtmelogit-** *From:* Sergiy Radyakin <serjradyakin@gmail.com>
