Re: st: tricks to speed up -xtmelogit-

From	Jeph Herrin <[email protected]>
To	[email protected]
Subject	Re: st: tricks to speed up -xtmelogit-
Date	Wed, 22 Dec 2010 08:33:15 -0500

There's not really a way to reduce the variables; I took several
thousand medical diagnosis and procedure codes, classified them
into 500 related groups (according to a published scheme), and
then reduced the 500 to 100 by running 500 linear probability
models and keeping those with the largest abs(t-value). (100
was arbitrary, but it turns out that only 1 procedure group and 1
diagnostic group had P > 0.05 in the final model.) The other
variables are categories of age, sex, length of stay, number
of admissions, etc. Anyway, if I drop any of the x's, I would have
to re-estimate the "main" model, which means 3 weeks wasted :)

In the end, I have decided to look at only one additional model
and to give it 3 weeks.

cheers,
Jeph


On 12/21/2010 3:28 PM, Sergiy Radyakin wrote:

Hi, Jeph,

Very interesting problem. Are the 150 variables related? E.g., are these 150 a
single group of dummies? Or are they all independent: height/age/gender?
With 6 million observations there is some chance you will have duplicates,
which may let you reduce your sample a bit (just adjust the
weights).
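
One way to sketch the duplicate-collapsing idea in Stata, assuming the variable names from the original post (Y, x1-x150, group). Rather than weights, this version relies on the binomial() option of -xtmelogit- for data in binomial form. The names events and trials are made up for illustration, and with 150 covariates there may be too few exact duplicates for this to help much.

```stata
* Sketch only: collapse identical covariate patterns within group
* into one record holding the number of events and of observations.
preserve
collapse (sum) events=Y (count) trials=Y, by(group x1-x150)

* Check how much the data actually shrank before committing to a fit
count

* Fit the same model on the binomial-form data
xtmelogit events x1-x150 || group:, binomial(trials)
restore
```

If the count after -collapse- is close to the original 6 million, the extra sort is probably not worth it.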

Given the rarity of your outcome, a simple random subsample may contain just
a few positives. May I also suggest taking all positives and a random
subsample of negatives, estimating the candidate model on that, and then
using the result as the starting point for the full sample?
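
A minimal sketch of that subsampling in Stata, assuming the variable names from the original post. The indicator name ccsample, the seed, and the 1% sampling fraction for negatives are arbitrary choices for illustration. Keeping all positives while thinning the negatives is expected to distort mainly the intercept, so the subsample fit is used here only to supply starting values, not as a final result.

```stata
* Sketch only: keep every positive and roughly 1% of the negatives
set seed 20101221
gen byte ccsample = (Y == 1) | (runiform() < 0.01)

* Candidate model on the reduced sample (Laplace to keep it cheap)
xtmelogit Y x1-x150 z1 z2 if ccsample || group:, laplace
matrix b = e(b)

* Full sample, starting from the subsample estimates
xtmelogit Y x1-x150 z1 z2 || group:, from(b)
```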

Finally, this command is not in the MP report, but have you investigated how
it performs as N(CPU) grows?
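
A rough way to check this on a small subsample, assuming Stata/MP with at least 4 licensed cores (-set processors- is only available in Stata/MP). The 0.1% subsample, the indicator name tsample, the iterate(5) cap, and the processor counts 1, 2, and 4 are illustrative choices, not recommendations; the fits are not meant to converge, only to be timed.

```stata
* Sketch only: time a few iterations at different processor counts
set seed 20101221
gen byte tsample = runiform() < 0.001

foreach p of numlist 1 2 4 {
    set processors `p'
    timer clear 1
    timer on 1
    * Cap the iterations so each run stays short
    quietly xtmelogit Y x1-x150 if tsample || group:, laplace iterate(5)
    timer off 1
    quietly timer list 1
    display "processors = `p'   seconds = " r(t1)
}
```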

Best regards, Sergiy


On Tue, Dec 21, 2010 at 2:15 PM, Jeph Herrin <[email protected]> wrote:

All,

I am trying to estimate a series of models using 6 million observations;
the observations are nested within 3000 groups, and the dichotomous
outcome is somewhat rare, occurring in about 0.5% of observations.
There are about 150 independent variables, and so my basic model looks
like this:

  . xtmelogit Y x1-x150 || group:

This took approximately 3 weeks to converge on a high end machine
(3.2GHz, Intel Core i7, 24GB RAM). I saved the estimation result

  . est save main

but now would like to estimate some related models of the form

  . xtmelogit Y x1-x150 z1 z2 || group:

and would like to think I can shave some considerable time off the
estimation using the prior information available. I tried

  . est use main
  . matrix b = e(b)
  . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))

but this gave me an error that the likelihood was flat, and estimation
did not proceed. So I've thought of some other approaches, but am not sure
which to expect to be most efficient, and would prefer not to spend weeks
figuring it out.

One idea was to use a sample, estimate the big model, and then use
that as a starting point:

  . est use main
  . matrix b = e(b)
  . gen byte sample = (uniform()*1000)<1
  . xtmelogit Y x1-x150 z1 z2 if sample || group:, from(b)
  . matrix b = e(b)
  . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))

Another was to first use the Laplace approximation, and start with that result:

  . est use main
  . matrix b = e(b)
  . xtmelogit Y x1-x150 z1 z2 if sample || group:, from(b) laplace
  . matrix b = e(b)
  . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))

I'd appreciate any insight into which of these approaches might shave
a meaningful amount of time off of getting the final estimates, or if
there is another that I could try.

thanks,
Jeph



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

