Re: st: tricks to speed up -xtmelogit-


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: tricks to speed up -xtmelogit-
Date   Tue, 21 Dec 2010 15:28:23 -0500

Hi, Jeph,

This is a very interesting problem. Are the 150 variables related? For example,
are they a single set of dummies for one categorical variable, or are they all
distinct measures such as height, age, and gender? With 6 million observations
there is some chance that some rows are exact duplicates, which may let you
shrink your estimation sample somewhat (just adjust the weights to count the
duplicates).
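
A minimal sketch of the collapsing idea, written in Python for illustration
rather than Stata (in Stata, -contract- does the analogous job). It assumes
each observation is a tuple holding the group id, the outcome, and the
covariates; the group id must be part of the key, because two rows are only
interchangeable in a random-intercept model if they also share a group. The
name `collapse_duplicates` is hypothetical. Whether the resulting counts can be
passed to -xtmelogit- as frequency weights depends on your Stata version, so
check -help xtmelogit- before relying on this.

```python
from collections import Counter

def collapse_duplicates(rows):
    """Collapse identical (group, y, x1, ..., xk) tuples into unique rows.

    Returns a list of (row, count) pairs in order of first appearance; the
    count is the number of original observations the row stands for and can
    serve as a frequency weight.
    """
    counts = Counter(rows)  # Counter keeps first-insertion order (Python 3.7+)
    return [(row, n) for row, n in counts.items()]
```

For example, `[(1, 0, 1.0), (1, 0, 1.0), (1, 1, 0.5), (2, 0, 1.0)]` collapses
to three unique rows, the first carrying a count of 2.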

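Given how rare your outcome is, a simple random subsample may contain only a
few positives: the 1-in-1000 sample in your first approach would have about
6,000 observations, of which only about 30 would be positives. You might also
consider taking all positives plus a random subsample of negatives, estimating
the candidate model on that, and then using the result as starting values for
the full sample.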
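
A rough sketch of that outcome-based subsample, again in Python purely for
illustration; `outcome_subsample`, `corrected_intercept`, and the sampling rate
p are hypothetical. Every positive is kept, each negative is kept with
probability p, and kept negatives get weight 1/p. If the subsample is instead
fit unweighted, then for an ordinary logit only the intercept is distorted
(shifted up by -log(p)), so adding log(p) gives a better starting value. For a
random-intercept model, treat that correction as an approximation that is good
enough for starting values, not as exact.

```python
import math
import random

def outcome_subsample(y, neg_rate, seed=12345):
    """Keep every positive and each negative with probability neg_rate.

    Returns (indices, weights): the kept row indices and an inverse-probability
    weight for each kept row (1 for positives, 1/neg_rate for negatives).
    """
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    keep, weights = [], []
    for i, yi in enumerate(y):
        if yi == 1:
            keep.append(i)
            weights.append(1.0)
        elif rng.random() < neg_rate:
            keep.append(i)
            weights.append(1.0 / neg_rate)
    return keep, weights

def corrected_intercept(sub_intercept, neg_rate):
    """Map an unweighted-subsample logit intercept back to the full-data scale.

    Keeping negatives at rate p multiplies the odds of a positive by 1/p, which
    raises the intercept by -log(p); adding log(p) undoes that shift.
    """
    return sub_intercept + math.log(neg_rate)
```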

Finally, this command is not covered in the Stata/MP performance report, but
have you investigated how its run time scales as the number of CPUs grows?
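
One way to check would be to time the same smaller model (for instance on a
subsample) at several processor counts, which Stata/MP lets you change with
-set processors-, and tabulate speedup and parallel efficiency relative to the
single-CPU run. A minimal Python sketch; `scaling_table` is a hypothetical
helper and the timings are whatever you measure yourself:

```python
def scaling_table(timings):
    """Summarize wall-clock timings measured at different CPU counts.

    timings: dict mapping number of CPUs -> seconds; must include a 1-CPU run.
    Returns {n: (speedup, efficiency)} in increasing order of n, where
    speedup = T(1) / T(n) and efficiency = speedup / n.
    """
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in sorted(timings.items())}
```

With hypothetical timings `{1: 100.0, 4: 40.0}`, the 4-CPU entry is
`(2.5, 0.625)`: 2.5 times faster, with each processor used at 62.5% efficiency.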

Best regards, Sergiy


On Tue, Dec 21, 2010 at 2:15 PM, Jeph Herrin <stata@spandrel.net> wrote:
> All,
>
> I am trying to estimate a series of models using 6 million observations;
> the observations are nested within 3000 groups, and the dichotomous
> outcome is somewhat rare, occurring in about 0.5% of observations.
> There are about 150 independent variables, and so my basic model looks
> like this:
>
>  . xtmelogit Y x1-x150 || group:
>
> This took approximately 3 weeks to converge on a high end machine
> (3.2GHz, Intel Core i7, 24GB RAM). I saved the estimation result
>
>  . est save main
>
> but now would like to estimate some related models of the form
>
>  . xtmelogit Y x1-x150 z1 z2 || group:
>
> and would like to think I can shave some considerable time off the
> estimation using the prior information available. I tried
>
>  . est use main
>  . matrix b = e(b)
>  . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))
>
> but this gave me an error that the likelihood was flat, and nothing
> proceeded. So I've thought of some other approaches, but am not sure which
> to expect to be most efficient, and would prefer not to spend weeks
> figuring it out.
>
> One idea was to use a sample, estimate the big model, and then use
> that as a starting point:
>
>  . est use main
>  . matrix b = e(b)
>  . gen byte sample = (runiform()*1000)<1
>  . xtmelogit Y x1-x150 z1 z2 if sample || group:, from(b)
>  . matrix b = e(b)
>  . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))
>
> Another was to first use the Laplace approximation, and start with that result:
>
>  . est use main
>  . matrix b = e(b)
>  . xtmelogit Y x1-x150 z1 z2 if sample || group:, from(b) laplace
>  . matrix b = e(b)
>  . xtmelogit Y x1-x150 z1 z2 || group:, from(b) refineopts(iterate(0))
>
> I'd appreciate any insight into which of these approaches might shave
> a meaningful amount of time off of getting the final estimates, or if
> there is another that I could try.
>
> thanks,
> Jeph
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index