The Stata listserver

RE: st: request for help - multi-level modelling with a big dataset using xtlogit


From   "Nick Winter" <nwinter@policystudies.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: request for help - multi-level modelling with a big dataset using xtlogit
Date   Fri, 19 Jul 2002 09:27:29 -0400

> "Alves, Bernadette" wrote:
> > 
> > I'm a student looking for help with my MSc dissertation looking at
> > factors associated with delivery by caesarean section. It's an
> > analysis of a database of about half a million records of women who
> > gave birth in hospital.  I am using logistic regression and because
> > my data are naturally grouped, I'm using a multi-level approach to
> > take account of the correlation between women in the same hospital.
> > I am therefore using xtlogit (rather than logit).  I find that I
> > cannot run xtlogit with my entire 500,000 records - stata comes back
> > with an error saying that it needs to be able to set matsize to
> > approximately 18,000.  Unfortunately the matsize limit for stata 7.0
> > is 800.
> > 
> > I then took a 4% sample (approximately 20,000 records) which is the
> > largest that stata can cope with at a matsize of 800.  But, and
> > here's the weird thing that I need help with.... The parameter
> > estimates are very dependent on the sample I take. Sometimes I get a
> > p-value of 0.05, for other samples I get a p-value of 0.7.  Here's an
> > example of what I do to test whether xdelmid is a predictor of
> > emergency caesarean section.
> > 
> >         sample 4  /* this give me the 4% sample */
> > 
> >         xi: xtlogit emerg i.gestat i.age i.xdelmid, pa corr(exch) robust i(provid)
> > 
> >         testparm _Ixdel*  /* this does a wald test on xdelmid */
> > 
> > Taking 10 different 4% sample, I find my estimates differ
> > considerably and my p-values range from 0.04 to 0.71.
> > 
> > Why can't stata cope with the full dataset and why are the parameter
> > estimates so sensitive to the sample taken?
> > 
> > I would be extremely grateful if someone could help me with this.

I know little about xtlogit and its memory requirements, so I can't
speak to that.  But even with Stata SE, you would need a *huge* amount
of memory in your computer to run anything with a matsize of 11,000.

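I would draw your subsamples as complete locations, keeping every record
from each sampled hospital.  Sampling individual records thins out every
hospital at random, and that might be causing the vast variation in
significance across your samples.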
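
As a rough sketch of sampling whole hospitals instead of individual
records, something like the following might work.  It assumes provid
identifies the hospital (as in the i(provid) option quoted above); the
seed value and the helper variable u are made up for illustration, and I
have not run this against your data.

```stata
* Sketch only: keep about 4% of hospitals, with ALL of their records.
* Assumes provid is the hospital identifier; u is a hypothetical helper.
set seed 12345                          /* arbitrary, for reproducibility */

* one uniform random draw per hospital, stored on its first record
bysort provid: gen double u = uniform() if _n == 1

* copy that draw to every record of the hospital
* (missing values sort last, so u[1] is the non-missing draw)
bysort provid (u): replace u = u[1]

* keep a hospital, and hence all of its records, with probability 0.04
keep if u < 0.04
drop u

* then refit the model on the sampled hospitals
xi: xtlogit emerg i.gestat i.age i.xdelmid, pa corr(exch) robust i(provid)
testparm _Ixdel*
```

Note that this keeps roughly 4% of the hospitals, not 4% of the records,
so the number of records retained will vary from draw to draw with the
sizes of the hospitals selected.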

-Nick Winter


