st: RE: request for help - multi-level modelling with a big dataset using xtlogit

From   "FEIVESON, ALAN H. (AL) (JSC-SD) (NASA)" <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: request for help - multi-level modelling with a big dataset using xtlogit
Date   Fri, 19 Jul 2002 08:10:30 -0500


Without knowing how you chose your subsamples, it's hard to say why you
obtained such huge variation in your p-values. If your data is arranged in
large clusters of all caesarean or no caesarean cases, or in large clusters
of similarly-valued predictor variables, your subsamples may not be
representative of the whole dataset.

Instead of xtlogit, try xtgee with a binomial distribution and logit link. I
believe it will work on all your data at once. xtlogit is a
maximum-likelihood approach assuming a random-effects model and is
computationally intense. xtgee uses the method of generalized estimating
equations (GEE) with a robust estimator of variance allowing for the
clusters. It only assumes a certain correlation structure for observations
within clusters - the default is an equicorrelated structure.
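
As a rough sketch of that suggestion, reusing the variable names from the
original message: the hospital identifier is not named there, so hospid
below is a hypothetical name, and the option spellings are the standard
xtgee ones as I understand them. This would be run on the full dataset,
without the sample step.

        * hospid is an assumed name for the hospital (cluster) identifier
        xi: xtgee emerg i.gestat i.age i.xdelmid, i(hospid) family(binomial) link(logit) corr(exchangeable) robust

        testparm _Ixdel*  /* Wald test on the xdelmid indicators, as before */

Here corr(exchangeable) spells out the equicorrelated default, and robust
requests the cluster-robust variance estimator.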

Hope this helps.

Al Feiveson

-----Original Message-----
From: Alves, Bernadette [mailto:[email protected]]
Sent: Friday, July 19, 2002 7:41 AM
To: '[email protected]'
Subject: st: request for help - multi-level modelling with a big dataset
using xtlogit

I'm a student looking for help with my MSc dissertation looking at factors
associated with delivery by caesarean section. It's an analysis of a
database of about half a million records of women who gave birth in
hospital.   I am using logistic regression and because my data are naturally
grouped, I'm using a multi-level approach to take account of the correlation
between women in the same hospital.  I am therefore using xtlogit (rather
than logit). I find that I cannot run xtlogit with my entire 500,000
records - Stata comes back with an error saying that it needs to be able to
set matsize to approximately 18,000. Unfortunately the matsize limit for
Stata 7.0 is 800.

I then took a 4% sample (approximately 20,000 records), which is the largest
that Stata can cope with at a matsize of 800.  But, and here's the weird
thing that I need help with.... The parameter estimates are very dependent
on the sample I take. Sometimes I get a p-value of 0.05, for other samples I
get a p-value of 0.7.  Here's an example of what I do to test whether
xdelmid is a predictor of emergency caesarean section.

        sample 4  /* this gives me the 4% sample */

        xi: xtlogit emerg i.gestat i.age i.xdelmid, pa corr(exch) robust

        testparm _Ixdel*  /* this does a Wald test on xdelmid */

Taking 10 different 4% samples, I find my estimates differ considerably and
my p-values range from 0.04 to 0.71.

Why can't Stata cope with the full dataset and why are the parameter
estimates so sensitive to the sample taken?

I would be extremely grateful if someone could help me with this.

