
st: Question about large data set analyses and survey logistic regressions

From   jaswartz <>
Subject   st: Question about large data set analyses and survey logistic regressions
Date   Sun, 4 Jan 2004 14:42:57 -0600


I am new to the list and relatively new to Stata. I hope my questions have not 
been addressed too much before but, if so, I would appreciate being pointed in 
the direction of the archives. I have done some snooping on my own but have 
not found anything that addresses my specific issues. Here they are:

I am analyzing data collected as part of the National Household Survey on Drug 
Abuse. The data set contains about 39,000 observations. I am using weights 
provided by RTI, which conducts the survey, to obtain accurate point and 
standard error estimates. Hence, I am using the svylogistic procedure in Stata 
to incorporate these weights. Before running my logistic regressions, I ran 
some bivariate statistics using svytab. Because of the large sample size, 
every chi-square test on these bivariate tables comes out statistically 
significant. My first question is: Is there a way to run bivariate statistics 
on a large data set like this that identifies non-trivial, significant 
relationships? For instance, could I draw a random sample of size N from my 
complete sample, chosen to give statistical power of about .80, and run the 
bivariates on that random sample? Or is there a preferred way of analyzing 
these data?
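
To make the concern concrete, here is a minimal sketch in Python rather than 
Stata. It assumes a simple random sample with no weights or design effects, 
so it does not reproduce what svytab computes, and the cell proportions are 
made up for illustration. It shows only that, for fixed proportions, the 
Pearson chi-square statistic grows with the sample size while an effect size 
such as Cramer's V does not.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V: chi-square rescaled by n and the smaller table dimension."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(pearson_chi2(table) / (n * k))

# Hypothetical cell proportions with a weak association (not survey data).
proportions = [[0.26, 0.24], [0.24, 0.26]]

for n in (400, 39000):
    table = [[p * n for p in row] for row in proportions]
    print(n, round(pearson_chi2(table), 2), round(cramers_v(table), 3))
```

With these made-up proportions the statistic is 0.64 at n = 400 and 62.4 at 
n = 39,000 (the 1-df critical value at .05 is 3.84), while Cramer's V is 0.04 
in both cases.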

My second question concerns the svylogistic regression procedure. Its output 
does not give a pseudo-R-squared as the regular logistic regression procedure 
does. Is there a way to get an effect size estimate using the svylogistic 
procedure, or should I just run the logistic procedure and use that estimate? 
Finally, all of my model fit statistics seem to come up significant with this 
data set, meaning my models are not fitting the data well. Is this also 
attributable to the large sample size? Is there a preferred way of conducting 
model diagnostics when the data set is this large?

You can respond directly to my email at

Thanks very much for any help.

