
# st: Question about large data set analyses and survey logistic regressions

- From: jaswartz
- To: statalist@hsphsun2.harvard.edu
- Subject: st: Question about large data set analyses and survey logistic regressions
- Date: Sun, 4 Jan 2004 14:42:57 -0600

```
All,

I am new to the list and relatively new to Stata. I hope my questions have not
already been covered at length; if they have, I would appreciate being pointed
to the relevant threads in the archives. I have done some snooping on my own but
have not found anything that addresses my specific issues. Here they are:

I am analyzing data collected as part of the National Household Survey on Drug
Abuse. The data set contains about 39,000 observations. I am using weights
provided by RTI, which conducts the survey, to obtain accurate point and
standard error estimates. Hence, I am using the svylogistic procedure in Stata
to incorporate these weights. Before running my logistic regressions, I ran
some bivariate stats using svytab. Because of the large sample size,
everything comes out statistically significant in the chi-square results for
the bivariate statistics. My first question is: Is there a way to run
bivariate statistics on a large data set like this to find non-trivial,
significant relationships? For instance, could I draw a random sample of size
N from my complete sample, with N chosen to give statistical power of about
.80, and run the bivariates on that random sample? Or is there a preferred way
of analyzing these data?

My second question concerns the svylogistic regression procedure. The output
does not give a pseudo-R-squared, as the regular logistic regression
procedure does. Is there a way to get an effect-size estimate using the
svylogistic procedure, or should I just run the logistic procedure and use that
estimate? Finally, all of my model fit statistics seem to come up significant
with this data set, meaning that my models do not fit the data well. Is this
also attributable to the large data set size? Is there a preferred way of
conducting model diagnostics when the data set is this large?

You can respond directly to my email at jaswartz@uic.edu.

Thanks very much for any help.

James

```
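The subsampling idea in the first question rests on a standard power calculation. Below is a rough sketch of that arithmetic under several assumptions. It assumes a 2×2 table, which gives a 1-df chi-square test, with effect size expressed as Cohen's w. It also assumes simple random sampling. That ignores the NHSDA design effect, so a design-based test such as svytab's would need a larger N. Under those assumptions the required size is approximately N = ((z_{1-α/2} + z_{power}) / w)². The function names are hypothetical, and the code needs Python 3.8+ for `statistics.NormalDist`.

```python
from math import ceil
from statistics import NormalDist


def n_for_chisq_1df(w: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: approximate N for a 1-df chi-square test (2x2 table).

    Assumes simple random sampling; a survey design effect (deff) would
    inflate the answer to roughly deff * N. Uses the normal approximation
    N = ((z_{1-alpha/2} + z_{power}) / w)^2, ignoring the negligible
    opposite tail of the two-sided test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # standard normal quantile for target power
    return ceil(((z_alpha + z_power) / w) ** 2)


def power_chisq_1df(n: int, w: float, alpha: float = 0.05) -> float:
    """Hypothetical helper: power of a 1-df chi-square test, noncentrality n * w**2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    delta = (n * w * w) ** 0.5
    # A 1-df noncentral chi-square is (Z + delta)^2, so the test rejects
    # when |Z + delta| > z_alpha; sum both tails.
    return z.cdf(delta - z_alpha) + z.cdf(-delta - z_alpha)


if __name__ == "__main__":
    for w in (0.1, 0.3, 0.5):  # Cohen's small, medium, large benchmarks
        print(w, n_for_chisq_1df(w))
    print(round(power_chisq_1df(39000, 0.02), 3))
```

With w = 0.1, this gives N = 785 at power .80. In the other direction, with N near 39,000, even w = 0.02 is detected with power of about 0.98, and that effect is far below Cohen's "small" benchmark. This is consistent with the observation that nearly every bivariate test comes out significant.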

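On the second question, the "Pseudo R2" that Stata's ordinary logistic command reports is McFadden's measure, 1 - ln L(model) / ln L(intercept only). The sketch below computes the same ratio from fitted probabilities and sampling weights, using a weighted pseudo-log-likelihood. Carrying McFadden's measure over to weighted data this way is an assumption made for illustration; it is not something svylogistic reports. It also reflects only the weights, not the strata or clusters. The function names are hypothetical.

```python
from math import log
from typing import Sequence


def weighted_loglik(y: Sequence[int], p: Sequence[float], w: Sequence[float]) -> float:
    """Hypothetical helper: weighted Bernoulli log-likelihood, sum of w_i * log P(y_i | p_i)."""
    total = 0.0
    for yi, pi, wi in zip(y, p, w):
        total += wi * (log(pi) if yi == 1 else log(1.0 - pi))
    return total


def mcfadden_r2(y: Sequence[int], p_model: Sequence[float], w: Sequence[float]) -> float:
    """Hypothetical helper: McFadden's pseudo-R^2 = 1 - lnL(model) / lnL(intercept only).

    The intercept-only model predicts the weighted mean of y for every case,
    which is the weighted maximum-likelihood fit for that model. Only the
    sampling weights enter; strata and clusters are ignored.
    """
    p_null = sum(wi * yi for yi, wi in zip(y, w)) / sum(w)
    ll_model = weighted_loglik(y, p_model, w)
    ll_null = weighted_loglik(y, [p_null] * len(y), w)
    return 1.0 - ll_model / ll_null
```

For survey data, the log-likelihood is a pseudo-likelihood. This number is therefore best read as a descriptive effect-size summary, not as a statistic with a known sampling distribution.
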