The Stata listserver

st: Question about large data set analyses and survey logistic regressions


From   bill magee <[email protected]>
To   [email protected]
Subject   st: Question about large data set analyses and survey logistic regressions
Date   Mon, 5 Jan 2004 06:08:02 -0500

You may be interested in looking at the following, or other work on Bayesian model selection:


Raftery, Adrian E. 1995. "Bayesian Model Selection in Social Research." In Sociological Methodology, vol. 25, edited by P. V. Marsden. Washington, D.C.: American Sociological Association.
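Raftery's approach ranks competing models by BIC rather than by p-values, which is one way around the everything-is-significant problem in very large samples. A minimal sketch of a BIC comparison in Stata, using the shipped auto dataset (the variables here are illustrative, not from the NHSDA analysis in the post):

```stata
* Compare two nested logit models by BIC; the model with the
* smaller BIC is preferred under Raftery's criterion.
sysuse auto, clear
logit foreign price mpg
estat ic                     // reports AIC and BIC for this model
estimates store m1
logit foreign price mpg weight
estat ic
estimates store m2
```

Note that `estat ic` uses the ordinary likelihood; with `svy` estimation the reported value is a pseudolikelihood, so BIC comparisons there should be treated as descriptive.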



bill

On Monday, January 5, 2004, at 02:33 AM, statalist-digest wrote:


Date: Sun, 4 Jan 2004 14:42:57 -0600
From: jaswartz <[email protected]>
Subject: st: Question about large data set analyses and survey logistic regressions

All,

I am new to the list and relatively new to Stata. I hope my questions have not
been addressed at length before; if they have, I would appreciate being pointed
to the relevant archives. I have done some searching on my own but have not
found anything that addresses my specific issues. Here they are:

I am analyzing data collected as part of the National Household Survey on Drug
Abuse. The data set contains about 39,000 observations. I am using weights
provided by RTI, which conducts the survey, to obtain accurate point and
standard error estimates. Hence, I am using the svylogistic procedure in Stata
to incorporate these weights. Before running my logistic regressions, I ran
some bivariate statistics using svytab. Because of the large sample size,
every chi-squared test on the bivariate statistics comes out statistically
significant. My first question is: is there a way to run bivariate statistics
on a large data set like this to find non-trivial, significant relationships?
For instance, could I draw a random sample of size N from my complete sample,
chosen to give statistical power of about .80, and run the bivariates on that
random sample? Or is there a preferred way of analyzing these data?
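The subsampling idea above can be sketched in Stata with the `sample` command. The percentage, seed, and the survey design variables (`vepsu`, `vestr`, `analwt`, `drugdep`, `gender`) are hypothetical placeholders, not the actual NHSDA variable names; the subsample size should come from a power calculation for the smallest effect of interest:

```stata
* Draw a simple random subsample, then re-run the bivariate table.
set seed 12345
sample 10                    // keep a 10% simple random sample
* or keep an exact count of observations:
* sample 2000, count

* Re-declare the survey design on the subsample before svy commands.
svyset vepsu [pweight=analwt], strata(vestr)
svy: tabulate drugdep gender
```

One caveat: subsampling a complex survey changes the design, so the original weights and variance estimates are no longer exactly right for the subsample; this is a rough screening device, not a substitute for judging whether significant effects are substantively large.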

My second question concerns the svylogistic regression procedure. The output
does not report a pseudo-R-squared as the regular logistic regression
procedure does. Is there a way to get an effect-size estimate from the
svylogistic procedure, or should I just run the logistic procedure and use
that estimate? Finally, all of my model fit statistics come up significant
with this data set, meaning my models are not fitting the data well. Is this
also attributable to the large sample size? Is there a preferred way of
conducting model diagnostics when the data set is this large?
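One option for the missing pseudo-R-squared is to compute McFadden's version by hand from the fitted and intercept-only models. A hedged sketch, with hypothetical variable names; under `svy` estimation `e(ll)` is a log pseudolikelihood, so the resulting figure is a descriptive approximation rather than a formally justified fit statistic:

```stata
* Approximate McFadden pseudo-R2 = 1 - ll(full)/ll(null),
* computed from log pseudolikelihoods after svy: logit.
svy: logit drugdep age gender
scalar ll_full = e(ll)

svy: logit drugdep           // intercept-only (null) model
scalar ll_null = e(ll)

display "McFadden pseudo-R2 = " 1 - ll_full/ll_null
```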

You can respond directly to my email at [email protected].

Thanks very much for any help.

James



