Statalist



st: Pooling cross-sectional data?


From   "Polis, Chelsea B." <cpolis@jhsph.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: Pooling cross-sectional data?
Date   Thu, 16 Oct 2008 14:22:29 -0400

Dear Statalisters,

I have nine rounds of cross-sectional information on (essentially) a census of individuals within a given set of communities.  I restricted each round to HIV+ women who were aged 15-49 during that year.  To determine predictors of hormonal contraceptive (HC) use among these women, I considered all variables which could be hypothesized to predict HC use and which were (borderline) significant in univariate analyses.  These variables were considered for inclusion in multivariate models, and final models were selected based on likelihood ratio tests, removing variables with nonsignificant p-values (testparm was used to assess the overall significance of categorical variables).  When there were questions about particular variables, I also looked at automatic selection procedures (forward, backward, and forward stepwise) and compared models using the fitstat command.  I conducted goodness-of-fit (GOF) tests and ran bootstrapped logistic regressions to obtain robust CIs and SEs.
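
For concreteness, the workflow above might look roughly like the following for a single round.  This is only a sketch: the variable names (hcuse, agegrp, educ, marstat, parity) are hypothetical placeholders rather than my actual variables, and the factor-variable notation (i.) assumes Stata 11 or later.

```stata
* Hypothetical variables: hcuse (0/1 outcome), agegrp, educ, marstat (categorical),
* parity (count). Replace with the real variable names.

* Univariate screen for one categorical candidate predictor
logistic hcuse i.agegrp
testparm i.agegrp              // overall Wald test across all age-group categories

* Full multivariable model with all screened candidates
logistic hcuse i.agegrp i.educ i.marstat parity
estimates store full

* Reduced model without educ, fitted on the same observations as the full model
logistic hcuse i.agegrp i.marstat parity if e(sample)
estimates store reduced

* Likelihood ratio test of the reduced model against the full model
lrtest full reduced

* Hosmer-Lemeshow goodness-of-fit test for the most recent (reduced) model
estat gof, group(10)

* Same model with bootstrapped standard errors and confidence intervals
logistic hcuse i.agegrp i.marstat parity, vce(bootstrap, reps(1000) seed(12345))
```

The `if e(sample)` restriction is there so that both models in the likelihood ratio test are estimated on the same set of women, which lrtest requires.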

1st question: Do my methods seem reasonable as a way to let the data "speak for themselves" in determining predictors of HC use, given that predictors may have changed over time and that I had a wide variety of variables which could be hypothesized to be predictors?

Although it would be interesting to develop a longitudinal model to explore how factors predictive of HC use have changed over time, a handful of variables of interest were not collected at all nine rounds, so I think this is probably not possible.  My advisor suggested collapsing the nine rounds into three time periods (i.e., Time A includes rounds 1, 2, 3; Time B includes rounds 4, 5, 6; and Time C includes rounds 7, 8, 9) for ease of presentation (showing nine separate regressions wouldn't work too well in a published paper).  Only variables collected during all three rounds of a given time period will be included in the regression for that time period, meaning I will be able to incorporate more of these inconsistently collected variables into the smaller regressions.
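
The grouping of rounds described above could be coded along these lines.  This is a sketch that assumes a hypothetical integer variable named round (1-9) in a dataset where the nine rounds have already been appended; period is a new variable name chosen for illustration.

```stata
* Hypothetical variable: round (integer 1-9 identifying the survey round)
* Collapse the nine rounds into three labelled time periods
recode round (1/3 = 1 "Time A") (4/6 = 2 "Time B") (7/9 = 3 "Time C"), generate(period)

* Check that every round landed in the intended period
tabulate round period
```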

2nd question: I assume that I need to account for the correlation among repeated observations on women who participated in several rounds within a time period by using something like GEE.  Am I on the right path in thinking about how to do this?  I have never seen an example of an analysis like this, so if anybody could point me toward resources to help me think it through, I would be most appreciative.  I am particularly interested in problems I might not be considering in pooling the rounds, and in the best way to then determine the predictors of HC use from the pooled data while accounting for the within-woman correlation.
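
To make the question concrete, this is roughly what I have in mind for one time period.  It is a sketch under assumptions, not a settled model: id (a woman identifier that is constant across rounds), round, period, hcuse, and the predictors are all hypothetical names, the exchangeable working correlation is just one possible choice, and the i. notation assumes Stata 11 or later.

```stata
* Hypothetical variables: id (woman identifier), round (1-9), period (1-3),
* hcuse (0/1 outcome), agegrp and marstat (categorical), parity (count)

* Declare the panel structure: repeated rounds nested within women
xtset id round

* GEE logistic model for Time A only, with an exchangeable working correlation
* and robust (sandwich) standard errors; eform reports odds ratios
xtgee hcuse i.agegrp i.marstat parity if period == 1, ///
    family(binomial) link(logit) corr(exchangeable) vce(robust) eform
```

The `///` line continuation works in a do-file; typed interactively, the command would need to go on one line.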

Many thanks, and sorry for the lengthy post.

Chelsea



