
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Best Logistic Regression Model


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Best Logistic Regression Model
Date   Wed, 19 Mar 2014 17:08:53 -0400

Please rewrite to Statalist using your full real name, as specified in the list FAQ which you were asked to read
when you joined.

Steve Samuels
[email protected]
On Mar 19, 2014, at 12:08 PM, T A <[email protected]> wrote:

Thank you everyone for your useful feedback. I am trying to describe the
impact of X on Y. I would rather not use stepwise (especially after
reading the link). Nested regression sounds like a good idea. I was
wondering if there is any model particularly suited for large
datasets? Or is it trial and error until we find the best model? I
need to clearly state the method(s) I will use to analyse the data in
the analysis plan a priori.
Is there any Stata package to split the data and do cross-validation?
Many thanks for your help.
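On the cross-validation question: k-fold cross-validation can be done with built-in commands alone. A minimal sketch, assuming the auto dataset shipped with Stata and an illustrative model (foreign on weight and mpg); the 5 folds, the seed, and the variable names u, fold, p and phat_cv are arbitrary choices, not anything prescribed in this thread. Each observation receives a predicted probability from a model fitted without it, and -roctab- then summarizes out-of-sample discrimination.

```stata
* 5-fold cross-validation of a logit model using built-in commands only
sysuse auto, clear
set seed 20140319                         // arbitrary; makes the folds reproducible
generate double u = runiform()
sort u                                    // put observations in random order
generate byte fold = mod(_n - 1, 5) + 1   // folds 1 to 5 of (nearly) equal size
generate double phat_cv = .
forvalues k = 1/5 {
    * fit on the other four folds
    quietly logit foreign weight mpg if fold != `k'
    * predicted probabilities for the held-out fold only
    quietly predict double p if fold == `k', pr
    quietly replace phat_cv = p if fold == `k'
    drop p
}
* out-of-sample discrimination: area under the ROC curve
roctab foreign phat_cv
```

The same loop works with any candidate model, so competing specifications can be compared on their cross-validated ROC area rather than on in-sample fit.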

On Wed, Mar 19, 2014 at 2:05 PM, Austin Nichols <[email protected]> wrote:
> T A <[email protected]> :
> You should begin your analysis plan by clarifying your goals.
> First, are you pursuing a classification model or trying to describe
> impacts of X on y?
> You can choose significant predictors using univariate analysis first,
> but you will introduce bias.
> You can pick a model that predicts best in sample, but there is no
> guarantee it will work out of sample, or that it will measure any
> causal connections between variables.
> 
> 
> On Wed, Mar 19, 2014 at 9:27 AM, Nick Cox <[email protected]> wrote:
>> Thanks for the mention of -allpossible- (SSC), but some warnings are in order.
>> 
>> That program really is limited to 6 predictors. As of 2014, I don't
>> imagine ever revising it. In the OP's case 20 predictors mean 2^20
>> possible models and that's a million and more to think about.
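The count comes from each predictor being either in or out of a model, so k candidate predictors give 2^k subsets (2^k - 1 if the empty model is excluded). A quick check of the two cases mentioned in this message:

```stata
display 2^6     // 6 predictors: 64 subsets
display 2^20    // 20 predictors: 1048576 subsets
```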
>> 
>> A paragraph in the help file really does mean what it says:
>> 
>> "Naturally, this command does not purport to replace the detailed
>> scrutiny of individual models or to offer an unproblematic way of
>> finding "best" models. Its main use may lie in demonstrating that
>> several models exist within many projects possessing roughly equal
>> merit as measured by omnibus statistics."
>> 
>> 6, by the way, was not an arbitrary choice for me as programmer. A
>> former graduate student had 6 predictors, all on the same footing, and
>> looking at _all_ the 64 possible models was reasonable and natural for
>> that project. But 6 is an arbitrary limit for everyone else.
>> 
>> For exploration of different predictor sets, -tuples- (SSC) may be of
>> some help, but all it does is put tuples of variable names into local
>> macros.
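A minimal sketch of how -tuples- output might be fed to a loop that fits each candidate logit and reports its AIC and BIC, assuming -tuples- has been installed (ssc install tuples) and using three auto-dataset variables purely for illustration. As the help-file quotation in this message warns, a table like this is a screening aid, not a way of finding a "best" model. All subsets here use the same 74 observations; with missing values, every fit should be restricted to a common sample before information criteria are compared.

```stata
* requires: ssc install tuples
sysuse auto, clear
* put every non-empty subset of the names into locals tuple1, tuple2, ...
tuples weight mpg length
forvalues i = 1/`ntuples' {
    quietly logit foreign `tuple`i''
    quietly estat ic
    * r(S) columns: N, ll(null), ll(model), df, AIC, BIC
    matrix S = r(S)
    display "`tuple`i''" "   AIC = " %8.2f S[1,5] "   BIC = " %8.2f S[1,6]
}
```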
>> 
>> 
>> Nick
>> [email protected]
>> 
>> 
>> On 19 March 2014 13:47, Richard Williams <[email protected]> wrote:
>>> Ideally you have some great theory which helps you pick predictors. You then
>>> test whether the theory seems to be right. The -nestreg- command can let you
>>> test a hierarchy of models.
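A minimal sketch of -nestreg- with logit, assuming the auto dataset and an ordering of predictor blocks chosen only for illustration; in a real analysis plan the blocks and their order would come from the theory. The -lr- option requests likelihood-ratio rather than Wald tests for each added block.

```stata
sysuse auto, clear
* block 1: weight; block 2 adds mpg; block 3 adds length
nestreg, lr: logit foreign (weight) (mpg) (length)
```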
>>> 
>>> But if you are going into this totally blind...
>>> 
>>> Check out -help stepwise- for info on how to do stepwise regression. But
>>> first, read this brief discussion of the problems with stepwise:
>>> 
>>> http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/
>>> 
>>> If you want to do stepwise anyway, you may want to do things like split the
>>> sample randomly in two. Develop your model with one data set and then see if
>>> you can confirm it with the other.
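A minimal sketch of the split-sample idea with built-in commands, assuming the auto dataset, a roughly 50/50 random split, and an arbitrary seed; a single fixed logit stands in for whatever model-building procedure (stepwise or otherwise) is applied to the development half. The key point is that the held-out half plays no part in choosing or fitting the model.

```stata
sysuse auto, clear
set seed 12345                            // arbitrary; makes the split reproducible
generate byte train = runiform() < 0.5    // about half the observations
* develop (here: simply fit) the model on the training half only
logit foreign weight mpg if train
* predicted probabilities for every observation
predict double phat, pr
* out-of-sample check: ROC area in the held-out half
roctab foreign phat if !train
```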
>>> 
>>> If you want to mass produce models, check out Nick Cox's -allpossible-,
>>> available from SSC.
>>> 
>>> To get BIC and AIC statistics, you can use commands like
>>> 
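>>> sysuse auto, clear
>>> * smaller model; -estat ic- reports its AIC and BIC
>>> logit foreign weight
>>> estat ic
>>> est store m1
>>> * larger model, nesting m1
>>> logit foreign weight mpg
>>> est store m2
>>> * LR test of m1 vs m2; -stats- adds AIC and BIC for both models
>>> lrtest m1 m2, stats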
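When more than two stored models are in play, -estimates stats- lists AIC and BIC for all of them in one table. A minimal sketch extending the two-model example with a third model (adding length) chosen only for illustration; smaller values indicate the preferred model on each criterion, and the comparison is only meaningful when all models use the same observations.

```stata
sysuse auto, clear
quietly logit foreign weight
estimates store m1
quietly logit foreign weight mpg
estimates store m2
quietly logit foreign weight mpg length
estimates store m3
* one row per model; columns N, ll(null), ll(model), df, AIC, BIC
estimates stats m1 m2 m3
```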
>>> 
>>> You might also check out this Stata tip:
>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0032
>>> 
>>> As for searching previous questions, the search info appears at the end of
>>> every email that gets posted to the list.
>>> 
>>> At 05:51 AM 3/19/2014, T A wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I am writing an analysis plan for a very large dataset. My outcome is
>>>> binary. I have data on 10,000 patients. I need to comment on which
>>>> logistic regression model I would use, e.g. forward selection,
>>>> backward elimination, stepwise etc. How do I go about choosing the
>>>> best logistic regression model? I know I can choose significant
>>>> predictors using univariate analysis first. Since the dataset is so
>>>> large and there are only 20 variables to look at, I think all
>>>> variables could have a significant p value. Is there a more systematic
>>>> and stringent way of choosing predictors for a multivariable logistic
>>>> regression? How do I do AIC and BIC in Stata?
>>>> 
>>>> Sorry if this is a silly question. I am a newbie to stats. Thank you
>>>> so much for your help.
>>>> 
>>>> How do I search all the previous questions that have been asked on this
>>>> mailing list?
>>>> 
>>>> Best Regards
>>>> Ta
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC