

From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: logistic regression predictors
Date: Mon, 19 Jul 2010 11:19:14 -0400

--

Steve

On Jul 18, 2010, at 11:59 AM, Steve Samuels wrote:

I was wrong about the utility of -cart- and -boost- for your data. -boost- is not useful when the predictors are indicator variables, as yours seem to be. (You haven't given many details.) -cart- is intended for failure time data, not binary data.

With a small number of predictors, you might be able to do a classification tree "by hand". -cart- might guide you to a possible tree: simply set up two times, a shorter one for deaths and a longer one for survivors. -cart- will show the numbers of cases and failures at each terminal node. The error rate will be optimistic, because it is measured on the same data used to form the tree.

To get a more accurate error rate, you could also manually do a cross-validation. Most simply, randomly split your data into "training" and "test" sets. Develop your tree on the training set, and estimate its accuracy (percent correctly predicted) on your "test" set. This can be improved by k-fold cross-validation: randomly divide your data into k (say 10) sets, omit one at a time, do -cart- on the remainder, and test the resulting prediction on the omitted set. Your estimate of prediction error is the average of the 10.

I also suggest that you look at the counts, deaths, and rates for all combinations of your predictors. See -crp- by Nick Cox, downloadable from SSC.

Steve
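The cross-validation steps and the table of counts, deaths, and rates described above can be sketched in a few lines. This is only an illustration in Python, not Stata, and it does not use -cart- or -crp-. The data are simulated, and every name here (`cell_table`, `fit_rule`, `kfold_error`, `simulate`) is hypothetical. As a stand-in for a tree built "by hand", the rule predicts death in any predictor combination whose training death rate is at least 0.5. That cutoff and the simulated rates are assumptions made for this sketch.

```python
import random
from collections import defaultdict

def cell_table(rows):
    """Count, deaths, and death rate for each combination of predictors."""
    cells = defaultdict(lambda: [0, 0])  # combination -> [n, deaths]
    for x, died in rows:
        cells[x][0] += 1
        cells[x][1] += died
    return {x: (n, d, d / n) for x, (n, d) in cells.items()}

def fit_rule(rows):
    """Stand-in for a hand-built tree: predict death where the cell rate >= 0.5."""
    table = cell_table(rows)
    overall = sum(died for _, died in rows) / len(rows)
    default = int(overall >= 0.5)  # for combinations not seen in training
    rule = {x: int(rate >= 0.5) for x, (n, d, rate) in table.items()}
    return lambda x: rule.get(x, default)

def error_rate(predict, rows):
    """Proportion of rows that are predicted incorrectly."""
    return sum(predict(x) != died for x, died in rows) / len(rows)

def kfold_error(rows, k=10, seed=1):
    """Average test error over k folds; assumes len(rows) >= k."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        errors.append(error_rate(fit_rule(train), folds[i]))
    return sum(errors) / k

def simulate(n=500, seed=2010):
    """Hypothetical data: three indicators, the first rare but highly lethal."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (int(rng.random() < 0.05), int(rng.random() < 0.30), int(rng.random() < 0.30))
        p_death = 0.70 if x[0] else 0.10  # assumed rates, for illustration only
        rows.append((x, int(rng.random() < p_death)))
    return rows

data = simulate()
table = cell_table(data)
for combo in sorted(table):
    n, deaths, rate = table[combo]
    print(combo, n, deaths, round(rate, 3))

apparent = error_rate(fit_rule(data), data)  # optimistic: same data for fit and test
cross_validated = kfold_error(data, k=10)
print("apparent error:", round(apparent, 3))
print("10-fold CV error:", round(cross_validated, 3))
```

The apparent error is computed on the same data used to form the rule, so it is the optimistic figure. The 10-fold figure is the average over the omitted sets.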

With such a strong independently predictive group, logistic regression will give poor predictions, because it assumes that all variables are needed to predict for each individual. The solution is a tree-based approach. The original reference is Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. New York: Chapman & Hall/CRC.

Apparent Stata solutions are -boost- ("findit boost") and -cart- (from SSC). I say "apparent", because I've not closely read the documentation for either. Non-commercial solutions can be found in R and at http://www.stat.wisc.edu/~loh/guide.html.

Steve

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477 USA
Voice: 845-246-0774
Fax: 206-202-4783

On Sun, Jul 18, 2010 at 1:57 AM, lilian tesmann <lilian_tes@hotmail.com> wrote:

Dear All,

I am trying to predict mortality rates in a specific population of clients. I encountered two problems and would be really grateful for any insights or suggestions.

(1) We have one predictor, a health condition, which is present in only 5% of population but over 70% of people with that condition die. Not surprisingly OR is very large (from 25 to 50). The purpose of the analysis is to obtain individual predictions, but they are hugely influenced by this health condition. Could anyone suggest how to deal with this problem?

(2) Another problem is that in this very specific clinical population another two health conditions, which are usually very significant predictors of death, have OR=0.3-0.5. The result it has on prediction is that according to my model, sicker people have a lower risk of dying. It looks to me as a collinearity issue between predictors and our inclusion/exclusion criteria which created this population. What do I do in this situation? We cannot change inclusion criteria and we have only a small number of predictors, three of them with 'behavior problems'.
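For problem (1) in the question above, a little arithmetic shows what the quoted figures would imply for people without the condition. This Python sketch rests on assumptions not stated in the thread: it treats the reported OR as a crude (unadjusted) odds ratio and takes the death rate with the condition as exactly 0.70, although the post says "over 70%". The function name `baseline_risk` is hypothetical.

```python
def baseline_risk(p_exposed, odds_ratio):
    """Risk in the unexposed group implied by the exposed risk and a crude odds ratio."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = odds_exposed / odds_ratio
    return odds_unexposed / (1 + odds_unexposed)

# Assumed inputs: 70% mortality with the condition, OR of 25 or 50.
for odds_ratio in (25, 50):
    print(odds_ratio, round(baseline_risk(0.70, odds_ratio), 3))
```

Under these assumptions the implied death rate without the condition is roughly 8.5% for OR = 25 and 4.5% for OR = 50, so the condition alone separates a small high-risk group from a large low-risk one.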

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:
- **RE: st: logistic regression predictors** *From:* lilian tesmann <lilian_tes@hotmail.com>

**References**:
- **st: logistic regression predictors** *From:* lilian tesmann <lilian_tes@hotmail.com>
- **Re: st: logistic regression predictors** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: logistic regression predictors** *From:* Steve Samuels <sjsamuels@gmail.com>
