
Re: st: logistic regression predictors

From: Steve Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: logistic regression predictors
Date: Sun, 18 Jul 2010 10:24:21 -0400

```With such a strong independently predictive group, logistic regression
will give poor predictions, because it assumes that all variables are
needed to predict for each individual. The solution is a tree-based
approach. The original reference is Breiman, L., J. H. Friedman, R. A.
Olshen, and C. J. Stone. 1984. Classification and Regression Trees. New
York: Chapman & Hall/CRC. Apparent Stata solutions are -boost-
("findit boost") and -cart- (from SSC). I say "apparent", because I've
not closely read the documentation for either. Non-commercial
solutions can be found in R and at
http://www.stat.wisc.edu/~loh/guide.html.

Steve

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783

On Sun, Jul 18, 2010 at 1:57 AM, lilian tesmann <lilian_tes@hotmail.com> wrote:
> Dear All,
>
> I am trying to predict mortality rates in a specific population of clients.
> I encountered two problems and would be really grateful for any insights or suggestions.
>
> (1) We have one predictor – a health condition, which is present in only 5% of the population, but over 70% of people with that condition die. Not surprisingly, the OR is very large (from 25 to 50). The purpose of the analysis is to obtain individual predictions, but they are hugely influenced by this health condition. Could anyone suggest how to deal with this problem?
>
> (2) Another problem is that in this very specific clinical population two other health conditions, which are usually very significant predictors of death, have OR=0.3-0.5. The effect on prediction is that, according to my model, sicker people have a lower risk of dying. It looks to me like a collinearity issue between the predictors and the inclusion/exclusion criteria that created this population. What do I do in this situation? We cannot change the inclusion criteria and we have only a small number of predictors, three of them with ‘behavior problems’.
>

```
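The tree-based idea in the reply can be sketched in a few lines. This is only an illustration in plain Python, not the -boost- or -cart- implementation and not a substitute for either. The cell counts, the predictor names `condition` and `other_risk`, and all helper functions are hypothetical; the counts were made up to mimic the question (a condition present in 5% of clients, 70% of whom die). The split rule used here is Gini impurity, one common choice for classification trees.

```python
# Hypothetical cell counts (NOT the poster's data), built so that the strong
# condition is present in 5% of 1,000 clients and 70% of those clients die.
CELLS = [
    {"condition": 1, "other_risk": 0, "deaths": 17, "n": 25},
    {"condition": 1, "other_risk": 1, "deaths": 18, "n": 25},
    {"condition": 0, "other_risk": 0, "deaths": 26, "n": 650},
    {"condition": 0, "other_risk": 1, "deaths": 45, "n": 300},
]


def mortality(cells):
    """Observed proportion of deaths in a node (a non-empty list of cells)."""
    n = sum(c["n"] for c in cells)
    return sum(c["deaths"] for c in cells) / n


def gini(cells):
    """Gini impurity of a node for a binary outcome: 2 * p * (1 - p)."""
    if not cells:
        return 0.0
    p = mortality(cells)
    return 2.0 * p * (1.0 - p)


def subset(cells, var, level):
    """Cells in which binary predictor `var` equals `level`."""
    return [c for c in cells if c[var] == level]


def split_impurity(cells, var):
    """Size-weighted Gini impurity of the two children after splitting on `var`."""
    total = sum(c["n"] for c in cells)
    impurity = 0.0
    for level in (0, 1):
        child = subset(cells, var, level)
        weight = sum(c["n"] for c in child) / total
        impurity += weight * gini(child)
    return impurity


def best_split(cells, candidates):
    """Predictor whose split leaves the lowest weighted impurity."""
    return min(candidates, key=lambda var: split_impurity(cells, var))


predictors = ["condition", "other_risk"]
first = best_split(CELLS, predictors)
second = [v for v in predictors if v != first][0]

print("first split on:", first)
for level in (0, 1):
    node = subset(CELLS, first, level)
    print(f"  {first}={level}: mortality {mortality(node):.3f}")
    # Each branch gets its own estimates for the remaining predictor.
    for sublevel in (0, 1):
        leaf = subset(node, second, sublevel)
        print(f"    {second}={sublevel}: mortality {mortality(leaf):.3f}")
```

With these made-up counts the first split falls on `condition`, and the clients without the condition then get their own leaf estimates for `other_risk`. A logistic model without interaction terms would instead apply a single odds ratio for `other_risk` across both branches.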