Re: st: logistic regression predictors

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: logistic regression predictors
Date	Sun, 18 Jul 2010 11:59:30 -0400

I was wrong about the utility of -cart- and -boost- for your data.
-boost- is not useful when the predictors are indicator variables, as
yours seem to be. (You haven't given many details.) -cart- is intended
for failure time data, not binary data.

With a small number of predictors, you might be able to do a
classification tree "by hand". -cart- might guide you to a possible
tree: simply set up two failure times, a shorter one for deaths and a
longer one for survivors. -cart- will show the numbers of cases and
failures at each terminal node. The error rate will be optimistic,
because it is measured on the same data used to form the tree. To get a
more accurate error rate, you could also manually do a cross-validation.
Most simply, randomly split your data into "training" and "test"
sets. Develop your tree on the training set, and estimate its
accuracy (percent correctly predicted) on your "test" set. This can
be improved by k-fold cross-validation: randomly divide your data into
k (say 10) sets, omit one at a time, do -cart- on the remainder, and
test the resulting prediction on the omitted set. Your estimate of
prediction error is the average of the k error rates.
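
For concreteness, here is a minimal sketch of the k-fold procedure in
Python rather than Stata. It does not call -cart-: the fit_stump function
is a hypothetical stand-in that builds a one-split "tree" on a single 0/1
predictor, and the data layout (a list of (predictors, died) pairs) is an
assumption made only for illustration. Substitute whatever tree-building
step you actually use.

```python
import random

def fit_stump(rows):
    """Hypothetical stand-in for the tree-building step.

    rows: list of (predictors, died), where predictors is a tuple of 0/1
    indicators and died is 0 or 1. Returns (j, pred_if_0, pred_if_1): the
    predictor index whose single split misclassifies the fewest rows.
    """
    best = None
    for j in range(len(rows[0][0])):
        pred = {}
        for level in (0, 1):
            outcomes = [y for x, y in rows if x[j] == level]
            # Majority vote within the node; ties and empty nodes predict 0.
            pred[level] = int(2 * sum(outcomes) > len(outcomes))
        errors = sum(pred[x[j]] != y for x, y in rows)
        if best is None or errors < best[0]:
            best = (errors, j, pred[0], pred[1])
    return best[1:]

def predict(stump, x):
    """Predicted outcome (0 or 1) for one tuple of predictors."""
    j, pred_if_0, pred_if_1 = stump
    return pred_if_1 if x[j] == 1 else pred_if_0

def kfold_error(rows, k=10, seed=2010):
    """Average held-out misclassification rate over k random folds."""
    if len(rows) < k:
        raise ValueError("need at least k observations")
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)           # random assignment to folds
    folds = [order[i::k] for i in range(k)]
    fold_rates = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in order if i not in held]
        test = [rows[i] for i in held_out]
        stump = fit_stump(train)                 # fit on the remainder only
        wrong = sum(predict(stump, x) != y for x, y in test)
        fold_rates.append(wrong / len(test))
    return sum(fold_rates) / k
```

Calling kfold_error(rows, k=10) returns the average of the 10 held-out
error rates. The point of the design is that the rule is refit inside
each fold, so no observation is used both to form the rule and to judge
it.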

I also suggest that you look at the counts, deaths, and rates for
all combinations of your predictors. See -crp- by Nick Cox,
downloadable from SSC.
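
As a rough illustration of that kind of table (this is not -crp-, whose
output I am not reproducing here), the following Python sketch tabulates
the count, deaths, and death rate for every combination of 0/1
predictors. The function name and the data layout, a list of
(predictors, died) pairs, are assumptions for the example.

```python
from itertools import product

def rates_by_combination(rows):
    """Count, deaths, and death rate for every combination of 0/1 predictors.

    rows: list of (predictors, died), where predictors is a tuple of 0/1
    indicators and died is 0 or 1. Combinations that never occur are kept,
    with a count of 0 and a rate of None.
    """
    n_pred = len(rows[0][0])
    cells = {combo: [0, 0] for combo in product((0, 1), repeat=n_pred)}
    for x, died in rows:
        cell = cells[tuple(x)]
        cell[0] += 1       # number of cases in this combination
        cell[1] += died    # number of deaths in this combination
    return {combo: (n, d, d / n if n else None)
            for combo, (n, d) in cells.items()}
```

Empty combinations are listed deliberately: with a restricted
population, knowing which cells have no cases at all is as informative
as the rates in the cells that do.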

Steve

On Sun, Jul 18, 2010 at 10:24 AM, Steve Samuels <[email protected]> wrote:
> With such a strong independently predictive group, logistic regression
> will give poor predictions, because it assumes that all variables are
> needed to predict for each individual. The solution is a tree-based
> approach. The original reference is Breiman, L., J. H. Friedman, R. A.
> Olshen, and C. J. Stone. 1984. Classification and Regression Trees. New
> York: Chapman & Hall/CRC. Apparent Stata solutions are -boost-
> ("findit boost") and -cart- (from SSC). I say "apparent", because I've
> not closely read the documentation for either. Non-commercial
> solutions can be found in R and at
> http://www.stat.wisc.edu/~loh/guide.html.
>
>
> Steve
>
> --
> Steven Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax:    206-202-4783
>
>
>
> On Sun, Jul 18, 2010 at 1:57 AM, lilian tesmann <[email protected]> wrote:
>> Dear All,
>>
>> I am trying to predict mortality rates in a specific population of clients.
>> I encountered two problems and would be really grateful for any insights or suggestions.
>>
>> (1) We have one predictor – a health condition, which is present in only 5% of the population, but over 70% of people with that condition die. Not surprisingly, the OR is very large (from 25 to 50). The purpose of the analysis is to obtain individual predictions, but they are hugely influenced by this health condition. Could anyone suggest how to deal with this problem?
>>
>> (2) Another problem is that in this very specific clinical population, two other health conditions, which are usually very significant predictors of death, have OR=0.3-0.5. The effect on prediction is that, according to my model, sicker people have a lower risk of dying. It looks to me like a collinearity issue between the predictors and the inclusion/exclusion criteria that created this population. What do I do in this situation? We cannot change the inclusion criteria, and we have only a small number of predictors, three of them with ‘behavior problems’.
>>
>



-- 
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783
