RE: st: logistic regression predictors

From	lilian tesmann <[email protected]>
To	<[email protected]>
Subject	RE: st: logistic regression predictors
Date	Tue, 20 Jul 2010 09:47:28 +1030

Thank you Steve for the idea and useful instructions.
This sounds like a promising way to go.
 
Lilian

> From: [email protected]
> To: [email protected]
> Subject: Re: st: logistic regression predictors
> Date: Mon, 19 Jul 2010 11:19:14 -0400
> 
> --
> 
> In fact, if you feed the two-time data to -cart-, as I suggested, the 
> log-rank test in -cart- (which is -stcox- with the breslow option for 
> ties) will be equivalent to the stratified Mantel-Haenszel test for 
> binary data. Thus -cart- will provide a defensible split for binary 
> data. This splitting algorithm is not equivalent to that in the original 
> CART method; also, -cart- does not prune its trees and so risks 
> over-splitting. It continues to split a node as long as three conditions 
> hold: there are enough events; there is a split with enough observations 
> on each side; and the p-value, adjusted for multiple comparisons, is 
> small enough. The minimum required numbers of events and observations are 
> set by the minfail() and minsize() options; the default values are 10. 
> The default p-value threshold is 0.05, also settable.
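The stopping rule described above can be written out as a short check. This is only a sketch in Python, not -cart- code: the function name and arguments are hypothetical, the defaults mirror the values quoted in the message (10, 10, 0.05), and whether -cart- uses strict or non-strict comparisons at the boundaries is an assumption.

```python
def keep_splitting(events_in_node, n_left, n_right, adjusted_p,
                   minfail=10, minsize=10, alpha=0.05):
    """Return True only if all three conditions from the message hold.

    Hypothetical helper: the boundary handling (>= and <) is assumed,
    not taken from the -cart- documentation.
    """
    enough_events = events_in_node >= minfail            # enough events in the node
    enough_each_side = min(n_left, n_right) >= minsize   # enough observations on each side
    significant = adjusted_p < alpha                     # adjusted p-value below threshold
    return enough_events and enough_each_side and significant


# A node with 25 events and a 40/60 candidate split, adjusted p = 0.01
print(keep_splitting(25, 40, 60, 0.01))
# Same split, but one side would hold only 6 observations
print(keep_splitting(25, 6, 94, 0.01))
```

Because there is no pruning step afterwards, these three checks are the only guard against over-splitting, which is why the error rate should be validated on data not used to grow the tree.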
> 
> To calculate the error rate, you have to identify the observations in 
> each final node; classify each observation according to whether the 
> proportion of events in the node is > .5 or < .5; then compute the 
> percent of correct classifications overall (also for each node if you 
> wish, but these will not be very precise).
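The error-rate calculation just described is simple arithmetic, sketched here in Python rather than Stata. The function name and the toy data are hypothetical; the message does not say how to treat a node whose proportion of events is exactly .5, so this sketch assumes such a node is classified as "no event".

```python
from collections import defaultdict


def node_classification_summary(nodes, died):
    """nodes: terminal-node label per observation; died: 0/1 outcome.

    Returns (overall percent correct, {node: percent correct}).
    """
    counts = defaultdict(lambda: [0, 0])  # node -> [observations, events]
    for node, d in zip(nodes, died):
        counts[node][0] += 1
        counts[node][1] += d

    # Predict an event in every node whose proportion of events exceeds .5
    # (assumption: a tie at exactly .5 is classified as no event).
    predicted = {node: int(events / n > 0.5)
                 for node, (n, events) in counts.items()}

    correct = sum(predicted[node] == d for node, d in zip(nodes, died))
    overall = 100.0 * correct / len(died)

    per_node = {}
    for node, (n, events) in counts.items():
        hits = events if predicted[node] == 1 else n - events
        per_node[node] = 100.0 * hits / n
    return overall, per_node


# Toy data: node A has 3 deaths in 4, node B has 1 death in 6
nodes = ["A"] * 4 + ["B"] * 6
died = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
overall, per_node = node_classification_summary(nodes, died)
print(overall)   # 80.0
print(per_node)
```

With this rule the correctly classified count in a node is simply the larger of its events and non-events, so the overall figure can also be read directly off a table of node counts.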
> 
> Steve
> 
> 
> On Jul 18, 2010, at 11:59 AM, Steve Samuels wrote:
> 
> 
> I was wrong about the utility of -cart- and -boost- for your data.
> -boost- is not useful when the predictors are indicator variables, as
> yours seem to be. (You haven't given many details). -cart- is intended
> for failure time data, not binary data.
> 
> With a small number of predictors, you might be able to do a
> classification tree "by hand". -cart- might guide you to a possible
> tree: simply set up two times, a shorter one for deaths and a longer
> one for survivors. -cart- will show the numbers of cases and failures
> in each terminal node. The error rate will be optimistic, because
> it is measured on the same data used to form the tree. To get a more
> accurate error rate, you could also manually do a cross-validation.
> Most simply, randomly split your data into "training" and "test"
> sets. Develop your tree on the training set, and estimate its
> accuracy (percent correctly predicted) on your "test" set. This can
> be improved by k-fold cross-validation: randomly divide your data into
> k (say 10) sets, omit one at a time, do -cart- on the remainder, and
> test the resulting prediction on the omitted set. Your estimate of
> prediction error is the average over the 10 sets.
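The k-fold procedure above can be sketched as follows, in Python rather than Stata. Everything named here is hypothetical. In particular, `fit_majority_table` is only a stand-in for the tree-building step (it is not -cart- and does not grow a tree); any function that takes training rows and returns a prediction function could be passed instead. The sketch assumes at least k observations, so that no fold is empty.

```python
import random


def k_fold_accuracy(rows, fit, k=10, seed=12345):
    """rows: list of (predictors, died) pairs; fit: training rows -> predict function.

    Returns the average proportion correctly predicted over the k omitted sets.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)                        # random assignment to folds
    folds = [idx[i::k] for i in range(k)]   # k roughly equal sets

    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in idx if i not in held]
        test = [rows[i] for i in held_out]
        predict = fit(train)                # build the rule without the omitted set
        hits = sum(predict(x) == y for x, y in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k


def fit_majority_table(train):
    """Stand-in for tree building: majority outcome per predictor combination."""
    tally = {}
    for x, y in train:
        n, deaths = tally.get(x, (0, 0))
        tally[x] = (n + 1, deaths + y)
    # Fallback for combinations never seen in training: overall majority
    default = int(sum(y for _, y in train) / len(train) > 0.5)

    def predict(x):
        if x not in tally:
            return default
        n, deaths = tally[x]
        return int(deaths / n > 0.5)
    return predict


# Noise-free toy data: the outcome equals the first indicator
rows = [((i % 2, (i // 2) % 2), i % 2) for i in range(40)]
accuracy = k_fold_accuracy(rows, fit_majority_table, k=10)
print(accuracy)       # 1.0 for this deliberately noise-free example
print(1 - accuracy)   # the corresponding prediction error
```

The key point is that `fit` never sees the omitted set, so each fold's accuracy is measured on data that played no part in forming the rule.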
> 
> I also suggest that you look at the counts, deaths, and rates for
> all combinations of your predictors. See -crp- by Nick Cox,
> downloadable from SSC.
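For readers who want to see what such a table contains, here is a small Python sketch that tabulates counts, deaths, and death rates for each observed combination of indicator predictors. It is not -crp- and makes no claim about that command's output; the function name and toy data are hypothetical, and combinations that never occur in the data are simply absent from the result.

```python
from collections import Counter


def rates_by_combination(rows):
    """rows: iterable of (predictor_tuple, died) pairs.

    Returns {combination: (count, deaths, rate)} for observed combinations only.
    """
    n = Counter()
    deaths = Counter()
    for combo, died in rows:
        n[combo] += 1
        deaths[combo] += died
    return {c: (n[c], deaths[c], deaths[c] / n[c]) for c in sorted(n)}


# Toy data with two indicator predictors
rows = [((1, 0), 1), ((1, 0), 1), ((1, 0), 0),
        ((0, 0), 0), ((0, 0), 0),
        ((0, 1), 1), ((0, 1), 0)]
table = rates_by_combination(rows)
for combo, (count, d, rate) in table.items():
    print(combo, count, d, round(rate, 2))
```

With only a few indicator predictors the table has a handful of rows, which is what makes building a tree "by hand" feasible.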
> 
> Steve
> 
> On Sun, Jul 18, 2010 at 10:24 AM, Steve Samuels <[email protected]> 
> wrote:
>> With such a strong independently predictive group, logistic regression
>> will give poor predictions, because it assumes that all variables are
>> needed to predict for each individual. The solution is a tree-based
>> approach. The original reference is Breiman, L., J. H. Friedman, R. A.
>> Olshen, and C. J. Stone. 1984. Classification and Regression Trees. New
>> York: Chapman & Hall/CRC. Apparent Stata solutions are -boost-
>> ("findit boost") and -cart- (from SSC). I say "apparent", because I've
>> not closely read the documentation for either. Non-commercial
>> solutions can be found in R and at
>> http://www.stat.wisc.edu/~loh/guide.html.
>>
>>
>> Steve
>>
>> --
>> Steven Samuels
>> [email protected]
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>>
>>
>> On Sun, Jul 18, 2010 at 1:57 AM, lilian tesmann <[email protected]>
>> wrote:
>>> Dear All,
>>>
>>> I am trying to predict mortality rates in a specific population of 
>>> clients.
>>> I encountered two problems and would be really grateful for any 
>>> insights or suggestions.
>>>
>>> (1) We have one predictor, a health condition, which is present 
>>> in only 5% of the population, but over 70% of people with that condition 
>>> die. Not surprisingly, the OR is very large (from 25 to 50). The purpose 
>>> of the analysis is to obtain individual predictions, but they are 
>>> hugely influenced by this health condition. Could anyone suggest 
>>> how to deal with this problem?
>>>
>>> (2) Another problem is that in this very specific clinical 
>>> population another two health conditions, which are usually very 
>>> significant predictors of death, have OR=0.3-0.5. The effect on 
>>> prediction is that, according to my model, sicker people have a 
>>> lower risk of dying. It looks to me like a collinearity issue between 
>>> predictors and our inclusion/exclusion criteria which created this 
>>> population. What do I do in this situation? We cannot change 
>>> inclusion criteria and we have only a small number of predictors, 
>>> three of them with ‘behavior problems’.
>>>
>>
> 
> 
> 
> -- 
> Steven Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
> 
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

