Statalist



RE: st: classification & regression trees


From   "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: classification & regression trees
Date   Tue, 16 Dec 2008 12:02:51 -0800

I have also seen some studies (sorry I can't recall the authors) that suggest that CART over-fits models and provides more variables than are needed.  On the other hand, CART provides simple dichotomies multiple times.  The one heavy use I had with it led to some confused interpretations (probably mine) using both the CART program and the R implementation.  The R CART seemed to be problematic to prune and get a 'simple' model - an awful lot depended on the analyst and his/her experience with data analysis and these models.  I found the CART program to be much nicer than the R implementation, but I'm unsure of what either of those got me.
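To make the over-fitting and pruning concern concrete, here is a minimal sketch in plain Python. It is not the CART program or any R package: the helpers grow, count_leaves and predict, the min_leaf stopping rule, and the toy data (a single true step at x = 10 plus a small repeating disturbance) are all made up for illustration. It only shows how much the size of the fitted tree depends on a setting the analyst chooses.

```python
# Toy 1-D regression tree, standard library only.
# All names and the data below are hypothetical, for illustration.

def sse(ys):
    """Sum of squared deviations of ys from their mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def grow(points, min_leaf):
    """Recursively split (x, y) points sorted by x; each leaf keeps >= min_leaf points."""
    ys = [y for _, y in points]
    node = {"mean": sum(ys) / len(ys), "left": None, "right": None}
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot cut between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    # Split only if it strictly reduces the within-node sum of squares.
    if best is not None and best[0] < sse(ys):
        i = best[1]
        node["cut"] = (points[i - 1][0] + points[i][0]) / 2
        node["left"] = grow(points[:i], min_leaf)
        node["right"] = grow(points[i:], min_leaf)
    return node

def count_leaves(node):
    if node["left"] is None:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def predict(node, x):
    while node["left"] is not None:
        node = node["left"] if x < node["cut"] else node["right"]
    return node["mean"]

def train_sse(tree, points):
    return sum((y - predict(tree, x)) ** 2 for x, y in points)

# One real step at x = 10, plus a small deterministic disturbance.
data = [(x, (10 if x >= 10 else 0) + (x * 7) % 5 - 2) for x in range(20)]

deep = grow(data, min_leaf=1)     # no effective stopping rule
shallow = grow(data, min_leaf=5)  # analyst-chosen constraint

print("unconstrained:", count_leaves(deep), "leaves, training SSE", train_sse(deep, data))
print("constrained:  ", count_leaves(shallow), "leaves, training SSE", train_sse(shallow, data))
```

The unconstrained tree fits the training data perfectly by giving every observation its own leaf, which is the over-fitting pattern described above. The constrained tree is far smaller, but the choice of min_leaf (or of a pruning parameter in a real implementation) is the analyst's.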

As Stas implies, be very cautious...
 

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Stas Kolenikov
Sent: Tuesday, December 16, 2008 10:01 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: classification & regression trees

All this cool non-parametric learning stuff is much better implemented
in R. The new popular methods come and go, while StataCorp (to my
understanding) is interested in good and tried methods that have solid
justifications and allow for the standard inferential procedures.
Besides, most of the time the R code would come directly from the
original methodology developers, or at least their students
responsible for coding. See
http://cran.r-project.org/web/views/MachineLearning.html.

CARTs are exploratory techniques by their nature. I would recommend
against using those to draw policy conclusions, and I personally would
probably turn down a report that would be based on CARTs only (unless
there is some clear evidence that the authors know way more about
CARTs than is described in the standard HTF and BFOS books :)). CARTs
might be useful in establishing some stylized facts and tendencies,
but you would still need to have an explicit controlled experiment
looking specifically at those distinct strategies.

If you cannot see much in your data, then your sample size might have
been too small. That might be a sad conclusion to be drawn from the
data (and those rarely come cheap), but sometimes that's the only
thing you can say.

References: http://www.citeulike.org/user/ctacmo/article/801011,
http://www.citeulike.org/user/ctacmo/article/553263.

On 12/16/08, Elizabeth Mumford <mumford@pire.org> wrote:
> As of April 2007, according to the archives, Stata was still "thinking"
>  about whether to add a CART-like classification tree program to the
>  package.  I downloaded the "cart" module that addresses failure time
>  analyses, but I don't think that will meet my needs (or maybe I am not
>  understanding its potential).  Are there other options I am missing?  My
>  task is to examine individual strategies (20) within an overall
>  intervention to try to determine the mechanism by which the intervention
>  is effective.  (Factor analyses and mediation analyses are not shedding
>  much light on the question, and a colleague suggested SPSS's CART.)
>

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



