Statalist



st: RE: RE: RE: The Future of Statistical Computing


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: RE: The Future of Statistical Computing
Date   Fri, 23 Jan 2009 17:29:49 -0000

I find it difficult to follow Timothy's scenario of a small business
manager who wants "non-statistician friendly software" that offers him
[sic] (e.g.) a choice between a PCA and a rotated factor analysis. That
implies a knowledge of multivariate analysis that I didn't know was
typical of that market, but I may "misunderestimate" it, to quote a
recent President. 

More crucially, perhaps, I can't accept Tim's gross contrast between
graphics in Stata and graphics in R. Stata graphics currently has one
very big limitation in my view -- no serious support for contour or
perspective view of three-dimensional data, i.e. surfaces z relative to
two other variables x and y. That aside, I believe that Stata and R are
broadly comparable in their degree of graphics programmability. They are
based on very different architectures, but only a few people need care
about that if the results are similar. 

I don't want to knock R, which is a wonderful thing, as I've said
repeatedly on this list. But I think R is currently being over-hyped in
a way that it is likely to suffer from, and that's unfortunate for R. My
impression is that the typical R user doesn't want to write graphics
programs very much more than the typical Stata user, but that's not
really here or there. Both gain because graphics _is_ programmable by
those able and willing and everybody can use whatever resulting programs
are in the public domain. 

It would be interesting to have specific suggestions of worthwhile graph
kinds easily available in R and not available in Stata. Then people
could discuss whether they agreed that the graphs were worthwhile and
whether they could be programmed in Stata. If not, there is an agenda
for StataCorp developers. 

Leland Wilkinson is no longer working for SPSS. My impression is that he
had little impact on it even while he was a Vice-President. He was
basically doing his own thing. I would expect that influence to diminish
further now he has left the company, but I yield on guessing to anyone
with more information.  

Nick 
[email protected] 

Mak, Timothy

Thank you Stas and Nick for sharing these very interesting articles. 

My humble opinion is that: 

Data mining must be growing at a massive pace, and I can easily imagine
there's a great market for non-statistician friendly software that can
graphically summarize complicated data at clicks of a few buttons. For
example let's imagine a scenario where a small company wants to
investigate the buying behaviour of its clients. Let's imagine the
business is similar to a supermarket. To find out which products tend to
be bought together, the manager might ask the software to 'summarize it
for him'. And the software outputs a PCA graph of the first 2 components.
Out also comes a dialog box: 'Would you like to look at the data another way?'
Clicking 'Yes' gives a rotated factor analysis of the same data, with
scores plot on the two axes. Another click gives a multidimensionally
scaled version of the graph. Another click gives a 3-d scatter plot.
Another click gives you a dendrogram from a cluster analysis, and so
on... The business manager merely needs to choose the graph that he
understands, that he can communicate to whoever he needs to. He doesn't need
to care whether the assumptions of the analyses are correct. In any
case, making decisions based on the 'best' model is probably not going
to significantly improve his business performance over any other
'good-looking' model anyway. Of course the manager has to understand
that the future is always unpredictable, no matter how good your
analyses are. 

I'm describing the scenario of a small hypothetical business, but we can
imagine similar demands from the internet-using public wanting to
quickly summarize data on the internet graphically. I think Wilkinson is
making this point - there are a lot more opportunities out there in this
area. 

Of course traditional statistics will continue to have its place, and
certainly within academia, and for anyone who needs to publish some
serious results. Data mining itself grew from traditional statistics,
and will continue to learn from traditional statistical techniques. So
traditional statisticians must also try to learn from data-mining
techniques. 

So where does Stata come into all this? 

Well I can easily imagine that 10 years down the line, SPSS and many
other packages will have incorporated many of the sophisticated
graphical functions described in Wilkinson's book, all easily
accessible to a non-statistician. So long as such a package can still
provide reliable regression and ANOVA results, many might be attracted
to it by the amazing graphics that it is able to produce. If somebody only has
a budget for one piece of general statistical software, which one would
he choose? 

Stata must therefore keep up with the technological development on the
graphical and data-mining front. And I trust that Stata, being so very
selective on its components, would surely only choose the best features
to incorporate, rather than trying to do everything. 

However, although Nick might disagree, at present, I don't really think
that graphics is a strength in Stata. Compared to the myriad graphs
that R can do, Stata can only do simple plots. The main impediment is
probably that Stata graphics is not programmable by most users. Could
this possibly change in the coming years? 

Mata must be a significant contribution to Stata. However, compared to
R, I think it is difficult to use. Having to switch between two
languages (and two environments) really confuses me. That'll always be
its weakness. However, I still like Stata very much, not least because
of the immensely helpful community here, and the excellent manuals and
support. As I said in an earlier post, though, I think a debug mode in
Mata would be a welcome addition... 

Hope my comments are useful. 

Tim


This sort of software would have a great appeal to medium and large
companies.  

Nick Cox

Thanks to Stas for publicising this paper. My take is the opposite of
his: 
Data mining seems to me far more over-hyped than statistical software. 

I reviewed Leland's book for the Journal of Statistical Software in
2007. 
He exercised his right to reply. Both pieces are accessible at 

<http://www.jstatsoft.org/v17/b03> 

By an odd kind of symmetry, that makes me wonder whether the vendors of
competitor software will be allowed to reply in due course to Leland's
comments in this paper! 

The Stata write-up doesn't look outrageous to me. (Clearly Leland
couldn't bring himself to compliment Stata's graphics.) 
But it is behind the curve in not mentioning Mata. 

Nick 
[email protected] 

Stas Kolenikov

The recent issue of Technometrics (vol 50 (4), I've just received it)
has an extensive article with the title in the subject line by Leland
Wilkinson, an extremely smart guy at the interface of statistics and
computer science, the author of SYSTAT and "The Grammar of Graphics"
book (totally incomprehensible to me, but a delight for Vince W, I am
sure :)). The link is http://pubs.amstat.org/toc/tech/50/4. He says,
"Statisticians interested in statistical computing and its future
incarnations will have to engage in joint research with computer
scientists to continue to have an influence." Statisticians have been
playing catch-up in data mining for some while now, and it may look as
though advances in computing everywhere might phase statisticians out.

There are two paragraphs about Stata (ranked eighth in revenues after
SAS, SPSS, Matlab, Minitab, Statistica, S-Plus and JMP):

"Stata was originally the product of Bill Gould and a small group of
economists from UCLA. It has grown to be a full-featured analytic
company. The distinctive appeal of the package is its expressive and
concise programming language, based on C. Stata's unusual strengths
are in discrete variable modeling, longitudinal/panel designs,
survival analysis, time series analysis, and survey statistics.

Like S-PLUS, Stata will have to deal with the growth of R in its own
field: programmable statistics and data analysis. Unlike S-PLUS,
however, Stata's peculiar strengths and language are different enough
from R to make it a viable alternative, particularly for
economists. Moreover, the Stata user community is intensely loyal, so
we should expect Stata to continue to grow at a respectable rate."

An interesting read. Stata developers including the top SSC
contributors might want to check it out.
