st: Data Mining? (was Re: RE: statalist-digest V4 #2490)
I think that you are confusing the "business" market with data mining.
Certainly large corporate database data mining is not what Stata was
designed for, but that doesn't mean there aren't many more real-world
applications for Stata that don't involve mining 100GB datasets. Data
mining is a small segment of the data analysis market -- that's probably
why the software costs many times more than Stata. I think the vast
majority of people who need to analyze data will find that Stata can do
everything they need to do. I guess baseball team managers are just a
market segment that Stata may have to forgo while it works away in those
obscure little markets like biostatistics/epi, econometrics, and other
conventional statistical and data analysis tasks.
By the way, in my experience, "business people are not smart enough to
figure it out." I find more people doing data analysis using Excel than
----- Original Message -----
From: "roy wada" <email@example.com>
Sent: Monday, October 16, 2006 6:11 PM
Subject: st: RE: statalist-digest V4 #2490
The implication that the business people are consistently and persistently
making a mistake by not using Stata is not credible. Information leaks
everywhere. The important thing is what people actually do with the leaked
The recent spread of data mining is a case in point. Michael Lewis's book,
Moneyball, was heavily criticized for the view that the baseball roster
was inefficient. To make a long story short, after the book came out in 2003,
the statistical approach at the Oakland A's was hired away or
replicated by other teams within the year. Red Sox fans should thank
Michael Lewis for their 2004 season, including the World Series.
The information about Stata has leaked long ago. What did the business
people do with the leaked information? Structural equations are rarely
used in data mining. The unbiasedness of estimators or the robustness of
standard errors doesn't matter, either. You just want your data handled
properly. A business manager with 100GB of data is not going to settle for
1GB of analysis at a time.
Stata is meant for researchers, but not because the business people are
not smart enough to figure it out.
I am not sure if Stata can penetrate the business market without
significant changes in how it handles memory. It's probably not a
coincidence that SAS followed the development of hard drives in the 1960s and
Stata followed the spread of integrated RAM in the 1980s.
Strategic positions have been taken, which are highly defensible, but you
can't easily invade the other side either. Changes would be imposed if
someone invents a memory that is both fast (RAM) and permanent
(hard drive), but that seems unlikely in the near future. There is no doubt
that Stata has been adversely affected by memory management under
32-bit Windows (the 1.4GB limit). You can blame it on the other Bill G.