Statalist The Stata Listserver

st: Data Mining? (was Re: RE: statalist-digest V4 #2490)

From   "Michael Blasnik" <>
To   <>
Subject   st: Data Mining? (was Re: RE: statalist-digest V4 #2490)
Date   Mon, 16 Oct 2006 19:30:40 -0400

I think that you are confusing the "business" market with data mining. Certainly, mining large corporate databases is not what Stata was designed for, but that doesn't mean there aren't many more real-world applications for Stata that don't involve mining 100GB datasets. Data mining is a small segment of the data analysis market -- that's probably why the software costs many times more than Stata. I think the vast majority of people who need to analyze data will find that Stata can do everything they need. I guess baseball team managers are just a market segment that Stata may have to forgo while it works away in those obscure little markets like biostatistics/epi, econometrics, and other conventional statistical and data analysis tasks.

By the way, in my experience, "business people are not smart enough to figure it out" is not far off: I find more people doing data analysis in Excel than in anything else.

Michael Blasnik

----- Original Message -----
From: "roy wada" <>
To: <>
Sent: Monday, October 16, 2006 6:11 PM
Subject: st: RE: statalist-digest V4 #2490

The implication that business people are consistently and persistently making a mistake by not using Stata is not credible. Information leaks everywhere. The important thing is what people actually do with the leaked information.

The recent spread of data mining is a case in point. Michael Lewis's book, Moneyball, was heavily criticized for the view that baseball rosters were put together inefficiently. To make a long story short, after the book came out in 2003, the statistical approach used by the Oakland A's was hired away or replicated by other teams within the year. Red Sox fans should thank Michael Lewis for their 2004 season, including the World Series.

The information about Stata leaked long ago. What did the business people do with the leaked information? Structural equations are rarely used in data mining. The unbiasedness of estimators or the robustness of standard errors doesn't matter, either. You just want your data handled properly. A business manager with 100GB of data is not going to settle for 1GB of analysis at a time.

Stata is meant for researchers, but not because the business people are not smart enough to figure it out.

I am not sure Stata can penetrate the business market without significant changes in how it handles memory. It's probably not a coincidence that SAS followed the development of hard drives in the 1960s and Stata followed the spread of integrated RAM in the 1980s.

Strategic positions have been taken, which are highly defensible, but you can't easily invade the other side either. Changes would be imposed if someone invented a memory that is both fast (like RAM) and permanent (like a hard drive), but that seems unlikely in the near future. There is no doubt that Stata has been adversely affected by memory management under 32-bit Windows (the 1.4GB limit). You can blame it on the other Bill G.

