Re: st: RE: Re: RE: Problem areas (R vs Stata)
Some rambling views on R vs. Stata. Read on if you're patient.
Here's why I think Stata is used widely: It has really tidy data
manipulation features and is used widely (tautology, no?). Wide use
implies people can help each other out, which reduces costs. Good
easy-to-use flexible data manipulation features reduce the need to learn
to handle a database server or use other software. Remove these things
and you start using R.
R is something you have to study to learn. Bigger overhead than getting
started with Stata. That said, Stata has its own `hump' in the learning
curve right about where you want to start doing involved programming -
this is what net courses are good for. Point is, if you want to take
advantage of _either_ Stata or R, or any other decent data analysis
environment, you have to spend time studying how to use it.
Following the R introductory guide will get you up and running with just
as many of the beginner/intermediate functions used in Stata, but the
syntax gets people, I think. Compare three pairs of equivalent statements:
Stata: replace x = . if (x==0)
R: mydata$x[which(mydata$x==0)] <- NA
Stata: do "mydo"
R: source("myRbatch.R", echo=TRUE)
Stata: regress y x1 x2
R: fitted.model <- lm(blah$y ~ blah$x1 + blah$x2)
And don't even ask about importing big CSV files. Note though that there
are places where R is many, many times more concise than using regular
Stata language - and also that this is a completely different story now
that Mata exists.
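For anyone who wants to try the first and third pairs above, here is a minimal sketch in R. The data frame and its values are made up purely for illustration (named mydata here rather than blah), and the source() pair is left out because it needs a batch file on disk.

```r
# Hypothetical data standing in for the data frame in the post;
# the values are invented for illustration only.
mydata <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x  = c(0, 3, 0, 5, 2, 0),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5)
)

# Stata: replace x = . if (x==0)
# R's missing value is NA (upper case); which() returns the matching row
# indices and silently skips rows where x is already missing.
mydata$x[which(mydata$x == 0)] <- NA

# Stata: regress y x1 x2
# Passing data = avoids repeating the data frame name for every variable.
fitted.model <- lm(y ~ x1 + x2, data = mydata)

print(sum(is.na(mydata$x)))   # number of zeros recoded to missing
print(coef(fitted.model))     # intercept and the two slopes
```

summary(fitted.model) gives the fuller table that Stata prints by default after regress.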
As for the Stata GUI vs. the R GUI debate, I've never really got this.
What Stata GUI? A few menus, a history and a variable list - that's it.
The results menu and command input certainly don't count. R handles
graphical output at least as well as Stata. I personally use ESS under
Emacs to use R, which is a slightly different kettle, but even using the
stock GUI I don't see a major difference. I think it's a good thing
neither Stata nor R has much in the way of a GUI - people should use
scripts, programs and logged input for the body of their work, avoiding
pointing and clicking wherever possible.
Finally, R has very minimal data manipulation abilities. The standard
recommendation on the R-help list is that users with data import
problems and data manipulation problems should probably set up a
database server and put all their data in there, doing the manipulation
of data mostly with the server rather than with R. Now, really... Very
impractical for so many people for so many reasons.
Use R if you're poor or don't care about the differences. Use Stata if
everyone else does or you don't want to learn to run a database server.
Use either if there aren't any issues any way. If you like programming
languages learn R's; if you have Stata learn Mata. If you like
challenges, do everything in C.
One thing about R is that it's free, which means that you can try it
yourself with just the cost of some hair and coffee. Give it a go,
really, but don't confuse the myths about R for its truths.
And finally, I have to say the R support community is bloody fantastic.
Stata's is very good, but R's is really something. So never let anyone
tell you R is poorly supported - that is absolutely false. Probably more
netiquette involved on R-help, but that's fine.
Righto, hope that wasn't too much random walking
Nick Cox wrote:
Without wanting to be insensitive to those
with incomes that mean Stata is too expensive
-- which include many people in _all_ countries
-- this is just not going to happen. Or so
If you want open source Stata, you will have to
recapitulate the development of Stata from scratch,
which means getting a team up to speed on C programming,
operating systems, low-level stuff, numerical
analysis, etc., etc. And then you have to watch
This is because StataCorp is a company based almost
totally on Stata, and they are just not going to
throw their intellectual capital out into the world.
In principle, what you want
could be done, as shown by the history of S-Plus and R.
But that history has certain unique features,
unlikely to be matched in the case of Stata.
And it would be interesting and indeed
exciting if someone did it, but I doubt it.
Turn and turn about, why is the whole
statistical world not using R if it is free?
I guess there are several main reasons. Here
are a few:
1. The way R is set up is congenial to
its developers but not to all possible users.
This creates a feedback loop, as new code
has to fit in with existing code. R still
shows its origin in S, which is a programming
language first and foremost.
2. Many users want GUIs as well. The GUI
of R, as I understand it, is minor.
3. Many users want and indeed need technical
support. Somewhere I saw a comment from
an R developer "Our idea of technical support
is that you support us", and that's fair enough.
Naturally there are email lists etc. for R
and people do help each other. But no one
has the _duty_ to help you. For many users,
4. The inertia that comes from pre-existing
investment in people, locally-written programs,
documentation etc. to do with a particular
program that is already in use in a particular
I agree that STATA is the leader among statistical packages. Also I understand
that it has to find a way to support itself. Nevertheless, for some of us in
developing countries it is often a burden not to be able to get
and last versions. The truth is, I believe, that STATA would bring much
greater benefit than it does now if it were open source. I'm sure that some
financial mechanism could be found to support the infrastructure, and I'm
almost sure that its development is not due to copyrights, and maybe a
faster and broader development could be reached in an open source format. Just
think of STATA as a vaccine: it is in some way a need.
* For searches and help try: