RE: st: R for Stata Users
"Muenchen, Robert A (Bob)" <email@example.com>
Sun, 28 Feb 2010 13:30:13 -0500
As Joe mentioned, our goal for this book was to help make it as painless
as possible (no small feat!) for Stata users to learn to use enough R to
enable them to do things that Stata doesn't yet do. When I learned R,
the toughest time I had was with its terminology and translating what I
knew about other stat packages into the rather alien world of R. We hope
that by explaining R using Stata terminology, it will be easier to
learn.
For example, R has several data structures, one of which is a "data
frame." For a beginning understanding of that, it's essentially a Stata
data set. However, eventually we need a more precise R-based definition:
it's a list of vectors of the same length. But what's a list? How does R
define length? With or without missing values? Without a doubt, the
hardest thing about writing this book was choosing the order in which to
introduce R's terminology; it is hard to define one term without using a
couple of others!
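
To make those questions concrete, here is a minimal sketch in R. The data set and its variable names (id, weight) are made up for illustration and are not from the book:

```r
# A small made-up data set; the names and values are illustrative only.
mydata <- data.frame(
  id     = 1:4,
  weight = c(61.5, NA, 70.2, 58.9)   # NA is R's missing value
)

is.list(mydata)              # TRUE: a data frame is a list underneath
length(mydata)               # 2: the length of the list is the number of variables
nrow(mydata)                 # 4: the number of observations
length(mydata$weight)        # 4: the length of a vector counts missing values
sum(!is.na(mydata$weight))   # 3: the number of non-missing values
```

So "length" means different things for the data frame (number of variables) and for one of its vectors (number of values, missing or not).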
David Airey raised an excellent question: if you need R to do the odd
selection of things Stata does not yet do, why bother to write a book
that focuses on the basics (data management, graphics, stats)? You
already know how to do those things in Stata, and you're likely to
continue to do so! The answer is that it's hard to know where to draw
the line for a particular task. For example, to fit a model in R, you
might do something like this:
library("Hmisc") # Contains stata.get function.
library("OtherLibrariesYouNeed") # If you need any.
mydata <- stata.get("mydata.dta") # imports your Stata file
mymodel <- TheFunctionYouNeed( y ~ x1+x2, data=mydata )
Now that you've done your model, you're likely to want to do some
diagnostic plots. All Stata users know that it offers superb
publication-quality plots, so you could pop the data back to Stata and
do them there. Or perhaps you might want to stay in R and run this:
plot(mymodel)
That will get you several nice diagnostic plots, but it brings up a whole
new area. How much do you want to do in R before returning to Stata?
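
As a runnable sketch of that model-then-plot workflow, the following assumes a linear regression, with lm() and R's built-in mtcars data standing in for TheFunctionYouNeed and mydata. Neither choice comes from the original message:

```r
# lm() and the built-in mtcars data are stand-ins chosen for illustration.
mymodel <- lm(mpg ~ wt + hp, data = mtcars)

pdf(NULL)               # throwaway graphics device, so this also runs in batch mode
par(mfrow = c(2, 2))    # arrange the plots in a 2 x 2 grid
plot(mymodel)           # default diagnostic plots for an lm fit
dev.off()               # close the device
```

In an interactive session you would likely drop the pdf(NULL) and dev.off() lines so the plots appear on screen.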
Now, when you're done modeling, you'll probably want to do the plots you
want to publish in Stata. That'll save you the effort of learning to
control titles, etc. in R.
There are similar decisions to make in every analysis. There are simple
things R will do for you to save you hopping back and forth.
I was greatly amused by the R quote from Elan Cohen, "It's easy to do
difficult things but hard to do simple things." There are cases where
that is indeed true. We actually spend an entire chapter on the simple
act of selecting variables! Some R users would argue that, since you use
the same methods for selecting variables as for selecting
observations, it's not much added effort for a more flexible
approach. While we mention that in the book, I still prefer Stata's much
simpler approach. How much flexibility do you need for this simple task?
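
Here is a small sketch of what "the same methods" means, using one of R's several selection approaches (square-bracket indexing). The data set and variable names are hypothetical. In Stata the first two selections would be roughly "keep age bmi" and "keep if age > 30":

```r
# Made-up example data; names and values are illustrative only.
mydata <- data.frame(
  id  = 1:5,
  age = c(23, 35, 41, 29, 52),
  bmi = c(21.4, 27.8, 24.1, 30.2, 26.5)
)

# Rows go before the comma, columns after it.
vars <- mydata[ , c("age", "bmi")]                 # select variables by name
obs  <- mydata[mydata$age > 30, ]                  # select observations by condition
both <- mydata[mydata$age > 30, c("id", "bmi")]    # do both in one step
```

The flexibility comes from both slots accepting names, positions, or logical conditions, which is also why there is so much to explain.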
On the other hand, it is often R's help files that make it appear that
simple things are difficult. For example, the command to print a file is
described in the help file as:
"print prints its argument and returns it invisibly (via invisible(x)).
It is a generic function which
means that new printing methods can be easily added for new classes."
What's that invisible thing about? Does it display the data or not?
However, using it is as simple as:
print(mydata)
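
As for the "invisible" part, a short sketch (with a made-up data set) suggests it matters far less than the help file makes it sound:

```r
# Made-up example data; names and values are illustrative only.
mydata <- data.frame(id = 1:3, score = c(88, 92, 79))

print(mydata)             # displays the data set
mydata                    # typing the name alone does the same at the console

# "Returns it invisibly" only means the returned copy is not displayed a second time.
result <- print(mydata)   # displays once; result now holds the same data
identical(result, mydata)
```

So yes, it displays the data. The invisible return value is only of interest if you want to capture it.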
The R help file on the simple act of sorting is, to me, fairly
incomprehensible. But to do it is simple using R's "order" function.
There is a "sort" function, but it doesn't do what a Stata user would
think. The differences in terminology can be very confusing at first!
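
A minimal sketch of the difference, using a made-up data set. In Stata this would be roughly "sort score":

```r
# Made-up example data; names and values are illustrative only.
mydata <- data.frame(
  id    = c(1, 2, 3, 4),
  score = c(79, 92, 88, 85)
)

sort(mydata$score)     # sorts that one vector only: 79 85 88 92
order(mydata$score)    # row positions in sorted order: 1 4 3 2

# To sort the whole data set, use order() to rearrange the rows.
sorted <- mydata[order(mydata$score), ]
```

The "sort" function returns a sorted copy of a single variable and leaves the data set untouched, which is why "order" is the one a Stata user usually wants.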
It has been a pleasure working on this book with a pro like Joe Hilbe. I
hope that we have succeeded in our goal of making R easier to learn for
Stata users. If you're curious, the Table of Contents and general
coverage are very similar to those of "R for SAS and SPSS Users" that
you can see at:
You can also download all our R and Stata programs from
http://r4stats.com and see how they compare. Keep in mind though that we
often do far more in the R programs than we do in Stata, so when the
programs are longer, that doesn't mean R was more complex on that
subject. But for selecting variables, the Stata program was shorter for
a very good reason: it's much easier.
Bob Muenchen (pronounced Min'-chen), Manager
Research Computing Support
Voice: (865) 974-5230
> -----Original Message-----
> From: firstname.lastname@example.org [mailto:owner-
> email@example.com] On Behalf Of Airey, David C
> Sent: Friday, February 26, 2010 9:43 AM
> To: firstname.lastname@example.org
> Subject: re: st: R for Stata Users
> Scott Merryman posted 1/14/10 with no replies:
> > R for Stata Users
> > Series: Statistics and Computing
> > Muenchen, Robert A., Hilbe, Joseph M.
> > 2010, Approx. 565 p., Hardcover
> > ISBN: 978-1-4419-1317-3
> > Not yet published. Available: March 21, 2010
> > http://www.springer.com/statistics/computational/book/978-1-4419-
> Apparently Stata has "jargon", and R has "formal" terminology. R also
> has "publication quality graphs". Why does everyone keep saying this
> if these are somehow unavailable in other packages? It is easier to
> make a crappy graph in R than Stata in my hands at least. Also, if I
> were a professional Stata user, why the hell would I want to know how
> to do basic statistics and basic graphs in R? There are now 900 books
> describing such. Rather I might want to know how to use R to extend
> Stata with packages unavailable in Stata and outside my expertise to
> do so, period.
> Maybe one of the authors can comment on the title.
> * Read data from various types of text files and Stata data sets
> * Manage your data through transformations, recodes, and
> combining data sets from both the add-cases and add-variables
> approaches and restructuring data from wide to long formats and vice
> versa
> * Create publication quality graphs including bar, histogram,
> pie, line, scatter, regression, box, error bar, and interaction plots
> * Perform the basic types of analyses to measure strength of
> association and group differences and be able to know where to turn to
> cover much more complex methods
> Stata is the most flexible and extensible data analysis package
> available from a commercial vendor. R is a similarly flexible free and
> open source package for data analysis, with over 3,000 add-on packages
> available. This book shows you how to extend the power of Stata through
> the use of R. It introduces R using Stata terminology with which you
> are already familiar. It steps through more than 30 programs written in
> both languages, comparing and contrasting the two packages' different
> approaches. When finished, you will be able to use R in conjunction
> with Stata, or separately, to import data, manage and transform it,
> create publication quality graphics, and perform basic statistical
> analyses.
> A glossary defines over 50 R terms using Stata jargon and again using
> more formal R terminology. The table of contents and index allow you to
> find equivalent R functions by looking up Stata commands and vice
> versa. The example programs and practice datasets for both R and Stata
> are available for download.
> Robert A. Muenchen is the author of the book, R for SAS and SPSS Users,
> and is a consulting statistician with 29 years of experience. He has
> served on the advisory boards of SAS Institute, SPSS Inc., and the
> Statistical Graphics Corporation. He currently manages Research
> Computing Support at The University of Tennessee.
> Joseph M. Hilbe is Solar System Ambassador with NASA/Jet Propulsion
> Laboratory, California Institute of Technology, an adjunct professor of
> statistics at Arizona State, and emeritus professor at the University
> of Hawaii. He is a Fellow of the American Statistical Association and
> elected member of the International Statistical Institute. Hilbe was
> the first editor of the Stata Technical Bulletin, (later named the
> Stata Journal) and is author of a number of textbooks, including
> Logistic Regression Models and Negative Binomial Regression.
> Written for: Professional/practitioner
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/