Re: st: R and Stata efficiency
On Jul 9, 2007, at 1:13 PM, David Airey wrote:
I was talking to a statistician on our campus about his perception  
of the relative efficiency of R compared to Stata. One of the  
things he finds annoying about Stata is having to explicitly save  
things in addition to estimation:
Examples are: in LR test, I need to save the results of a previous  
model. In drawing graphs based on the model fit, I have to save  
the coefficient matrix.  In doing further inferences on predicted  
values, I have to save the predicted values. etc. etc.
Stata commands seem to be designed to only take existent entities  
as arguments, but cannot take the results of other functions as  
arguments without explicit assignment.  In R, results of functions  
can always be used as arguments for other functions, making  
explicit assignment unnecessary if not needed.
Stata does leave behind model coefficients and such that can be  
directly accessed (without saving). But they disappear after the  
next estimation. Whereas in R, estimates are always saved in the  
way a command is issued (<-) and they remain accessible later,  
unless they are overwritten by mistake or intent later.
Stata and R have very different architectures and interfaces, and  
thus it's possible for someone very accustomed to one to feel  
uncomfortable working in the other.  Of course, it's also possible to  
use both, and to appreciate the strengths of each (just like working  
with different programming languages). That said, one can still have  
one's own preferences, and, as a bit of truth-in-advertising, I spend  
~99 percent of my statistical life in Stata.
I'm not sure I fully understand the comment above -- it seems to me  
as though multiple issues are being raised.  For example, a common  
idiom in R is to fit a model like this:
results <- lm(y ~ x)
This command places the results into an object called  
"results" (technically of class "lm"), which one can then use later  
on.  In Stata, this would look like:
reg y x
est store results
which would store the estimation results under the name "results" for  
further use.  Of course in R you can save the object "results" in a  
file, whereas in Stata (at least as of version 9) you cannot save the  
set of results in a file (though see Michael Blasnik's -estsave- and  
Ben Jann's -estwrite- and -estread- wrappers for a workaround).
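For the R side of that comparison, a minimal sketch of writing a fitted model to a file and reading it back might look like the following. The simulated x and y and the temporary file path are assumptions made only so the example runs on its own; saveRDS() and readRDS() are base R functions.

```r
# Simulated data; x and y are made up purely for illustration
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

results <- lm(y ~ x)

# Write the fitted-model object to a file, then read it back
path <- tempfile(fileext = ".rds")   # hypothetical location
saveRDS(results, path)
restored <- readRDS(path)

# The restored object should carry the same coefficients
all.equal(coef(results), coef(restored))
```

In a later session, only the readRDS() call would be needed to recover the object.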
Now, I suppose you might complain that the Stata example requires two  
lines of code while the R example requires only one.  Fair enough --  
you have a lot of flexibility on the command line in R.  However, the  
end result is essentially the same, since developers have complete  
control over what they return in e() just as they have control over  
how they define the object returned by an estimation command in R.
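As a rough illustration of that command-line flexibility, the sketch below passes fitted models directly to other functions without assigning them to names first. The variables x, z and y are simulated here as an assumption, and for lm fits anova() reports an F test for the nested comparison rather than a likelihood-ratio test.

```r
# Simulated data; x, z and y are made up purely for illustration
set.seed(1)
x <- rnorm(50)
z <- rnorm(50)
y <- 1 + 2 * x + 0.5 * z + rnorm(50)

# Coefficients from a fit that is never assigned to a name
coef(lm(y ~ x))

# Compare two nested models in one statement; neither fit is stored
comparison <- anova(lm(y ~ x), lm(y ~ x + z))
comparison
```

The trade-off is that neither fit can be reused afterward, since nothing holds a reference to it.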
Note also that Stata does have one advantage over R here, at least  
for a particular workflow.  Suppose I want to fit a model, and then  
perform several diagnostics in serial fashion immediately afterward.   
In R, I must save the result of the model to do this; for example
results <- lm(y ~ x)
plot(fitted(results), resid(results))
cr.plots(results,"x")
...
where I'm using the cr.plots function from the car library.  In  
Stata, I don't have to save the results, as long as I don't disturb e():
reg y x
rvfplot
cprplot x
...
Perhaps a minor point, but I wanted to emphasize the fact that the  
downside to creating results objects every time you fit a model (as  
users often do in R) is that your workspace tends to fill up with  
lots of old objects, and you have to clean them out manually.  In sum:
1) saving the results of an estimation command requires an explicit
   statement in both Stata and R, though in R you can fit the model
   and save the results in a single statement, but
2) R has no concept of an "active" set of results, and therefore you
   must save any results you want to use in a subsequent command
Thus, you might cast the difference as a difference between making a  
bit of additional effort each time you want to save a set of results  
for comparison with those from other models versus making a bit of  
additional effort to delete the results from models you've fit  
previously.  Depending on your own wheat-to-chaff ratio (mine is  
often quite low), you can decide which you prefer.
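The cleanup effort on the R side can be sketched as follows. The object names follow the examples in this message, and the simulated data is an assumption made only so the code runs on its own.

```r
# Simulated data; x and y are made up purely for illustration
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

results <- lm(y ~ x)
other_results <- lm(y ~ x + I(x^2))

# List everything currently in the workspace
ls()

# Remove the model objects that are no longer needed
rm(results, other_results)

# Check that the name is gone from the current environment
exists("results", inherits = FALSE)
```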
One final comment RE use of stored results.  In R, once you have fit  
several models and stored their results you can do something like this:
plot(results)
plot(other_results)
Now, suppose you've done the same in Stata.  What I often see users  
do is the following:
est restore results
rvfplot
est restore other_results
rvfplot
However, this is unnecessary.  Instead, you can simply do
est for results: rvfplot
est for other_results: rvfplot
Notice that the -estimates for- prefix goes a long way toward  
reducing the difference between Stata and R in the ease with which  
you can use stored results.
-- Phil