
Re: st: R and Stata efficiency

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: st: R and Stata efficiency
Date	Tue, 10 Jul 2007 06:51:06 -0500

On Jul 9, 2007, at 1:13 PM, David Airey wrote:

> I was talking to a statistician on our campus about his perception of the relative efficiency of R compared to Stata. One of the things he finds annoying about Stata is having to explicitly save things in addition to estimation:
>
> Examples are: in LR test, I need to save the results of a previous model. In drawing graphs based on the model fit, I have to save the coefficient matrix. In doing further inferences on predicted values, I have to save the predicted values. etc. etc.
>
> Stata commands seem to be designed to only take existing entities as arguments, but cannot take the results of other functions as arguments without explicit assignment. In R, results of functions can always be used as arguments for other functions, making explicit assignment unnecessary if not needed.
>
> Stata does leave behind model coefficients and such that can be directly accessed (without saving). But they disappear after the next estimation. Whereas in R, estimates are always saved in the way a command is issued (<-) and they remain accessible later, unless they are overwritten by mistake or intent later.

Stata and R have very different architectures and interfaces, and thus it's possible for someone very accustomed to one to feel uncomfortable working in the other. Of course, it's also possible to use both, and to appreciate the strengths of each (just like working with different programming languages). That said, one can still have one's own preferences, and, as a bit of truth-in-advertising, I spend ~99 percent of my statistical life in Stata.

I'm not sure I fully understand the comment above -- it seems to me as though multiple issues are being raised. For example, a common idiom in R is to fit a model like this:

results <- lm(y ~ x)

This command places the results into an object called "results" (technically of class "lm"), which one can then use later on. In Stata, this would look like:

reg y x
est store results

which would store the estimation results under the name "results" for further use. Of course in R you can save the object "results" in a file, whereas in Stata (at least as of version 9) you cannot save the set of results in a file (though see Michael Blasnik's -estsave- and Ben Jann's -estwrite- and -estread- wrappers for a workaround).
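
As a rough illustration of the point about saving a results object to a file in R, the sketch below writes a fitted model to disk and reads it back. It assumes a current version of R (saveRDS() and readRDS() are base functions in recent releases); the built-in mtcars data and the temporary file path are stand-ins chosen only for the example.

```r
# Minimal sketch: persist a fitted model to disk and read it back.
# mtcars is a built-in dataset, used here only for illustration.
results <- lm(mpg ~ wt, data = mtcars)

# Hypothetical location; any writable path would do.
path <- tempfile(fileext = ".rds")
saveRDS(results, path)

# In a later session, the object can be restored and used as before.
restored <- readRDS(path)
print(all.equal(coef(results), coef(restored)))
```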

Now, I suppose you might complain that the Stata example requires two lines of code while the R example requires only one. Fair enough -- you have a lot of flexibility on the command line in R. However, the end result is essentially the same, since developers have complete control over what they return in e() just as they have control over how they define the object returned by an estimation command in R.
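
To make the command-line flexibility concrete, here is a small sketch in which the result of lm() is passed straight to other functions without ever being given a name. The variables come from the built-in mtcars data, which is an assumption made purely for illustration.

```r
# Fit and inspect in one expression, without naming the fitted model.
coef_table <- coef(summary(lm(mpg ~ wt, data = mtcars)))
print(coef_table)

# The same fit stored first, for comparison with the one-liner above.
results <- lm(mpg ~ wt, data = mtcars)
print(all.equal(coef_table, coef(summary(results))))
```

The one-liner is convenient for a single look at a model, but the fit is discarded as soon as the expression has been evaluated.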

Note also that Stata does have one advantage over R here, at least for a particular workflow. Suppose I want to fit a model, and then perform several diagnostics in serial fashion immediately afterward. In R, I must save the result of the model to do this; for example

results <- lm(y ~ x)
plot(fitted(results), resid(results))
cr.plots(results,"x")
...

where I'm using the cr.plots function from the car library. In Stata, I don't have to save the results, as long as I don't disturb e():

reg y x
rvfplot
cprplot x
...

Perhaps a minor point, but I wanted to emphasize the fact that the downside to creating results objects every time you fit a model (as users often do in R) is that your workspace tends to fill up with lots of old objects, and you have to clean them out manually. In sum:

1) saving the results of an estimation command requires an explicit statement in
both Stata and R, though in R you can fit the model and save the results in a
single statement, but
2) R has no concept of an "active" set of results, and therefore you must save
any results you want to use in a subsequent command

Thus, you might cast the difference as a trade-off: a bit of additional effort each time you want to save a set of results for comparison with those from other models, versus a bit of additional effort to delete the results from models you've fit previously. Depending on your own wheat-to-chaff ratio (mine is often quite low), you can decide which you prefer.
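
The manual clean-up on the R side might look something like the sketch below. It works in a private environment (ws, a name invented for this example) rather than the real workspace, and the object names m1, m2 and n_obs are likewise hypothetical.

```r
# Use a private environment so the sketch does not touch the real workspace.
ws <- new.env()
assign("m1", lm(mpg ~ wt, data = mtcars), envir = ws)
assign("m2", lm(mpg ~ wt + hp, data = mtcars), envir = ws)
assign("n_obs", nrow(mtcars), envir = ws)

# Find every object of class "lm" ...
is_lm <- vapply(ls(ws),
                function(nm) inherits(get(nm, envir = ws), "lm"),
                logical(1))
old_fits <- names(is_lm)[is_lm]

# ... and remove all of them except the one still wanted.
rm(list = setdiff(old_fits, "m2"), envir = ws)
print(ls(ws))
```

In an interactive session the same idea applies to the global workspace, with ls() and rm() called without the envir argument.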

One final comment RE use of stored results. In R, once you have fit several models and stored their results you can do something like this:

plot(results)
plot(other_results)

Now, suppose you've done the same in Stata. What I often see users do is the following:

est restore results
rvfplot
est restore other_results
rvfplot

However, this is unnecessary. Instead, you can simply do

est for results: rvfplot
est for other_results: rvfplot

Notice that the -estimates for- prefix goes a long way toward reducing the difference between Stata and R in the ease with which you can use stored results.
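
For comparison, one way to get a similar effect on the R side is to keep the stored fits in a named list and apply the same step to each. This is only a sketch under stated assumptions: the models use the built-in mtcars data, and the list name fits is invented for the example.

```r
# Keep several stored fits together in a named list.
fits <- list(
  results       = lm(mpg ~ wt, data = mtcars),
  other_results = lm(mpg ~ wt + hp, data = mtcars)
)

# Apply the same post-estimation step to each stored fit.
r2 <- vapply(fits, function(f) summary(f)$r.squared, numeric(1))
print(r2)

# Compare the two nested fits (for lm objects, anova() reports an F test).
cmp <- anova(fits$results, fits$other_results)
print(cmp)
```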

-- Phil

