Statalist



Re: st: Stata code to run R code from within Stata and return certain pieces of the results as Stata macros


From   Phil Schumm <pschumm@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Stata code to run R code from within Stata and return certain pieces of the results as Stata macros
Date   Sat, 31 May 2008 09:52:58 -0500

On May 30, 2008, at 7:20 PM, Salah Mahmud wrote:
> I think a smart solution is possible and opens the door for Stata users to access all the cutting-edge statistical facilities only available in R. A version 1 of this Rbridge might do the following:
>
> 1. Export a subset/all the data into a CSV file and construct the necessary R code to import that data into R.

Why go through a text file? Why not just save a temporary file (in Stata format), and read it into R with the foreign package?



> I'm sure the devil is in the details (e.g., there are issues with coordinating the running of Stata and R). For instance, Stata may have to go to sleep until R signals that the code execution is over, etc. But the above does not seem any more daunting than the average ado out there.

You're kidding, right?


> The advantages are obvious. R statistical and graphical utilities could be called from within Stata do files. For instance, I could plot a cumulative incidence curve in Stata and add a p-value that is calculated using a test that is only available in R (e.g., Gray test). I'm still able to use all of Stata's superb facilities for handling complex time-to-event data, but I could still pass a simple dataset to R with instructions to run Gray test and return the p-value that I will then add to my cumulative incidence plot.
>
> This approach might be more efficient than trying to translate R code to Stata code and definitely better than running separate R and Stata scripts and transferring the results "manually" between the two.

I'm not trying to sound negative, but since you posted this to the list, I presume you are interested in getting feedback. While it might be fun to think about a world in which one could seamlessly call R functions from within Stata (and have them act on Stata objects), trying to simulate this with a bunch of hacks would, IMHO, probably not be worth doing. However, the general goal of making it easier for Stata users to occasionally use R functions and/or packages is a good one. Currently, doing this requires:


1) getting data and/or other objects (e.g., matrices) out of Stata and into R

2) writing the R command(s) necessary to do the task

3) getting the results (in whatever form) back into Stata, if necessary


(Note that the existing command -rsource- doesn't really address 1-3, but instead, once 1-2 (and perhaps 3) are solved, facilitates workflow by permitting you to execute an R source file from Stata and capture the printed output to the screen and/or log file.)
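
As a rough illustration of steps 1-3 above, here is a minimal Python sketch that generates the R script for one task and parses what R writes back. The helper names (build_r_script, parse_results), the name=value results file, and the R variable name `data` are assumptions made for illustration only; library(foreign), read.dta() and cat() are the only real R calls used. The generated script would still have to be run separately, e.g., with Rscript.

```python
def build_r_script(dta_path, r_body, results_path, result_names):
    """Return R source text covering steps 1-3 for one task.

    dta_path and results_path are assumed to use forward slashes and to
    contain no double quotes, since they are pasted into R string literals.
    """
    lines = [
        "library(foreign)",                  # step 1: read the Stata file
        f'data <- read.dta("{dta_path}")',
        r_body.strip(),                      # step 2: the user's R commands
    ]
    # Step 3: append each named numeric result as a name=value line.
    for name in result_names:
        lines.append(
            f'cat("{name}=", format({name}, digits = 15), "\\n", '
            f'sep = "", file = "{results_path}", append = TRUE)'
        )
    return "\n".join(lines) + "\n"


def parse_results(text):
    """Parse name=value lines written by the generated R script."""
    results = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or unrelated lines
        key, _, value = line.partition("=")
        results[key.strip()] = float(value)
    return results


if __name__ == "__main__":
    # Hypothetical task: a one-sample t test on a variable called x.
    print(build_r_script("temp.dta", "p <- t.test(data$x)$p.value",
                         "results.txt", ["p"]))
```

This only handles scalar numeric results; compound R objects are exactly the part that needs more thought.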

There is, I think, quite a lot that could be profitably done to facilitate (1) and (3). It is currently pretty straightforward to open a Stata dataset in R using the foreign package, but I don't believe there's an easy way to read a collection of Mata objects (e.g., as saved by -mata matsave-) into R. Similarly, while you can also write a Stata file using foreign, as you have noted, most R results come in the form of compound objects, and there's no easy way to get these back into Stata.

One way to approach this would be to create an abstraction layer that could read both Stata and R datasets/object files, and translate between them. Python would be ideal for writing such a layer, since much of the work to interface with R has already been done (e.g., see http://www.omegahat.org/RSPython/index.html). You could then write Python methods to read both .dta files and files containing Mata objects (i.e., as created using -mata matsave- or -fopen()-). This has been on my to-do list for some time, since we do a lot with Python and it would be great to be able to pass data from Python to Stata (and vice versa) more easily.
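
To give a flavor of what a Python method for reading .dta files might start with, here is a minimal sketch that decodes only the fixed header of a file. It assumes .dta format 114 with a 109-byte header laid out as: format byte, byte-order byte (1 = big-endian, 2 = little-endian), two further bytes, a 2-byte variable count, a 4-byte observation count, an 81-byte dataset label, and an 18-byte time stamp. The function names are hypothetical, and the layout should be checked against Stata's own file format documentation before relying on it.

```python
import struct

HEADER_SIZE = 109  # assumed size of the fixed header in format 114


def _c_string(field):
    """Decode a NUL-terminated, fixed-width character field."""
    return field.split(b"\0", 1)[0].decode("latin-1")


def read_dta_header(raw):
    """Decode the fixed header from the first bytes of a .dta file."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("need at least 109 bytes of header")
    ds_format, byteorder = raw[0], raw[1]
    if ds_format != 114:
        raise ValueError(f"unsupported .dta format: {ds_format}")
    if byteorder == 1:
        endian = ">"   # HILO, big-endian
    elif byteorder == 2:
        endian = "<"   # LOHI, little-endian
    else:
        raise ValueError(f"unknown byte order flag: {byteorder}")
    # Bytes 2-3 (file type and padding) are skipped.
    nvar, nobs = struct.unpack(endian + "hi", raw[4:10])
    return {
        "format": ds_format,
        "nvar": nvar,
        "nobs": nobs,
        "data_label": _c_string(raw[10:91]),
        "time_stamp": _c_string(raw[91:109]),
    }
```

Typical use would be something like read_dta_header(open("auto.dta", "rb").read(HEADER_SIZE)), where auto.dta stands for any file saved in that format. Variable types, names, and the data themselves follow the header and are not handled here.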

Once this has been done, one could imagine a Stata command that automatically saved most (if not all) Stata objects in memory (i.e., the dataset, macros, Mata objects, and the contents of r(), e(), etc.) into a set of temporary files (in standard Stata formats). One could then switch to R, and access these objects through the abstraction layer. Similarly, one could then use the abstraction layer to save one or more R objects to disk in Stata formats so that they could be read back in from Stata (using standard Stata commands, which could also be wrapped for ease of use, if necessary). Alternatively, one could just finish the R session by saving the entire workspace, and then access the abstraction layer from Stata to pull objects selectively out of this workspace.
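
The workflow just described (dump objects to a set of files, then pull them back selectively through the layer) might look roughly like the Python sketch below. Note the loud assumption: JSON files stand in for the standard Stata formats purely to keep the example short, and the Snapshot class with its three fields is hypothetical, not an existing API.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class Snapshot:
    """A neutral, in-memory copy of some Stata objects (hypothetical)."""
    macros: dict = field(default_factory=dict)    # name -> string
    scalars: dict = field(default_factory=dict)   # name -> number
    matrices: dict = field(default_factory=dict)  # name -> list of rows

    def save(self, directory):
        """Write one file per kind of object (JSON as a stand-in format)."""
        for kind, objects in asdict(self).items():
            with open(os.path.join(directory, kind + ".json"), "w") as f:
                json.dump(objects, f)

    @classmethod
    def load(cls, directory, kinds=("macros", "scalars", "matrices")):
        """Read back only the requested kinds; the rest stay empty."""
        loaded = {}
        for kind in kinds:
            with open(os.path.join(directory, kind + ".json")) as f:
                loaded[kind] = json.load(f)
        return cls(**loaded)


if __name__ == "__main__":
    snap = Snapshot(macros={"depvar": "mpg"}, scalars={"N": 74},
                    matrices={"b": [[1.5, -0.2]]})
    with tempfile.TemporaryDirectory() as tmp:
        snap.save(tmp)
        print(Snapshot.load(tmp, kinds=("scalars",)))
```

Because everything goes through files, neither program needs to know the other is running.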

Note that this approach would not involve any interprocess communication between Stata and R, and would therefore be easily transferable to all platforms on which both Stata and R run (since Python is easily available for all of these).

Now, Stata's complete set of data structures (i.e., variables, matrices, macros, scalars, etc.) is quite different from R's; moreover, figuring out how to move R's various types of result objects into Stata would take some serious work. For this reason, a complete implementation of an abstraction layer would take *a lot* of work, and there may be some areas that simply cannot be addressed in a practical way. Thus, if I were going to do this project, I'd start by creating an outline of what the abstraction layer might look like, and then pick just one, clearly defined area to implement first as a proof-of-concept. This would, by itself, give you some functionality, and you could then decide whether and how to begin extending it.
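
As one possible proof-of-concept area, consider flattening a simple R result (nested named lists of numbers, strings, and numeric vectors) into Stata scalars, local macros, and row vectors. The Python sketch below assumes the R object has already been converted to nested dicts by some other means; the function name, the "r" prefix, and the underscore naming scheme are all hypothetical choices, and missing values are not handled.

```python
MAX_NAME_LENGTH = 31  # Stata local macro names may be at most 31 characters


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def to_stata_commands(result, prefix="r"):
    """Flatten a nested dict into a list of Stata command strings."""
    commands = []
    for key, value in result.items():
        # R names may contain dots, which are not legal in Stata names.
        name = f"{prefix}_{key.replace('.', '_')}"
        if isinstance(value, dict):
            commands.extend(to_stata_commands(value, prefix=name))
            continue
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError(f"name too long for Stata: {name}")
        if isinstance(value, str):
            commands.append(f'local {name} "{value}"')
        elif _is_number(value):
            commands.append(f"scalar {name} = {value!r}")
        elif isinstance(value, list) and value and all(map(_is_number, value)):
            row = ", ".join(repr(v) for v in value)
            commands.append(f"matrix {name} = ({row})")
        else:
            raise TypeError(f"cannot translate {name} ({type(value).__name__})")
    return commands


if __name__ == "__main__":
    example = {"test": {"p.value": 0.0312, "method": "Gray"}, "n": 74}
    print("\n".join(to_stata_commands(example)))
```

The resulting lines could be written to a do-file and run from Stata. Anything that does not fit these three cases raises an error rather than being silently dropped, which seems the safer default for a first version.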


-- Phil
