Statalist



Re: st: RE: Stata code to run R code from within Stata and return certain pieces of the results as Stata macros


From   "Salah Mahmud" <salah.mahmud@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Stata code to run R code from within Stata and return certain pieces of the results as Stata macros
Date   Fri, 30 May 2008 19:20:51 -0500

Nick Cox wrote:
> Or else you are asking for a very knowledgeable piece of software that
> knows an awful lot about R and about Stata. I can only imagine that
> anything smart enough to be as  versatile as you want would also have to
> have so many hooks to handle all the possibilities that it would be very
> difficult to write and very difficult even to use.

I think a smart solution is possible, and it would open the door for
Stata users to access all the cutting-edge statistical facilities that
are available only in R. Version 1 of this Rbridge might do the
following:

1. Export all or a subset of the data to a csv file and construct the
R code needed to import that data into R. This is straightforward.
Users choose the varlist to export and impose if and in constraints
in the usual way, and Rbridge exports the data to a temp csv file.
Rbridge then creates another temp text file (source.R) and adds code
to load and attach the csv file.

2. The Stata user specifies the R libraries that need to be loaded
(eg library(cmprsk)) and the exact R code needed to perform the
specific calculations. The code could be stored in an *.R file or, for
simple tasks, passed directly to Rbridge. So Rbridge could have 2
mutually exclusive options:
i) Rcommand ("out<-crr(ftime,fstatus,cov,failcode=2) ")
ii) Rcmdfilename( "mycode.R")

Rbridge adds these commands to source.R. The user is responsible for
ensuring that the code works (eg valid logic, correct syntax, correct
variable names etc). Rbridge does not need to know anything about R.

3. The user also specifies the result object returned by R (eg the out
object in out<-crr(ftime,fstatus,cov,failcode=2)) that she would like
Rbridge to return as global macros or r() macros. R is
object-oriented, so unlike Stata commands, which return individual
pieces of information, R functions typically return a single object
that contains all these pieces. So the object "out" in the call
"out<-crr(ftime,fstatus,cov,failcode=2)" above stores all the info
returned by the function crr, in elements like out$coef, out$loglik
etc, which some call class members.
Version 1 of Rbridge could return a specific class member (eg
out$loglik). A more advanced implementation would parse (within R) the
target object and extract all the members' names and values (contents)
into a text file that records each member's name, data type and
value. The necessary commands are added to source.R. These may
include calls to custom R functions that facilitate the parsing and
extraction of the members of the result object. But as I said this is
not essential for version 1.

4. Stata runs R and passes it source.R for execution, which produces
the results text file described above.

5. Stata reads the text file and generates macros with the same names,
data types and values, which Rbridge could also leave behind as r()
macros.
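
To make the file hand-off in the steps above concrete, here is a
minimal sketch of the two ends of the exchange: building source.R and
reading back the results file. It is written in Python purely for
illustration and is not an Rbridge implementation. The function names
(user_code, build_source, parse_results), the file names, and the
results layout (one tab-separated line of name, type and values) are
all assumptions made for this example. The generated R lines use only
base R (read.csv, attach, library, cat, class), and paths are assumed
to contain no backslashes or quotes.

```python
from pathlib import Path


def user_code(rcommand=None, rcmdfilename=None):
    """Return the user's R code from exactly one of the two options."""
    if (rcommand is None) == (rcmdfilename is None):
        raise ValueError("specify exactly one of rcommand or rcmdfilename")
    if rcommand is not None:
        return rcommand
    return Path(rcmdfilename).read_text()


def build_source(csv_path, libraries, r_code, result_expr, results_path):
    """Return the text of source.R: load the data, run the user's code,
    then write one member of the result object to a text file."""
    lines = [f'dat <- read.csv("{csv_path}")', "attach(dat)"]
    lines += [f"library({lib})" for lib in libraries]
    lines.append(r_code.strip())
    # Hypothetical layout: name<TAB>type<TAB>value[<TAB>value...]
    lines.append(
        f'cat("{result_expr}", class({result_expr}), {result_expr}, '
        f'sep = "\\t", file = "{results_path}")'
    )
    return "\n".join(lines) + "\n"


def parse_results(text):
    """Turn name<TAB>type<TAB>value... lines into a dict keyed by name."""
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, rtype, *values = line.split("\t")
        if rtype in ("numeric", "integer"):
            # R prints missing values as NA, which float() cannot read.
            values = [float("nan") if v == "NA" else float(v) for v in values]
        results[name] = values[0] if len(values) == 1 else values
    return results
```

For example, build_source("data.csv", ["cmprsk"],
user_code(rcommand="out<-crr(ftime,fstatus,cov,failcode=2)"),
"out$loglik", "results.txt") returns the text to save as source.R, and
parse_results applied to the contents of results.txt gives the values
that would become macros. One detail to keep in mind: cat prints
numbers using R's digits option (7 significant digits by default), so
a real implementation might need to format values explicitly.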

I'm sure the devil is in the details (eg there are issues with
coordinating the running of Stata and R). For instance, Stata may have
to go to sleep until R signals that the code execution is over etc.
But the above does not seem any more daunting than the average ado out
there.
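
On the coordination point, one simple option is to launch R as a
child process and block until it exits, so no sleeping or signalling
is needed. The sketch below shows the idea in Python, again only as an
illustration under stated assumptions: it assumes the Rscript
executable that ships with R is on the PATH, and run_r and its runner
parameter are hypothetical names (runner exists only so the function
can be exercised without R installed).

```python
import subprocess


def run_r(source_path, rscript="Rscript", timeout=None, runner=subprocess.run):
    """Run source.R and block until R exits; raise if R reports an error."""
    cmd = [rscript, "--vanilla", str(source_path)]
    # subprocess.run does not return until the child process has finished.
    done = runner(cmd, capture_output=True, text=True, timeout=timeout)
    if done.returncode != 0:
        raise RuntimeError(f"R failed ({done.returncode}): {done.stderr.strip()}")
    return done.stdout
```

Calling run_r("source.R") returns only after R has finished, at which
point the results text file can be read. A non-zero exit status from R
is turned into an error rather than being silently ignored.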

The advantages are obvious. R's statistical and graphical utilities
could be called from within Stata do files. For instance I could plot
a cumulative incidence curve in Stata and add a p-value that is
calculated using a test that is only available in R (e.g., Gray's
test). I would still be able to use all of Stata's superb facilities
for handling complex time-to-event data, while passing a simple
dataset to R with instructions to run Gray's test and return the
p-value, which I would then add to my cumulative incidence plot.

This approach might be more efficient than trying to translate R code
to Stata code, and is definitely better than running separate R and
Stata scripts and transferring the results "manually" between the two.

Hope the above clarifies my earlier email,

Thanks,

salah mahmud
Canada