Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: R for Stata Users


From   Fred Wolfe <fwolfe@arthritis-research.org>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: R for Stata Users
Date   Sat, 27 Feb 2010 05:57:53 -0600

Thank you, Ben. It was very useful tos ee how you did that.

Fred

On Fri, Feb 26, 2010 at 12:59 PM, Ben Dwamena <bdwamena@med.umich.edu> wrote:
> If you already have the R code (especially if you are going to use it often)  it can be written to file stored through  known adopath with your do- or ado-file using file with write option each time to use the ado/do-file. Linking this with Roger Newson's rsource will allow you to call R,  run you code and output within Stata.
> Example:
>
>
> tempname binla
> file open `binla' using "c:\ado\plus\m\midas.R", write replace text
> file write `binla' `"library(INLA)"' _n
> file write `binla' `"inla.upgrade()"' _n
> file write `binla' `"library(foreign)"' _n
> file write `binla' `"anti.logit = function(x)  return (exp(x)/(1+exp(x)))"' _n
> file write `binla' `"quantiles = c(0.025, 0.5, 0.975)"' _n
> file write `binla' `"filename = commandArgs(trailingOnly = TRUE)"' _n
> file write `binla' `"midas <- read.csv(filename, header = TRUE)"' _n
> file write `binla' `"data.frame(midas)"' _n
> file write `binla' `"formula <- Y~f(diid,model= "2diidwishart", param=c(4,1,2,0.1)) + tp + tn - 1 "' _n
> file write `binla' `"midas = inla(formula,family="binomial", data=midas, Ntrials=N,"' _n
> file write `binla' `"quantiles = quantiles, control.inla = list(strategy = "laplace", int.strategy = "grid", npoints=50),"' _n
> file write `binla' `"control.compute = list(dic=T, cpo=T, mlik=T))"' _n
>
> file write `binla' `"hyper = inla.hyperpar(midas)"' _n
>
> file write `binla' `"nq = length(quantiles)"' _n
> file write `binla' `"R = matrix(NA, 7, nq)"' _n
> file write `binla' `"colnames(R) = as.character(quantiles)"' _n
> file write `binla' `"rownames(R) = c("summary.sens", "summary.spec", "logit.sens", "logit.spec", "sigma.sens", "sigma.spec",  "rho")"' _n
>
>
> file write `binla' `"tp = midas\$marginals.fixed\$tp"' _n
> file write `binla' `"R[1,] = anti.logit(inla.qmarginal(quantiles, tp))"' _n
>
> file write `binla' `"tn = midas\$marginals.fixed\$tn"' _n
> file write `binla' `"R[2,] = anti.logit(inla.qmarginal(quantiles, tn))"' _n
>
> file write `binla' `"tpl = midas\$marginals.fixed\$tp"' _n
> file write `binla' `"R[3,] = inla.qmarginal(quantiles, tpl)"' _n
>
> file write `binla' `"tnl = midas\$marginals.fixed\$tn"' _n
> file write `binla' `"R[4,] = inla.qmarginal(quantiles, tnl)"' _n
>
> file write `binla' `"tau.1 = hyper\$marginals\$\`Precision for diid (first component)\` "' _n
> file write `binla' `"R[5,] = 1/sqrt(inla.qmarginal(quantiles, tau.1))"' _n
>
> file write `binla' `"tau.2 = hyper\$marginals\$\`Precision for diid (second component)\` "' _n
> file write `binla' `"R[6,] = 1/sqrt(inla.qmarginal(quantiles, tau.2))"' _n
>
> file write `binla' `"rho = hyper\$marginals\$\`Rho for diid\` "' _n
> file write `binla' `"R[7,] = inla.qmarginal(quantiles, rho)"' _n
> file write `binla' `"print(R)"' _n
> file write `binla' `"midares <- data.frame(R)"' _n
> file write `binla' `"write.foreign(midares,"' _n
> file write `binla' `"datafile="midares.dta","' _n
> file write `binla' `"codefile="midares.do","' _n
> file write `binla' `"package="Stata")"' _n
>
>
> file write `binla' `"fitdic<- data.frame(midas\$dic)"' _n
> file write `binla' `"print(fitdic)"' _n
> file write `binla' `"write.foreign(fitdic,"' _n
> file write `binla' `"datafile="midadic.dta","' _n
> file write `binla' `"codefile="midadic.do","' _n
> file write `binla' `"package="Stata")"' _n
>
> file write `binla' `"like<- data.frame(midas\$mlik)"' _n
> file write `binla' `"print(like)"' _n
> file write `binla' `"write.foreign(like,"' _n
> file write `binla' `"datafile="midalik.dta","' _n
> file write `binla' `"codefile="midalik.do","' _n
> file write `binla' `"package="Stata")"' _n
>
> if "`fixed'" != "" {
>
> file write `binla' `"marg.fix.tp <- data.frame(midas\$marginals.fixed\$tp)"' _n
> file write `binla' `"write.foreign(marg.fix.tp,"' _n
> file write `binla' `"datafile="mfixtp.dta","' _n
> file write `binla' `"codefile="mfixtp.do","' _n
> file write `binla' `"package="Stata")"' _n
>
>
> file write `binla' `"marg.fix.tn <- data.frame(midas\$marginals.fixed\$tn)"' _n
> file write `binla' `"write.foreign(marg.fix.tn,"' _n
> file write `binla' `"datafile="mfixtn.dta","' _n
> file write `binla' `"codefile="mfixtn.do","' _n
> file write `binla' `"package="Stata")"' _n
> }
> if "`covplot'" != "" {
>
> file write `binla' `"marg.rho <- data.frame(midas\$marginals.hyper$\`Rho for diid\`) "' _n
> file write `binla' `"write.foreign(marg.rho,"' _n
> file write `binla' `"datafile="mrho.dta","' _n
> file write `binla' `"codefile="mrho.do","' _n
> file write `binla' `"package="Stata")"' _n
>
>
> file write `binla' `"marg.hyper.tp <- data.frame(midas\$marginals.hyper\$\`Precision for diid (first component)\`) "' _n
> file write `binla' `"write.foreign(marg.hyper.tp,"' _n
> file write `binla' `"datafile="mhypertp.dta","' _n
> file write `binla' `"codefile="mhypertp.do","' _n
> file write `binla' `"package="Stata")"' _n
>
>
> file write `binla' `"marg.hyper.tn <- data.frame(midas\$marginals.hyper\$\`Precision for diid (second component)\`) "' _n
> file write `binla' `"write.foreign(marg.hyper.tn,"' _n
> file write `binla' `"datafile="mhypertn.dta","' _n
> file write `binla' `"codefile="mhypertn.do","' _n
> file write `binla' `"package="Stata")"' _n
> }
>
> if "`fitted'" != "" {
>
> file write `binla' `"linpred <- data.frame(midas\$summary.linear.predictor)"' _n
> file write `binla' `"write.foreign(linpred,"' _n
> file write `binla' `"datafile="lpred.dta","' _n
> file write `binla' `"codefile="lpred.do","' _n
> file write `binla' `"package="Stata")"' _n
>
>
> file write `binla' `"sumfit <- data.frame(midas\$summary.fitted.values)"' _n
> file write `binla' `"write.foreign(sumfit,"' _n
> file write `binla' `"datafile="sumfit.dta","' _n
> file write `binla' `"codefile="sumfit.do","' _n
> file write `binla' `"package="Stata")"' _n
> }
>
> if "`predplot'" != "" {
> file write `binla' `"cpo <- data.frame(midas\$cpo)"' _n
> file write `binla' `"write.foreign(cpo,"' _n
> file write `binla' `"datafile="cpo.dta","' _n
> file write `binla' `"codefile="cpo.do","' _n
> file write `binla' `"package="Stata")"' _n
>
>
> file write `binla' `"pit <- data.frame(midas\$pit)"' _n
> file write `binla' `"write.foreign(pit,"' _n
> file write `binla' `"datafile="pit.dta","' _n
> file write `binla' `"codefile="pit.do","' _n
> file write `binla' `"package="Stata")"'
> }
>
> file close `binla'
>
>
>
> tokenize "`varlist'", parse(" ")
>
> gen Y1 = int(`1')
> gen N1 = int(`1'+`3')
> gen N2 = int(`2'+`4')
> gen Y2 = int(`4')
>
> gen id=_n
>
> reshape long Y N, i(id)j(dis)
>
> tab dis, gen(T)
> rename tp abcd1
> rename tn abcd2
>
> rename T1 tp
> rename T2 tn
> gen diid =_n
>
> tempfile datafile
> capture outsheet Y N tp tn diid  using `datafile', replace comma /*nolabel*/
>
> capture findfile "midas.ado"
>
> local midasloc `"`r(fn)'"'
>
> _getfilename "`midasloc'"
>
> local xfile "`r(filename)'"
>
> local foundit : subinstr local midasloc `"`xfile'"' ""
>
> local foundit : subinstr local midasloc `"`xfile'"'  ""
>
> rsource using "`foundit'midas.R", lsource roptions("--slave --args `datafile'") rpath("C:\Program Files\R\R-2.10.1\bin\rterm.exe")
> nois di ""
> nois di ""
> nois di ""
> infile lower estimate upper using  "midares.dta" , automatic  clear
> gen str parameter="Sensitivity"
> replace parameter="Specificity" in 2/2
> replace parameter="logit(sens)" in 3/3
> replace parameter="Logit(spec)" in 4/4
> replace parameter="Var(logitsens)" in 5/5
> replace parameter="Var(logitspec)" in 6/6
> replace parameter="Corr(logits)" in 7/7
> format parameter %-15s
> format estimate lower  upper %3.2f
> nois list parameter estimate lower upper, sep(0) noobs table
> erase midares.dta
> erase midares.do
> nois di ""
>
>
>
>
> Ben Adarkwa Dwamena, MD
>
> Assistant Professor of  Radiology
> Division of Nuclear Medicine
> Department of Radiology
> University of Michigan Health Systems
> B1G505 UH SPC 5028
> 1500 E. Medical Center Drive
> Ann Arbor, MI  48109-5028
> 734-936-5388 Phone
> 734-936-8182 Fax
>
> bdwamena@umich.edu
>
> http://sitemaker.umich.edu/metadiagnosis
>
>
>
> Staff Physician
> D748 Nuclear Medicine Service (115),
> VA Ann Arbor Health Care System
> 2215 Fuller Road
> Ann Arbor, MI 48105
> 734-845-3775 Phone
> 734-845-3252 Fax
>
>
>
>>>> Fred Wolfe <fwolfe@arthritis-research.org> 2/26/2010 1:21 PM >>>
> I think that I agree with everything that has been said about R. I
> have the book on order, and I certainly hope that it does more than
> the blurb suggests.
>
> I have used R for their -rpart- ("CART" - recursive partitioning) and
> -randomForest- programs.  Here is an example of (probably poor) R code
> that I used (BTW, I ended up making the publication graphs in Stata!):
>
>
>   setwd("/Users/fwolfe/statdata/fibcrit")
>   sink("rmd1_r.log", split=TRUE)
>   load("/Users/fwolfe/statdata/fibcrit/mdbinshort.rdata")
>   library(rpart)
>   set.seed(9987)
>   mdbinshort$md_acrcrit <- as.factor(mdbinshort$md_acrcrit)
>   cat.rp <- rpart(md_acrcrit ~ .,data=mdbinshort,method="class",xval=100)
>   jpeg(filename = "mdbinshort1.jpeg")
>   plot(cat.rp,uniform = T)
>   text(cat.rp)
>   print(cat.rp)
>   summary(cat.rp)
>   printcp(cat.rp)
>   dev.off()
>
>   library(randomForest)
>   set.seed(4804)
>   cat.rf <- randomForest(md_acrcrit
> ~.,data=mdbinshort,importance=TRUE, proximity=TRUE)
>   print(cat.rf)
>   ## look at variable importance
>   round(importance(cat.rf),2)
>   ## Plot variable importance
>   jpeg(filename = "mdbinshort2.jpeg")
>   varImpPlot(cat.rf,main="Predictors of FM Classification")
>   dev.off()
>
> To make this work for me, I built all of the analysis files in Stata,
> and then converted them to R as in this example:
>
> // md data medium sx but not locations
>   preserve
>   keep if mdpure == 1
>   keep md_acrcrit mdrps mdpain mdfatig mdsleep mdmood mdcog somat
> mdunfresh *sx
>   save mdbinmedium, replace
>   shell "/Applications/StatTransfer9/st.command" mdbinmedium.dta
> mdbinmedium.rdata -y
>   restore
>
> // md data medium sx but not locations with rps as cat
>   preserve
>   keep if mdpure == 1
>   keep md_acrcrit rcat mdpain mdfatig mdsleep mdmood mdcog somat mdunfresh *sx
>   save mdbinmediumcat, replace
>   shell "/Applications/StatTransfer9/st.command" mdbinmediumcat.dta
> mdbinmediumcat.rdata -y
>   restore
>
> The real problem is making all of this work in R if you are not an R
> expert. I spent hours with errors and books.
>
> What I would like to see, and that is why I am posting this, is a
> Stata program that writes R specific program code. So that if I wanted
> to run -rpart-, I could run a Stata program that would create the R
> code, call R, and run the program.
>
> I think I am not expert enough to attempt this myself. Perhaps this
> list could compile that 5-10 most useful R programs and a community
> effort to build translation programs could be undertaken.
>
> Fred
>
>
>
> On Fri, Feb 26, 2010 at 11:35 AM, Airey, David C
> <david.airey@vanderbilt.edu> wrote:
>>
>> .
>>
>> Agreed. R is documented that way on purpose according to other R docs. It's the recommended style. But it would be helpful if they had a "verbose" option in their dynamically created help files, like with the Stata online help versus PDF manual help! God I love how fast I can get to helpful help in Stata. Yes, sometimes R has good vignettes, but not enough.
>>
>> All major statistical software environments are getting explicit with links to R functionality it seems. JMP (SAS) is building in R functionality to version 9 and SPSS 18  already does. I tried Roger Newson's SSC package to run R in Stata a while ago, and it worked fine. But linking to R is not the problem as Stas points out. You have to soak yourself in R head to toe until you are positively inebriated or pickled, is my guess.
>>
>> > Well, as pretty much of R documentation, it gives the minimal amount of
>> > information. [...] -Stas
>
> --
> Fred Wolfe
> National Data Bank for Rheumatic Diseases
> Wichita, Kansas
> NDB Office  +1 316 263 2125 Ext 0
> Research Office +1 316 686 9195
> fwolfe@arthritis-research.org
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Fred Wolfe
National Data Bank for Rheumatic Diseases
Wichita, Kansas
NDB Office  +1 316 263 2125 Ext 0
Research Office +1 316 686 9195
fwolfe@arthritis-research.org

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index