Re: st: Collaborating in stata: how to share code and control version
From: Phil Schumm <[email protected]>
To: [email protected]
Subject: Re: st: Collaborating in stata: how to share code and control version
Date: Fri, 10 Apr 2009 10:50:06 -0500
On Apr 9, 2009, at 1:50 PM, Fabrice wrote:
> I'm considering options to put structure into developing shared code
> in Stata. Currently, we already share code by using the ADO
> directory mechanism, which works very well (i.e. putting code files
> that everyone else can use into a shared ado directory).
> However, we are starting to encounter problems whereby two people are
> modifying the same file concurrently, or worse, one erasing
> someone else's work.
> This is a typical case where a version control system is required
> and I wonder if anyone has anything to recommend. Note1: The
> environment is windows. Note2: version control for stata development
> should not be confused with "version control" in stata, which only
> alludes to the idea that Stata lets you enforce the version of the
> system under which the code is run.
> I've looked on the web, and found hints that Emacs would a)
> interface with Stata and b) provide some version control system.
> Yet, I wonder if that works smoothly and is worth the effort.
> What is your experience on that matter? Has anyone found anything
> elegant (i.e. simple) to manage this under windows?
As you may be aware, there are many version control systems (VCS) out  
there (e.g., see http://en.wikipedia.org/wiki/List_of_revision_control_software),
and some of the best ones are open source.  The hot thing in version  
control right now is "distributed" version control, a well-known  
exemplar of which is Git (http://git-scm.com/).  The Linux kernel is  
developed in Git (which, BTW, was initially designed by Linus Torvalds  
for this purpose), and GitHub (http://github.com/) is developing quite  
a following.
Personally, we use Subversion (http://subversion.tigris.org/), which  
follows more of a client-server model than a distributed model.   
However, there are a lot of things I like about Subversion, and it  
suits our needs well.  A lot of open-source software projects use  
Subversion, though some have now switched over to Git (and still  
others use one of the many other distributed VCSs).  Subversion began  
life as a re-conceptualization of an earlier VCS called Concurrent  
Versions System (CVS).  Many years ago, CVS was just about the only  
non-commercial system available, and it was therefore ubiquitous.  In  
fact, many people continue to use it today.  CVS had quite a few  
warts, and was (IMO) painful to use.  Subversion addressed these  
issues, and, in contrast, is quite easy to use.
I don't want to start a debate over the merits of Subversion versus  
Git (or any other system); if you want to read more on this, Google  
will be happy to oblige.  I will, however, share with you a couple of  
the reasons why Subversion works so well for us.
One reason is that it is very easy to set up and administer, and has a  
very small footprint.  For example, Subversion comes pre-installed on  
OS X, is easy to install using the appropriate package installer under  
Linux, and a double-click installer is available for Windows.  After  
that, all you need is
    svnadmin create foo
to create a new repository called foo.  The repository is then a  
stand-alone directory, and can be moved around and backed up just as you do  
with other files on your filesystem.  Configuring behavior such as email  
notifications on commits, restricted permissions, etc. is very  
easy to do.  And, the entire repository can be dumped to a file which  
can then be edited with a file editor, if necessary.  Very  
straightforward and intuitive.  At the same time, it scales very well  
(i.e., to large repositories and/or many users).
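As a rough sketch of the steps just described, the script below creates the repository, commits one file to it, dumps the whole history to a single file, and makes a backup copy.  Only the name foo comes from the text above; the working copy name (wc), the file hello.do, and the scratch directory are hypothetical, and the sketch assumes a Unix-style shell with the svn and svnadmin command-line tools installed.

```shell
#!/bin/sh
# Sketch only: "foo" is from the post; wc, hello.do, foo.dump and
# foo-backup are hypothetical names chosen for illustration.

work=$(mktemp -d)            # scratch directory, so nothing real is touched
cd "$work" || exit 1

if command -v svnadmin >/dev/null 2>&1 && command -v svn >/dev/null 2>&1; then
    svnadmin create foo                      # new, self-contained repository
    svn checkout "file://$work/foo" wc       # working copy of the empty repository
    echo 'display "hello"' > wc/hello.do     # a one-line Stata do-file
    svn add wc/hello.do                      # put the file under version control
    svn commit -m "Add hello.do" wc          # record revision 1
    svnadmin dump foo > foo.dump             # entire history in one plain file
    svnadmin hotcopy foo foo-backup          # consistent copy of the repository
else
    echo "Subversion command-line tools not found; nothing was run"
fi
```

The dump file is largely readable text, and svnadmin hotcopy is used for the backup because it is the safer choice when other people may be committing at the same moment.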
A second reason is that Subversion is easy to use.  Of course, I say  
this as someone who programs and used to use CVS.  A better testament  
to its ease-of-use is the fact that we have had a lot of success in  
getting non-programmers (e.g., researchers, study coordinators, etc.)  
who have never used a VCS to use it.  We typically spend about 1 hour  
giving an introduction/tutorial, and, after that, they're ready to  
go.  As a result, we are able to use Subversion to manage not only our  
internal code, but also the data manipulation, analyses, and even  
administrative documents for several large research projects.  This  
permits the researchers on those projects to collaborate in ways that  
they could not do otherwise, and facilitates reproducibility.
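To make the collaboration point concrete, here is a minimal sketch of how Subversion handles the situation raised in the original question, where two people change the same file at the same time.  The two working copies (alice and bob), the repository name, and the contents of clean.do are all hypothetical, and the sketch assumes a Unix-style shell with svn and svnadmin installed.  TortoiseSVN users would do the same thing through the Update and Commit menu items.

```shell
#!/bin/sh
# Sketch only: repo, alice, bob and clean.do are hypothetical names.

work=$(mktemp -d)
cd "$work" || exit 1

if command -v svnadmin >/dev/null 2>&1 && command -v svn >/dev/null 2>&1; then
    svnadmin create repo
    url="file://$work/repo"

    # Alice adds a do-file; Bob then checks out his own copy of it.
    svn checkout -q "$url" alice
    printf '%s\n' 'use survey, clear' 'drop if age < 18' \
        'generate byte adult = 1' 'summarize age' \
        'save clean, replace' > alice/clean.do
    svn add -q alice/clean.do
    svn commit -q -m "Add clean.do" alice
    svn checkout -q "$url" bob

    # Alice changes the first line and commits.
    printf '%s\n' 'use survey2009, clear' 'drop if age < 18' \
        'generate byte adult = 1' 'summarize age' \
        'save clean, replace' > alice/clean.do
    svn commit -q -m "Use the 2009 survey" alice

    # Bob, unaware of this, appends a line to his now outdated copy.
    echo '* checked by Bob' >> bob/clean.do

    # His commit is refused rather than silently overwriting Alice's work.
    if svn commit -q -m "Add note" bob; then
        first_try=accepted
    else
        first_try=refused
    fi
    echo "Bob's first commit was $first_try"

    # Updating merges Alice's change into his copy; then he can commit.
    svn update -q --non-interactive bob
    svn commit -q -m "Add note" bob
    svn cat "$url/clean.do"      # latest version, containing both changes
else
    echo "Subversion command-line tools not found; nothing was run"
fi
```

Because the two edits touch different parts of the file, the update merges them without help.  Had both people changed the same lines, Subversion would instead flag a conflict for Bob to resolve by hand before committing.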
Finally, although our programmers all use Unix/Linux, most of our  
"users" use Windows.  Fortunately, there's a wonderful Windows  
application (implemented as a Windows shell extension) called  
TortoiseSVN (http://tortoisesvn.tigris.org/) which allows Windows  
users to access all the features of Subversion via menu items  
integrated into the standard Windows contextual (i.e., right-click)  
menus.  TortoiseSVN is probably the biggest reason for our success in  
getting non-programmers to use version control.
The canonical reference on Subversion is the book written by  
Collins-Sussman, Fitzpatrick and Pilato, which is freely available over the  
web (http://svnbook.red-bean.com/).  Note, however, that like the  
Stata reference manuals, all of the examples are illustrated at the  
command line.  Thus, Windows users might prefer to start by reading  
the documentation that comes with TortoiseSVN, which is excellent.
In sum, it is certainly worth spending some time researching  
a few different systems if you want to.  However, if the capabilities provided by  
Subversion are adequate for your needs, I'd heartily recommend it.   
And, in case you ever decide you want to switch to Git in the future,  
it's easy to convert your Subversion repositories.
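One possible route for such a conversion, offered as an assumption rather than as the method meant above, is the git svn bridge that ships with many Git installations.  The sketch below builds a small throwaway repository and clones it, history included, into a Git repository.  The names foo, wc, hello.do, and foo-git are hypothetical, and git svn may need to be installed separately from Git itself.

```shell
#!/bin/sh
# Sketch only: foo, wc, hello.do and foo-git are hypothetical names.

work=$(mktemp -d)
cd "$work" || exit 1

if command -v svnadmin >/dev/null 2>&1 && command -v svn >/dev/null 2>&1 \
        && git svn --version >/dev/null 2>&1; then
    # A small Subversion repository with one commit to convert.
    svnadmin create foo
    svn checkout -q "file://$work/foo" wc
    echo 'display "hello"' > wc/hello.do
    svn add -q wc/hello.do
    svn commit -q -m "Add hello.do" wc

    # Copy the Subversion history into a new Git repository.
    git svn clone "file://$work/foo" foo-git
    git -C foo-git log --oneline     # one line per converted revision
else
    echo "svn, svnadmin or git svn not found; nothing was run"
fi
```

For a repository that uses the conventional trunk/branches/tags layout, git svn clone also accepts a --stdlayout option so that branches and tags are mapped across as well.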
-- Phil