Statalist



Re: st: Collaborating in stata: how to share code and control version


From   Phil Schumm <[email protected]>
To   [email protected]
Subject   Re: st: Collaborating in stata: how to share code and control version
Date   Fri, 10 Apr 2009 10:50:06 -0500

On Apr 9, 2009, at 1:50 PM, Fabrice wrote:
I'm considering options for structuring the development of shared code in Stata. We already share code through the ado directory mechanism (i.e., putting files into a shared ado directory so that everyone else can use the code), and that works fine.

However, we are starting to encounter problems where two people modify the same file concurrently or, worse, one person erases someone else's work.

This is a typical case where a version control system is required, and I wonder if anyone has anything to recommend. Note 1: The environment is Windows. Note 2: Version control for Stata development should not be confused with "version control" in Stata, which refers only to Stata's ability to enforce the version of Stata under which the code is run.

I've looked on the web and found hints that Emacs could a) interface with Stata and b) provide some version control system. Yet I wonder whether that works smoothly and is worth the effort.

What is your experience in this matter? Has anyone found anything elegant (i.e., simple) to manage this under Windows?


As you may be aware, there are many version control systems (VCS) out there (e.g., see http://en.wikipedia.org/wiki/List_of_revision_control_software), and some of the best ones are open source. The hot thing in version control right now is "distributed" version control, a well-known exemplar of which is Git (http://git-scm.com/). The Linux kernel is developed in Git (which, BTW, was initially designed by Linus Torvalds for this purpose), and GitHub (http://github.com/) is developing quite a following.

Personally, we use Subversion (http://subversion.tigris.org/), which follows more of a client-server model than a distributed model. However, there are a lot of things I like about Subversion, and it suits our needs well. A lot of open-source software projects use Subversion, though some have now switched over to Git (and still others use one of the many other distributed VCSs). Subversion began life as a re-conceptualization of an earlier VCS called Concurrent Versions System (CVS). Many years ago, CVS was just about the only non-commercial system available, and it was therefore ubiquitous. In fact, many people continue to use it today. CVS had quite a few warts, and was (IMO) painful to use. Subversion addressed these issues, and, in contrast, is quite easy to use.

I don't want to start a debate over the merits of Subversion versus Git (or any other system); if you want to read more on this, Google will be happy to oblige. I will, however, share with you a couple of the reasons why Subversion works so well for us.

One reason is that it is very easy to set up and administer, and has a very small footprint. For example, Subversion comes pre-installed on OS X, is easy to install using the appropriate package installer under Linux, and a double-click installer is available for Windows. After that, all you need is

svnadmin create foo

to create a new repository called foo. The repository is then a stand-alone directory, and can be moved around and backed up just as you do with other files on your filesystem. Configuring behavior like email notifications on commits, restricting permissions, etc. is very easy to do. And the entire repository can be dumped to a file, which can then be edited with a text editor if necessary. Very straightforward and intuitive. At the same time, it scales very well to large repositories and/or many users.
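To make the steps above concrete, here is a minimal sketch of a first session at a Unix-style shell. Only the repository name foo comes from the message above; the scratch directory, the working-copy name wc, the file hello.ado, and the dump file name are hypothetical, and on Windows the file:// URL takes a different form (or you would do the same steps through TortoiseSVN).

```shell
# Sketch only: wc, hello.ado, and foo.dump are made-up names for illustration.
if command -v svnadmin >/dev/null 2>&1 && command -v svn >/dev/null 2>&1; then
    work=$(mktemp -d)          # scratch directory so nothing real is touched
    cd "$work"

    svnadmin create foo        # the repository is just a directory named foo

    # Check out a working copy; each collaborator would have their own.
    svn checkout -q "file://$work/foo" wc

    (
        cd wc
        # Put a (trivial) ado-file under version control and commit it.
        printf 'program define hello\n    display "hello"\nend\n' > hello.ado
        svn add -q hello.ado
        svn commit -q -m "Add hello.ado"
        svn update -q          # bring the working copy up to the latest revision
        svn log -q             # list the revisions recorded so far
    )

    # Dump the whole repository to a single file (handy for backups).
    svnadmin dump -q foo > foo.dump
else
    echo "Subversion does not appear to be installed"
fi
```

In day-to-day use, each person repeats the update/edit/commit cycle in their own working copy. If two people change the same file, Subversion asks the second committer to update and merge first rather than silently overwriting the earlier work.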

A second reason is that Subversion is easy to use. Of course, I say this as someone who programs and used to use CVS. A better testament to its ease-of-use is the fact that we have had a lot of success in getting non-programmers (e.g., researchers, study coordinators, etc.) who have never used a VCS to use it. We typically spend about 1 hour giving an introduction/tutorial, and, after that, they're ready to go. As a result, we are able to use Subversion to manage not only our internal code, but also the data manipulation, analyses, and even administrative documents for several large research projects. This permits the researchers on those projects to collaborate in ways that they could not do otherwise, and facilitates reproducibility.

Finally, although our programmers all use Unix/Linux, most of our "users" use Windows. Fortunately, there's a wonderful Windows application (implemented as a Windows shell extension) called TortoiseSVN (http://tortoisesvn.tigris.org/) which allows Windows users to access all the features of Subversion via menu items integrated into the standard Windows contextual (i.e., right-click) menus. TortoiseSVN is probably the biggest reason for our success in getting non-programmers to use version control.

The canonical reference on Subversion is the book written by Collins-Sussman, Fitzpatrick and Pilato, which is freely available over the web (http://svnbook.red-bean.com/). Note, however, that like the Stata reference manuals, all of the examples are illustrated at the command line. Thus, Windows users might prefer to start by reading the documentation that comes with TortoiseSVN, which is excellent.

In sum, it is worth spending some time researching a few different systems if you can. However, if the capabilities provided by Subversion are adequate for your needs, I'd heartily recommend it. And, in case you ever decide you want to switch to Git in the future, it's easy to convert your Subversion repositories.


-- Phil
