
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Using version control software with Stata


From   Phil Schumm <[email protected]>
To   Statalist Statalist <[email protected]>
Subject   Re: st: Using version control software with Stata
Date   Mon, 31 Mar 2014 09:34:14 -0500

On Mar 31, 2014, at 3:51 AM, Timothy Mak <[email protected]> wrote:
> Thank you everyone for your comments. The message seems to be, if I were to use a version control software, Git or Mercurial would be the way to go.


IMO, yes.  As I said, there is a lot of conceptual overlap between the two, as well as overlap in specific functionality (and even syntax).  Thus, much of what you learn about one will be transferable to the other.  I'd probably suggest starting with Git (since it currently has a bit more traction among data analysts), and if you find that too burdensome, then try Mercurial.

Note that if you Google "Git versus Mercurial", you will find *a lot* of information -- some good but a lot misleading (or at least confusing).  Tread carefully.


> 1. It seems to me what a Version Control System (VCS) does is that it automatically keeps a record of all changes to my file, numbering them sequentially, perhaps every time I save it. This record would be kept in a separate file (or files) to the current (most recent) do file.


A VCS distinguishes between a "working copy" of your project (i.e., the state of your project files as you are currently working on them) and a "repository", in which is stored a series of "snapshots" of your project (in a very efficient manner).  Although a VCS can be configured to create a new snapshot with every save, the standard way of working is to make regular "checkins" to the repository (i.e., storing a snapshot together with an informative message you write) as you work on the project.  You can then subsequently restore the working copy to exactly the state it was in as of any snapshot.
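
To make the working copy / repository distinction concrete, here is a minimal sketch in Python. It is a toy model only, not how Git or Mercurial are implemented: the class `ToyRepository`, its methods `checkin` and `checkout`, and the file names are all hypothetical, and a real VCS stores its snapshots far more efficiently.

```python
import hashlib
import json


class ToyRepository:
    """Toy snapshot store (hypothetical; a real VCS is far more efficient)."""

    def __init__(self):
        self.snapshots = {}  # revision id -> (message, files)
        self.order = []      # revision ids, oldest first

    def checkin(self, working_copy, message):
        """Store a snapshot of the working copy with a message; return its id."""
        parent = self.order[-1] if self.order else None
        payload = json.dumps(
            {"files": working_copy, "message": message, "parent": parent},
            sort_keys=True,
        )
        # Derive the id from the snapshot's content, message, and parent.
        rev = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]
        self.snapshots[rev] = (message, dict(working_copy))
        self.order.append(rev)
        return rev

    def checkout(self, rev):
        """Return a fresh working copy in exactly the state of snapshot `rev`."""
        return dict(self.snapshots[rev][1])


repo = ToyRepository()

# The working copy: project files as they are currently being edited.
working = {"analysis.do": "use data\nregress y x\n"}
first = repo.checkin(working, "Initial regression")

working["analysis.do"] = "use data\nregress y x z\n"
working["clean.do"] = "drop if missing(y)\n"
second = repo.checkin(working, "Add covariate z and a cleaning step")

# Restore the working copy to the state it was in at the first check-in.
restored = repo.checkout(first)
print(restored)
```

Note that `checkout` returns the whole project as of a revision, not a single file, which is why the files in a restored working copy are consistent with one another.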

While being able to restore to any previous snapshot is useful, this can also be done (albeit in an inefficient manner) by simply making copies of your project directory and storing them on the filesystem.  Instead, the real value of a VCS is in the tools it provides to navigate back through your project history, to organize your work by tracking multiple lines of project development, and to collaborate with others.  For example, in a large project that has persisted over several years, it is often useful to be able to answer questions like "When did a specific file change, who made the change, and why?"  Or, if two people have been working separately on a project (or, equivalently, one person has worked on a project at separate times), it can be useful to be able to merge their changes together.  These are the types of tasks that a VCS is designed to handle.
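
As an illustration of the "when did a specific file change, who made the change, and why?" question, the following Python sketch walks a made-up project history. The `Snapshot` class, the `changes_to` function, and the authors, dates, and messages are all invented for the example; in practice you would ask the VCS itself (in Git, for instance, `git log -- <file>` lists the revisions that touched a file).

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    rev: int
    author: str
    date: str
    message: str
    files: dict  # file name -> contents


def changes_to(history, filename):
    """Return the snapshots in which `filename` differs from the previous one."""
    changed = []
    previous = None  # contents in the prior snapshot (None means absent)
    for snap in history:
        current = snap.files.get(filename)
        if current != previous:
            changed.append(snap)
        previous = current
    return changed


# Invented history, oldest snapshot first.
history = [
    Snapshot(1, "alice", "2014-01-10", "Initial model",
             {"model.do": "regress y x"}),
    Snapshot(2, "bob", "2014-02-03", "Add cleaning script",
             {"model.do": "regress y x", "clean.do": "drop if missing(y)"}),
    Snapshot(3, "alice", "2014-03-21", "Control for z",
             {"model.do": "regress y x z", "clean.do": "drop if missing(y)"}),
]

# When did model.do change, who changed it, and why?
for snap in changes_to(history, "model.do"):
    print(snap.rev, snap.author, snap.date, snap.message)
```

The check-in message is what answers the "why", which is the reason for writing an informative one each time.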


> 2. However, say I want to send the most recent do-file to someone, but I want to tag it as version 0.1.123, e.g. as part of the preamble in the do-file. I guess I won't be able to do this automatically if I use Stata's do-file editor, right? I'll still have to manually type in "* Version 0.1.123" in the first line. Or perhaps the VCS can automatically add this line to the top of the file every time it is saved, even if I just press Ctrl-S in the do-file editor?


You can configure a VCS to add version markers like this automatically if you want to, but a modern, distributed VCS makes such things unnecessary.  Instead, you send a colleague a snapshot of your project as of a specific revision, refer them to the web (assuming you have placed a copy of your repository on GitHub, Bitbucket, or similar), or let them "fork" your repository.
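
If you do need to name a revision (say, in an email or a log file), you can ask the VCS for its identifier rather than typing a marker into the do-file. The Python sketch below assumes the project is a Git repository and that the `git` command is on your path; the helper name `current_revision` is made up for this example, and it falls back to "unknown" when either assumption fails.

```python
import subprocess


def current_revision(repo_dir="."):
    """Return the abbreviated id of the checked-out Git revision, or "unknown"."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not inside a repository.
        return "unknown"
    return result.stdout.strip() or "unknown"


print("Results produced from revision", current_revision())
```

A colleague who has the repository can then check out exactly that revision, so nothing has to be stamped into the files themselves.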


> 3. Another thing is when one do-file references another. Currently my practice is to have a "V#" added to the end of the name of nearly all my do-files, e.g. "mydofileV1.do". Thus all references to do-files have to indicate the version number also. However, if I do use a VCS, where the previous versions are not stored as separate files, presumably I'll have to adopt a different system. And because the do-file editor is not really a developer's platform, I struggle to see how this can be easily done. 
> 
> 4. With ado-files, I cannot even implement my system above, since if my program is -myado-, I must name it "myado.ado", and not "myadoV1.ado". Therefore, at the moment, I often write my ado-files as do files, and -run "myadoV1.do"-, before using -myado ...-. 


I'm not exactly sure what you're describing here, but again, with a VCS such manual dependency tracking is not necessary.  Instead, rather than reverting specific file(s), you simply check out an entire revision from the repository, and all of your files are guaranteed to be consistent with one another as of that revision.  And if you need to link together dependencies from separate projects (i.e., the files for Project 1 depend on specific versions of file(s) from Project 2), you can do that too.


> 5. At the same time, I wonder how professional programmers cope with this problem.


They use a VCS :)


> 6. Ideally, I'll have a system where every time I save a do-file, it will also save in its preamble the version number of all the do-files and ado-files that it refers to. Even better is that it will automatically find the appropriate version of that program/function every time I run the do file also. I know this is probably well beyond what can be practically achieved.


No, that can be achieved, but as I wrote above, not by repeatedly munging the file.  Instead, you let the VCS do the work for you.


-- Phil

