Re: st: making Stata read do-files
Phil Schumm <email@example.com>
Wed, 16 Apr 2008 14:52:06 -0500
On Apr 16, 2008, at 2:14 PM, Gabi Huiber wrote:
> The larger problem I was trying to solve was this: go through a
> mess of directory paths and eventually find a load of do-files in
> each of them, saved weekly -- sometimes with names such as
> fileYYYYMMDD.do, and other times with names such as fileapr1608.do.
> Then read each of those do-files line by line, but don't interpret
> them. Instead, write each line to a .dta file as an observation in
> a variable called cmd (as in command). Next to it, write the date
> of the file that that line came from, in the format YYYYMMDD
> (because it reads well to the human eye and sorts chronologically),
> as the corresponding observation in a variable called date. Then
> drop duplicates in terms of cmd.
> The goal is twofold: I want to easily track changes made to the do-files over time, and I want to use these dta files to make Stata write its own do-files on the fly. If, for example, I want to reconstitute the weekly do-file saved on 20071231, I just keep all the observations in the dta file where date<=20071231. As time goes by and people keep saving these weekly do-files, I just send Stata to scrape the directories anew and re-assemble the master dta file.
> I did not want to mess with the do-file names because other people still use them and I wanted to do my work with as little disruption to them as possible.
> Of course had my client used some kind of proper revision control system, like RCS in Unix, this effort would have been unnecessary. How do the Statalisters deal with revision control? Is there a Stata-specific good practices write-up on the matter? Might somebody present one at the Chicago meeting?
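For concreteness, the homegrown scheme described in the quoted message (scrape the directories, take a date from each file name, record every line as cmd next to its date, drop duplicate cmds, then rebuild a do-file as of a given date) might be sketched in Python roughly as follows. This is a minimal illustrative sketch, not code from either poster; the function names, the assumption that the date is the trailing part of the file name (any letters before it are ignored), the reading of two-digit years as 20YY, and the Latin-1 encoding are all hypothetical choices.

```python
import re
from datetime import date
from pathlib import Path

# Three-letter month abbreviations, for names such as fileapr1608.do.
MONTHS = {abbr: i for i, abbr in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def date_from_name(name):
    """Return the date in a do-file name as a YYYYMMDD int, or None.

    Assumed (hypothetical) patterns: <letters>YYYYMMDD.do and
    <letters>monDDYY.do, with two-digit years read as 20YY.
    """
    stem = Path(name).stem.lower()
    m = re.fullmatch(r"[a-z_]*(\d{4})(\d{2})(\d{2})", stem)
    if m:
        y, mo, d = (int(g) for g in m.groups())
    else:
        m = re.fullmatch(r"[a-z_]*([a-z]{3})(\d{2})(\d{2})", stem)
        if m is None or m.group(1) not in MONTHS:
            return None
        mo, d, y = MONTHS[m.group(1)], int(m.group(2)), 2000 + int(m.group(3))
    try:
        date(y, mo, d)  # reject impossible dates such as feb3008
    except ValueError:
        return None
    return y * 10000 + mo * 100 + d


def collect(root):
    """Yield (date, cmd) for each line of each dated .do file under root."""
    for path in sorted(Path(root).rglob("*.do")):
        d = date_from_name(path.name)
        if d is None:
            continue  # no recognisable date in the name: skip the file
        # Read as plain text without interpreting it; Latin-1 never fails to decode.
        with open(path, encoding="latin-1") as fh:
            for line in fh:
                yield d, line.rstrip("\r\n")


def drop_duplicate_cmds(records):
    """Sort by date and keep only the earliest occurrence of each cmd."""
    seen, kept = set(), []
    # sorted() is stable, so lines sharing a date keep their original order.
    for d, cmd in sorted(records, key=lambda r: r[0]):
        if cmd not in seen:
            seen.add(cmd)
            kept.append((d, cmd))
    return kept


def reconstitute(records, cutoff):
    """Rebuild do-file text from the records whose date is <= cutoff."""
    lines = [cmd for d, cmd in records if d <= cutoff]
    return "\n".join(lines) + "\n" if lines else ""
```

Usage would be along the lines of records = drop_duplicate_cmds(collect(root)) followed by reconstitute(records, 20071231), where root is whatever top-level directory holds the weekly files; the records could then be saved as a .dta file with variables date and cmd, for example via pandas' DataFrame.to_stata. Taken literally, the scheme cannot always reproduce a file exactly: a line deleted in a later week still appears in every later reconstitution, and a line repeated within a file (a closing brace, say) survives only once.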
What you describe above sounds like reinventing a Version Control System (VCS), and, as you noted, it would make *much* more sense just to use one of the many existing systems. There are now many excellent, easy-to-use systems freely available (as well as many commercial ones, of course).
To give you some background, in our shop we split our time between data collection and management (including managing the public releases of several large datasets) and statistical analyses. All of this work is stored in a VCS (we currently use and strongly recommend Subversion), and we absolutely could not function without it (at least I would not want to consider such a scenario). A VCS is usually pretty agnostic WRT what type of code you are storing in it, but, as you might guess, we have developed several tricks to support the type of work we do and to make Subversion easier to use with Stata (we also store non-Stata code in the repository).
You might also be interested to know that we have some experience in training "non-technical" users to use our repositories (by non-technical here I am referring primarily to data analysts who range in their ability to use a package like Stata but are definitely not computer programmers). I'm not going to suggest that this is always easy (it isn't), but, under the right circumstances, we have evidence that it can work.
I believe I brought this topic up in conversation at a users' meeting a few years ago, and no one seemed interested. Thus, I don't know whether an actual presentation on this would be appropriate at a Stata users' meeting (plus, as I noted, most of the issues involved are entirely general and not Stata-specific). But, as luck would have it, I will be at the meeting in Chicago this summer, and would be glad to sit down with you (and anyone else who's interested) and share our thoughts and experiences WRT this.