Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Path to current .do file?

From	Robert Picard <[email protected]>
To	[email protected]
Subject	Re: st: Path to current .do file?
Date	Thu, 18 Jul 2013 11:34:25 -0400

You can also take a look at -project-, a program I wrote to manage my
workflow in Stata. Type in Stata's Command window:

net from http://robertpicard.com/stata

and click on -project- to see a description, help file, and install if desired.

With -project-, Stata's current directory is always aligned with the
directory that contains the currently running do-file. So do-files do
not need to specify full or relative paths to access files in the
do-file's directory. You can move a whole directory up or down within
the project's directory without having to edit file paths in do-files.
Within a do-file, -project- can return the name of the currently
running do-file as well as the path to the main project directory.
Also, -project- automatically creates log files for each do-file. Each
log file is suspended when running a nested do-file and resumed when
the nested do-file terminates.
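
As a rough illustration of the point about recovering the do-file name and
project directory, a do-file in a project might contain something like the
sketch below. It assumes the -doinfo- directive and the returned results
r(pdir) and r(dname) work as described in the package's help file; confirm
with -help project- before relying on it.

```stata
* Sketch only: assumes -project- is installed and that this do-file is
* being run as part of a project build.
* -project, doinfo- is assumed to return r(pdir) (path to the main
* project directory) and r(dname) (name of the running do-file).
project, doinfo
local pdir  "`r(pdir)'"
local dname "`r(dname)'"

display as text "Running `dname' in the project rooted at `pdir'"
```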

The problem with large projects that evolve over long periods of time
is that you usually run do-files out of context because it is
impractical to rerun all do-files in the project at every run. It also
becomes more difficult to spot the effects of a change on the results
of downstream do-files. With -project-, you embed build directives in
each do-file that note which files are used and created by the
do-file. -project- remembers these dependencies and automatically
skips over do-files that haven't changed and that have no change in
their dependencies.
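
To make the idea of build directives concrete, here is a minimal sketch of a
do-file that declares what it uses and what it creates. The directive names
(-project, uses()- and -project, creates()-) are assumed to match the
package's help file, and the dataset names are hypothetical; check
-help project- for the exact syntax.

```stata
* Sketch only: hypothetical do-file inside a project managed by -project-.
* The file names below are made up for illustration.

* Declare a file that this do-file depends on, then load it.
project, uses("survey_clean.dta")
use "survey_clean.dta", clear

* ... analysis or data management steps go here ...

* Save the result and declare it as a file created by this do-file, so
* that do-files that use it are rerun if its content changes.
save "analysis_file.dta", replace
project, creates("analysis_file.dta")
```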

My biggest project so far contains 5678 files total (1.2 GB) with 1886
do-files and has been chugging along for more than 3 years. If I
change a do-file that does not affect anything downstream, then only
that do-file is run when the project is built. Sometimes I make what I
think is an innocent change and -project- reruns hundreds of do-files.

The most important feature of -project- is that it can check that ALL
results can be replicated. After a replication build, each dataset
created, each output file, each log file is checked against the
pre-build version and any difference is noted.

There are other useful features. See the package description, help file,
and demo project if interested.

Robert

On Wed, Jul 17, 2013 at 11:52 PM, James Beard <[email protected]> wrote:
> Phil -
>
> Thanks for your detailed, helpful and speedy reply.
>
> I hadn't considered that it would be better to make all the .do files
> run from the project root folder. Apart from this making the .do
> files themselves easier to understand (because they wouldn't contain
> folder references like ../../some_folder), it also ensures that they
> can be made to immediately fail if someone tries to run them from the
> wrong place (because the required sub-folders almost certainly won't
> exist).
>
> Having a master .do file would also make it easier to make sure
> everything works smoothly.
>
> Thanks again.
>
> On 17 Jul 2013 at 21:38, Phil Schumm wrote:
>
>> On Jul 17, 2013, at 8:00 PM, James Beard <[email protected]> wrote:
>> > In a Stata .do file (I'm using Stata 12 on Windows) is it possible to find out the path to the currently executing .do file?
>>
>>
>> I don't believe so.
>>
>>
>> > I'm currently setting up some rather complicated data management in Stata, which will eventually have to deal with tens of thousands of files. Normally, I would put everything in the same folder, but in this case, that would become unmanageable. So, I have different folders for different sets of files. And I want to use relative paths to access them. If I was going to be running my .do files myself, I would just know that I have to start in the right place, but I can't guarantee that the people who will run them will do that. And I don't want to hard-wire paths in my .do files because the drive letters and paths to the "root" of my folder structure on the "production" system will be different from the root on my development system. So within each .do file, I want to -cd- to the folder in which each .do file is located, so the .do file can reliably locate files in other folders.  With apologies to non-Windows users, you can do this sort of thing with "DOS" batch files, with -cd/d %~dp0-, so I could provide a .bat file wrapper for each of my .do files, but this isn't an ideal solution, and wouldn't actually stop someone running one of my .do files directly from the wrong folder.
>>
>>
>> You can accomplish what you describe above without having to resort to absolute paths (which you are correct to avoid), without putting all of your files in a single directory (which, as you note, would be unmanageable), and without a do-file knowing its own location.  To do this, start by creating a directory for the project, and within it whatever subdirectory structure you wish to organize your files.  Then, whenever you refer to a file location in a do-file (e.g., reading in a dataset, writing a file, running another do-file, etc.), use a relative path from the root of the project directory.  This way, any do-file is runnable from the root of the project, and there is no reason to be changing the working directory (which should always be set to the root of the project).  Your project is self-contained and portable, and I think you'll find that the code is easier to maintain, since your working directory remains constant (e.g., you can move do-files around within the project and they'll continue to work).
>>
>> One more comment, since you mentioned making it easier for other people to run the do-files within your project.  If you use the strategy I described above, instructions for executing the do-files in a project become as simple as, for example,
>>
>>     1. Launch Stata
>>     2. -cd- to the root of the project
>>     3. Type the following:
>>
>>            do data_management/clean_data
>>            do data_management/build_analysis_file
>>            do analysis/summaries
>>            do analysis/fit_models
>>            do analysis/plots
>>
>> which can then be placed in a README.txt file at the root of the project.  Alternatively, you can use a single do-file at the root of the project as a pseudo-Makefile, so that users can simply type
>>
>>     do make clean_data
>>     do make analysis_file
>>     do make summaries
>>     do make models
>>     do make plots
>>
>> or even just
>>
>>     do make all
>>
>> to do everything in one step.
>>
>>
>> -- Phil
>>
>>
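
The pseudo-Makefile idea quoted above could be sketched as a single do-file
at the project root. This is only an illustration, not Phil's actual file:
the name make.do and the target names are taken from his example commands,
the do-file paths from his README example, and the dispatch logic is an
assumption.

```stata
* make.do: hypothetical dispatcher kept at the root of the project.
* Run from the project root, e.g.:  do make all
args target

* Each target name maps to one do-file, addressed relative to the root.
if "`target'" == "clean_data" {
    do data_management/clean_data
}
else if "`target'" == "analysis_file" {
    do data_management/build_analysis_file
}
else if "`target'" == "summaries" {
    do analysis/summaries
}
else if "`target'" == "models" {
    do analysis/fit_models
}
else if "`target'" == "plots" {
    do analysis/plots
}
else if "`target'" == "all" {
    * Re-run this file once per target, in the order the steps depend on.
    foreach t in clean_data analysis_file summaries models plots {
        do make `t'
    }
}
else {
    * Stop with a syntax-error return code on an unrecognized target.
    display as error "unknown target: `target'"
    exit 198
}
```

Because every path is relative to the project root, running this from any
other directory fails at the first -do- rather than silently touching the
wrong files.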
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
