
st: RE: Inputting arbitrary text files into Stata datasets

From   "Nick Cox"
Subject   st: RE: Inputting arbitrary text files into Stata datasets
Date   Thu, 17 Oct 2002 13:43:16 +0100

Roger Newson (various extracts)

> I have recently been attempting to write a program -intext.ado-,
> which is intended to input an arbitrary text file (eg a Stata
> program or a .pkg file) into a list of generated string variables
> (as many as necessary) in a new Stata dataset in memory. For
> instance, I might type
>     intext using foobar.ado,gene(sect)
> and Stata would generate a new data set in memory, with string
> variables sect1, sect2, ..., containing the lines of foobar.ado,
> with the first 80 columns in sect1, the next 80 columns in sect2,
> and so on, until there were enough variables to store the longest
> line of foobar.ado. (If the user then typed
>     outfile sect* using foobar2.ado,runtogether
> then foobar2.ado would be a duplicate of foobar.ado.) This seemed a
> reasonable thing to want to do.

> However, I cannot find a way that works in Stata as we know it.

> I therefore tried a second approach, using the very useful -file-
> command, which StataCorp developed in 2001 in response to a request
> from Kit Baum on Statalist. (Thanks again to all concerned.) The
> problem there is that -file read- reads data from a file into a
> macro. As I understand it (correct me if I'm wrong), there is no
> way to copy the contents of a macro into a string variable without
> quoting the macro, as in
>     file read `myfile' line
>     replace sect1=`"`line'"' in `i1'
> Stata has two kinds of quotes (single and double), and neither kind
> can be put around an arbitrary string. For instance, if the macro
> -line- in the above example contains a single ` (left-prime
> character), then single quotes will work and double quotes will
> not. On the other hand, if the macro -line- contains a single "
> (single-quote character), then double quotes will work and single
> quotes will not. It is therefore not easy to decide which kind of
> quotes to use.

> It therefore seems to be impossible to input an arbitrary text file
> into a list of generated string variables.
> I don't know how easy these two suggestions would be to implement,
> or how much time they would divert from more pressing projects.
> However, they would be very useful, at least for what I want to do.

I am very curious about why you want to read columns of
program code into string variables. If I wanted to
process code or package files as text, I would do
it in a text editor or scripting language.

I think quotes are easier than you think. Compound double
quotes don't do any harm beyond adding some visual complexity
to what you read. If this is not true for you, there's a bug.
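
As an illustration of the compound-double-quote route, here is a
minimal sketch of a read loop built on -file read-. It is not the
-intext- program discussed in this thread. The variable name sect1 and
the file name foobar.ado are taken from the quoted example; the handle
name fh and the macros i and eof are made up for the illustration. It
assumes a Stata release that provides the macval() macro function, and
it keeps only as much of each line as one string variable can hold, so
splitting long lines across sect1, sect2, ... is left out. A line
containing an unbalanced compound quote may still trip it up.

```stata
* Sketch only: copy each line of foobar.ado into the string variable sect1
clear
set obs 1
* start small: replace promotes the string type as longer lines arrive
generate str1 sect1 = ""
tempname fh
file open `fh' using foobar.ado, read text
file read `fh' line
* save the end-of-file flag straight away, before anything can reset r()
local eof = r(eof)
local i = 0
while `eof' == 0 {
    local i = `i' + 1
    if `i' > _N {
        set obs `i'
    }
    * compound double quotes tolerate a plain double quote inside the line
    * macval() copies the line without expanding macro references in it
    replace sect1 = `"`macval(line)'"' in `i'
    file read `fh' line
    local eof = r(eof)
}
file close `fh'
```

If foobar.ado is empty, the sketch leaves a single observation with an
empty sect1.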

The limitations of -file- match my understanding. Similar issues
arise in other contexts.

Nick Winter has recently tackled the issue of stripping
commands out of log files to produce the equivalent .do files
and he used -file- to read in logs as if they were binary
files, byte by byte.

Kit Baum and I wrote a wrapper -log2html-
to facilitate translation of SMCL files
to HTML, using -file- to read in logs line by line,
but our program does have the undocumented limitation that it
won't treat uses of local and global macros

As I understand it, the fact that Stata necessarily _interprets_
any command line (interactive or program) before it
attempts to _execute_ it, including macro substitution and
backslash substitution, is so fundamental that it would
take much more than adding some special options to allow
this. That's a cue for Stata developers to say that this
is easy and possible and on the to-do list.

