
st: Inputting arbitrary text files into Stata datasets

From   Roger Newson
Subject   st: Inputting arbitrary text files into Stata datasets
Date   Thu, 17 Oct 2002 10:52:01 +0100

Fellow Statalisters (especially StataCorp). A couple of suggestions for the wish list.

I have recently been attempting to write a program, -intext.ado-, which is intended to input an arbitrary text file (e.g. a Stata program or a .pkg file) into a list of generated string variables (as many as necessary) in a new Stata dataset in memory. For instance, I might type

intext using foobar.ado,gene(sect)

and Stata would generate a new dataset in memory, with string variables sect1, sect2, ..., containing the lines of foobar.ado, with the first 80 columns in sect1, the next 80 columns in sect2, and so on, until there were enough variables to store the longest line of foobar.ado. (If the user then typed

outfile sect* using foobar2.ado,runtogether

then foobar2.ado would be a duplicate of foobar.ado.) This seemed a reasonable thing to want to do. However, I cannot find a way that works in Stata as we know it.

The first approach I tried was to use -infix- to input the text file, using commands like

infix str sect1 1-80 str sect2 81-160 str sect3 161-240 using foobar.ado,clear

Unfortunately, -infix- trims leading and trailing blanks from the string variables sect1, sect2 and sect3, and therefore cannot read an arbitrary text file without loss of information.
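A minimal sketch of how the column ranges for such an -infix- command could be generated for "as many variables as necessary". The value of maxlen (the longest line length) is an assumed placeholder that would have to be found by other means, and the macro names width, nvars and spec are illustrative only. The trimming problem described above still applies.

```stata
* Build an -infix- specification with one 80-column string variable per chunk.
* maxlen = 240 is an assumed value standing in for the longest line of the file.
local maxlen = 240
local width = 80
local nvars = ceil(`maxlen' / `width')
local spec ""
forvalues k = 1/`nvars' {
    local first = (`k' - 1) * `width' + 1
    local last = `k' * `width'
    * -str- marks each variable as a string; without it -infix- reads numbers
    local spec "`spec' str sect`k' `first'-`last'"
}
* Leading and trailing blanks are still trimmed by -infix-
infix `spec' using foobar.ado, clear
```

With maxlen set to 240, the specification built here is the same as in the three-variable command above.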

I therefore tried a second approach, using the very useful -file- command, which StataCorp developed in 2001 in response to a request from Kit Baum on Statalist. (Thanks again to all concerned.) The problem there is that -file read- reads data from a file into a macro. As I understand it (correct me if I'm wrong), there is no way to copy the contents of a macro into a string variable without quoting the macro, as in

file read `myfile' line
replace sect1=`"`line'"' in `i1'

Stata has two kinds of string quotes, simple ("...") and compound (`"..."'), and neither kind can be put around an arbitrary string. For instance, if the macro -line- in the above example contains a lone ` (left-prime character), then simple quotes will work and compound quotes will not. On the other hand, if the macro -line- contains a lone " (double-quote character), then compound quotes will work and simple quotes will not. It is therefore not easy to decide which kind of quotes to use. (Also, quoting causes the character pairs \\ , \$ and \` to be substituted by \ , $ and ` , respectively, because the backslash is used as an escape character to prevent macro expansion. However, that is a minor issue.)
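A minimal sketch of the full -file read- loop that this second approach implies, storing only the first 80 columns of each line in sect1. The handle name myfile and the counter i1 follow the fragment above; the rest of the loop is an assumption about how it might be written, and it inherits the quoting limitations just described, so it does not handle arbitrary files.

```stata
* Read foobar.ado line by line into the string variable sect1 (first 80 columns).
clear
generate str80 sect1 = ""
tempname myfile
file open `myfile' using foobar.ado, read text
local i1 = 0
file read `myfile' line
* r(eof) is set by the most recent -file read-
while r(eof) == 0 {
    local i1 = `i1' + 1
    quietly set obs `i1'
    * Quoting the macro is the step that can fail for some line contents
    quietly replace sect1 = substr(`"`line'"', 1, 80) in `i1'
    file read `myfile' line
}
file close `myfile'
```

The -file read- command is kept as the last statement of the loop body so that the -while- condition always tests the end-of-file flag from the latest read.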

It therefore seems to be impossible to input an arbitrary text file into a list of generated string variables. However, I can think of two changes to the Stata executable which might make it straightforward, and therefore might be added to the wish list.

First, the -infix- command might have a -notrim- option, which would prevent it from trimming leading and trailing blanks from string variables. The -infix- command above might then be rephrased

infix str sect1 1-80 str sect2 81-160 str sect3 161-240 using foobar.ado,clear notrim

and then sect1, sect2 and sect3 would all have a length of 80 characters, including leading and trailing blanks.

Second, we might have string functions to copy the unaltered contents of a macro into a string variable or string macro. For instance, there might be functions -lsubstr()- and -gsubstr()- for copying substrings from local and global macros, respectively, taking the macro name as a function argument. In the above example with -read- and -replace- commands, the user might type

file read `myfile' line
replace sect1=lsubstr(line,1,80) in `i1'

and characters 1-80 of the contents of the macro -line- would be copied into an observation of the string variable sect1.

I don't know how easy these two suggestions would be to implement, or how much time they would divert from more pressing projects. However, they would be very useful, at least for what I want to do.

Best wishes


Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605

Opinions expressed are those of the author, not the institution.
