The Stata listserver

st: RE: Inputting arbitrary text files into Stata datasets


From   "Nick Winter" <nwinter@policystudies.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Inputting arbitrary text files into Stata datasets
Date   Thu, 17 Oct 2002 08:47:26 -0400

I don't know exactly what you are up to, but you might be able to
pillage some code from my recently posted -log2do2.ado-, which is on
SSC.  To get around some of the issues you raise, that program reads the
text file in binary mode, one character at a time, as unsigned bytes
instead of as strings.  It then checks whether each character is a
left single quote (that is, whether the byte equals 96), which
obviates the need to choose between kinds of quotation marks.

This low-level approach runs into the issue that line terminators
differ across platforms, but that is easy enough to trap as well.
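
A minimal sketch of the byte-by-byte idea follows.  It is not the code
from -log2do2.ado-; the program name -readbytes-, the variable name
-line- and the counter are made up for illustration.  It assumes a
version of Stata that has -file-, and it relies on char() to rebuild
each character from its byte value, so the text never passes through a
quoted macro.

```stata
* Illustration only: -readbytes- and -line- are hypothetical names.
program define readbytes
    syntax using/
    tempname fh b
    drop _all
    quietly set obs 1
    quietly generate str1 line = ""
    local i = 1
    local nq = 0
    file open `fh' using `"`using'"', read binary
    * %1bu reads one unsigned byte into the scalar named by `b'
    file read `fh' %1bu `b'
    while r(eof) == 0 {
        if `b' == 10 {
            * LF ends a line on Unix and, following a CR, on Windows
            local i = `i' + 1
            quietly set obs `i'
        }
        else if `b' != 13 {
            * byte 96 is the left single quote; here it is only counted
            if `b' == 96 {
                local nq = `nq' + 1
            }
            * char() turns the byte back into a character, no quoting needed
            quietly replace line = line + char(`b') in `i'
        }
        file read `fh' %1bu `b'
    }
    file close `fh'
    display as text "left single quotes seen: " `nq'
end
```

One would call it as -readbytes using foobar.ado-.  The sketch treats
LF and CR+LF as line ends and simply drops CR, so a file that uses CR
alone would come in as a single line; a file ending in a line
terminator leaves one empty observation at the end; and lines longer
than the maximum string length would need to be spilled into further
variables, as in the sect1, sect2, ... scheme.  Reading a byte at a
time is also slow on large files.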

I am skeptical about functions that transfer macros without
substitution, of the sort that you suggest.  My guess, without knowing
anything about Stata's internal handling of macros, is that the results
you would get from those functions might depend heavily on
context, namely whether embedded levels of macros had already been
substituted in earlier processing.  Any program written with those
functions would potentially be sensitive to whether the person calling
it typed

	. myprog argument1 argument2

versus

	. local args "argument1 argument2"
	. myprog `args'

and so on.  Plus, my guess is that the need for this is pretty
specialized, so the more cumbersome -file- in binary mode does at
least give you a way around things.

That said, allowing -infix- not to trim leading and trailing blanks
has some merit, I would think.

--Nick Winter


-----------------------------------------------------------
 Nicholas Winter, Ph.D.                     P 202.939.5343
 Policy Studies Associates                  F 202.939.5732
 1718 Connecticut Avenue, NW     nwinter@policystudies.com
 Washington, DC 20009-1148           www.policystudies.com
----------------------------------------------------------- 

> -----Original Message-----
> From: Roger Newson [mailto:roger.newson@kcl.ac.uk] 
> Sent: Thursday, October 17, 2002 5:52 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Inputting arbitrary text files into Stata datasets
> 
> 
> Fellow Statalisters (especially StataCorp). A couple of suggestions
> for the wish list.
> 
> I have recently been attempting to write a program -intext.ado-,
> which is intended to input an arbitrary text file (eg a Stata
> program or a .pkg file) into a list of generated string variables
> (as many as necessary) in a new Stata dataset in memory. For
> instance, I might type
> 
> intext using foobar.ado,gene(sect)
> 
> and Stata would generate a new data set in memory, with string
> variables sect1, sect2, ..., containing the lines of foobar.ado,
> with the first 80 columns in sect1, the next 80 columns in sect2,
> and so on, until there were enough variables to store the longest
> line of foobar.ado. (If the user then typed
> 
> outfile sect* using foobar2.ado,runtogether
> 
> then foobar2.ado would be a duplicate of foobar.ado.) This seemed a
> reasonable thing to want to do. However, I cannot find a way that
> works in Stata as we know it.
> 
> The first approach I tried was to use -infix- to input the text
> file, using commands like
> 
> infix str sect1 1-80 str sect2 81-160 str sect3 161-240 using foobar.ado,clear
> 
> Unfortunately, -infix- trims leading and trailing blanks from the
> string variables sect1, sect2 and sect3, and therefore cannot read
> an arbitrary text file without loss of information.
> 
> I therefore tried a second approach, using the very useful -file-
> command, which StataCorp developed in 2001 in response to a request
> from Kit Baum on Statalist. (Thanks again to all concerned.) The
> problem there is that -file read- reads data from a file into a
> macro. As I understand it (correct me if I'm wrong), there is no way
> to copy the contents of a macro into a string variable without
> quoting the macro, as in
> 
> file read `myfile' line
> replace sect1=`"`line'"' in `i1'
> 
> Stata has two kinds of quotes (single and double), and neither kind
> can be put around an arbitrary string. For instance, if the macro
> -line- in the above example contains a single ` (left-prime
> character), then single quotes will work and double quotes will not.
> On the other hand, if the macro -line- contains a single "
> (single-quote character), then double quotes will work and single
> quotes will not. It is therefore not easy to decide which kind of
> quotes to use. (Also, quoting causes the character pairs \\ , \$ and
> \` to be substituted by \ , $ and ` , respectively, because the
> backslash is used as an escape character to prevent macro expansion.
> However, that is a minor issue.)
> 
> It therefore seems to be impossible to input an arbitrary text file
> into a list of generated string variables. However, I can think of
> two changes to the Stata executable which might make it
> straightforward, and therefore might be added to the wish list.
> 
> First, the -infix- command might have a -notrim- option, which would
> prevent it from trimming leading and trailing blanks from string
> variables. The -infix- command above might then be rephrased
> 
> infix str sect1 1-80 str sect2 81-160 str sect3 161-240 using foobar.ado,clear notrim
> 
> and then sect1, sect2 and sect3 would all have a length of 80
> characters, including leading and trailing blanks.
> 
> Second, we might have string functions to copy the unaltered
> contents of a macro into a string variable or string macro. For
> instance, there might be functions -lsubstr()- and -gsubstr()- for
> copying substrings from local and global macros, respectively,
> taking the macro name as a function argument. In the above example
> with -read- and -replace- commands, the user might type
> 
> file read `myfile' line
> replace sect1=lsubstr(line,1,80) in `i1'
> 
> and characters 1-80 of the contents of the macro -line- would be
> copied into an observation of the string variable sect1.
> 
> I don't know how easy these two suggestions would be to implement,
> or how much time they would divert from more pressing projects.
> However, they would be very useful, at least for what I want to do.
> 
> Best wishes
> 
> Roger
> 
> --
> Roger Newson
> Lecturer in Medical Statistics
> Department of Public Health Sciences
> King's College London
> 5th Floor, Capital House
> 42 Weston Street
> London SE1 3QD
> United Kingdom
> 
> Tel: 020 7848 6648 International +44 20 7848 6648
> Fax: 020 7848 6620 International +44 20 7848 6620
>    or 020 7848 6605 International +44 20 7848 6605
> Email: roger.newson@kcl.ac.uk
> 
> Opinions expressed are those of the author, not the institution.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 