help infile2 dialog: infile (fixed format)
-------------------------------------------------------------------------------
Title
[D] infile (fixed format) -- Read ASCII (text) data in fixed format with
a dictionary
Syntax
infile using dfilename [if] [in] [, options]
options               description
-------------------------------------------------------------------------
Main
  using(filename)     ASCII dataset filename
  clear               replace data in memory

Options
  automatic           create value labels from nonnumeric data
-------------------------------------------------------------------------
The syntax for a dictionary (a file created with an editor or word
processor outside Stata) is
-------------------------------------- top of dictionary file ---
[infile] dictionary [using filename] {
    * comments may be included freely
    _lrecl(#)
    _firstlineoffile(#)
    _lines(#)
    _line(#)
    _newline[(#)]
    _column(#)
    _skip[(#)]
    [type] varname [:lblname] [%infmt] ["variable label"]
}
(your data might appear here)
-------------------------------------- end of dictionary file ---
where %infmt is { %[#[.#]] {f|g|e} | %[#]s | %[#]S}
Menu
File > Import > ASCII data in fixed format with a dictionary
Description
infile using reads from disk a dataset that is not in Stata format. It
does this by first reading dfilename -- a "dictionary" that describes the
format of the data file -- and then reading the file containing the data.
The dictionary is a file you create in an editor or word processor
outside Stata. If dfilename is specified without an extension, .dct is
assumed.
If using filename is not specified, the data are assumed to begin on the
line following the closing brace. If using filename is specified, the
data are assumed to be located in filename. If filename is specified
without an extension, .raw is assumed.
Note for Stata for Mac and Stata for Windows users: If dfilename or
filename contains embedded spaces, remember to enclose it in double
quotes.
The data may be in the same file as the dictionary or in another file.
Another variation on infile omits the intermediate dictionary; see
infile1. This variation is easier to use but will not read fixed-format
files. On the other hand, although infile using will read free-format
files, infile without a dictionary is even better at it.
An alternative to infile using for reading fixed-format files is infix;
see [D] infix (fixed format). infix provides fewer features than infile
using but is easier to use.
Stata has other commands for reading data. If you are not certain that
infile using will do what you are looking for, see infiling and [U] 21
Inputting data.
Options
    +------+
----+ Main +-------------------------------------------------------------
using(filename) specifies the name of a file containing the data. If
using() is not specified, the data are assumed to follow the
dictionary in dfilename, or if the dictionary specifies the name of
some other file, that file is assumed to contain the data. If
using(filename) is specified, filename is used to obtain the data,
even if the dictionary says otherwise. If filename is specified
without an extension, .raw is assumed.
Note for Stata for Mac and Stata for Windows users: If filename
contains embedded spaces, remember to enclose it in double quotes.
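For instance, suppose persons.dct names persons.raw as its data file, as
in the separate-files example in this entry, and that a second file with
the same layout exists under the name persons2.raw (a hypothetical file
used only for illustration). Under those assumptions, this sketch takes
the layout from persons.dct but the data from persons2.raw:
. infile using persons, using(persons2)
Because no extensions are typed, .dct and .raw are assumed.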
clear specifies that it is okay for the new data to replace what is
currently in memory. To ensure that you do not lose something
important, infile using will refuse to read new data if other data
are already in memory. clear allows infile using to replace the data
in memory. You can also drop the data yourself by typing drop _all
before reading new data.
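As a brief sketch, either of the following sequences replaces the data in
memory with the data described by mydict.dct, the dictionary name used in
the examples in this entry:
. infile using mydict, clear
or
. drop _all
. infile using mydict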
    +---------+
----+ Options +----------------------------------------------------------
automatic causes Stata to create value labels from the nonnumeric data it
reads. It also automatically widens the display format to fit the
longest label.
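A minimal sketch of automatic follows. It assumes that the variable is
declared with a numeric type and a value-label name attached; the file
name xmplauto.dct, the label name sexlbl, and the data are hypothetical.
--------------------- top of xmplauto.dct ---
dictionary {
    long idnumb "Identification number"
    byte sex:sexlbl "Sex"
    byte age "Age"
}
472921002 male 32
399938271 female 30
--------------------- end of xmplauto.dct ---
. infile using xmplauto, automatic
Under that assumption, sex would be stored as a numeric variable and the
value label sexlbl would be built from the strings male and female.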
Dictionary directives
* marks comment lines. Wherever you wish to place a comment, begin the
line with a *. Comments can appear many times in the same
dictionary.
_lrecl(#) is used only for reading datasets that do not have end-of-line
delimiters (carriage return, line feed, or some combination of
these). Such files are often produced by mainframe computers and
have been poorly translated from EBCDIC into ASCII. _lrecl()
specifies the logical record length; that is, it requests that infile
act as if a line ends every # characters.
_lrecl() appears only once, and typically not at all, in a
dictionary.
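As a rough sketch, suppose a hypothetical file nodelim.raw holds records
of exactly 12 characters each, run together with no end-of-line
delimiters: 4 characters of ID followed by 8 characters of score. A
dictionary along these lines asks infile to treat every 12 characters as
one line. The file names, variable names, and widths are all assumptions
made for illustration.
--------------------- top of nodelim.dct ---
dictionary using nodelim.raw {
    * each record is 12 characters long
    _lrecl(12)
    _column(1)
    int id %4f "ID"
    _column(5)
    float score %8f "Score"
}
--------------------- end of nodelim.dct ---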
_firstlineoffile(#) (abbreviation _first()) is also rarely specified. It
states the line of the file where the data begin. You do not need to
specify _first() when the data follow the dictionary; Stata can
figure that out for itself. However, you might specify _first() when
reading data from another file in which the first line does not
contain data because of headers or other markers.
_first() appears only once, and typically not at all, in a
dictionary.
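For example, if a hypothetical file survey.raw begins with two header
lines, a dictionary such as the following sketch tells infile that the
data begin on line 3. The file and variable names are assumptions.
--------------------- top of survey.dct ---
dictionary using survey.raw {
    * lines 1 and 2 of survey.raw are headers
    _first(3)
    int id "ID"
    float score "Score"
}
--------------------- end of survey.dct ---
--------------------- top of survey.raw ---
Survey extract
id score
1 72.5
2 88.0
--------------------- end of survey.raw ---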
_lines(#) states the number of lines per observation in the file. Simple
datasets typically have _lines(1). Large datasets often have many
lines (sometimes called records) per observation. _lines() is
optional, even when there is more than one line per observation,
because infile can sometimes figure it out for itself. Still, if
_lines(1) is not right for your data, it is best to specify the
correct number through _lines(#).
_lines() appears only once in a dictionary.
_line(#) tells infile to jump to line # of the observation. _lines() is
not the same as _line(). Consider a file with _lines(4), meaning four
lines per observation. _line(2) says to jump to the second line of
the observation. _line(4) says to jump to the fourth line of the
observation. You may jump forward or backward. infile does not care,
and there is no inefficiency in going forward to _line(3), reading a
few variables, jumping back to _line(1), reading another variable,
and jumping forward again to _line(3).
You need not ensure that, at the end of your dictionary, you are on
the last line of the observation. infile knows how to get to the
next observation because it knows where you are and it knows
_lines(), the total number of lines per observation.
_line() may appear many times in a dictionary.
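The following sketch illustrates jumping between lines; the file name
xmplline.dct, the variable names, and the data are hypothetical. Each
observation occupies three lines. The dictionary reads c from line 3,
jumps back to line 1 for a, and then returns to line 3, using _column()
to state explicitly where d starts.
--------------------- top of xmplline.dct ---
dictionary {
    _lines(3)
    _line(3)
    _column(1)
    int c %2f
    _line(1)
    _column(1)
    int a %2f
    _line(3)
    _column(4)
    int d %2f
}
11
22
33 44
55
66
77 88
--------------------- end of xmplline.dct ---
Read this way, the first observation would be expected to have c=33,
a=11, and d=44. Line 2 of each observation is never read.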
_newline[(#)] is an alternative to _line(). _newline(1), which may be
abbreviated _newline, goes forward one line. _newline(2) goes
forward two lines. We do not recommend using _newline() because
_line() is better. If you are currently on line 2 of an observation
and want to get to line 6, you could type _newline(4), but your
meaning is clearer if you type _line(6).
_newline() may appear many times in a dictionary.
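For comparison, this sketch rewrites the two-line layout of xmpl3.dct,
shown in the examples in this entry, with _newline in place of _line(2).
The file name xmplnl.dct is hypothetical.
--------------------- top of xmplnl.dct ---
dictionary {
    _lines(2)
    int a
    float b
    * go forward one line, to line 2 of the observation
    _newline
    float c
}
1 2.2
3.2
2 3.2
4.2
--------------------- end of xmplnl.dct ---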
_column(#) jumps to column # on the current line. You may jump forward
or backward within a line. _column() may appear many times in a
dictionary.
_skip[(#)] jumps forward # columns on the current line. _skip() is just
an alternative to _column(). _skip() may appear many times in a
dictionary.
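As an illustration, the two hypothetical dictionaries sketched below are
intended to be equivalent, assuming the current column is 3 once the
two-character field a has been read. The first positions b with _skip(),
the second with _column(). The file names, variables, and data are
assumptions.
--------------------- top of xmplskip.dct ---
dictionary {
    int a %2f
    * from column 3, move forward 3 columns to column 6
    _skip(3)
    int b %2f
}
12   34
56   78
--------------------- end of xmplskip.dct ---
--------------------- top of xmplcol.dct ---
dictionary {
    _column(1)
    int a %2f
    _column(6)
    int b %2f
}
12   34
56   78
--------------------- end of xmplcol.dct ---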
[type] varname [:lblname] [%infmt] ["variable label"] instructs infile
to read a variable. The simplest form of this instruction is the
variable name itself: varname.
At all times, infile is on some column of some line of an
observation. infile starts on column 1 of line 1, so pretend that is
where we are. Given the simplest directive, `varname', infile goes
through the following logic:
If the current column is blank, it skips forward until there is a
nonblank column (or until the end of the line). If it just skipped
all the way to the end of the line, it stores a missing value in
varname. If it skipped to a nonblank column, it begins collecting
what is there until it comes to a blank column or the end of the
line. These are the data for varname. Then it sets the current
column to wherever it is.
The actual logic is a bit more complicated. For instance, when skipping
forward to find the data, infile might encounter a quote. If so, it
then collects the characters for the data by skipping forward until
it finds the matching quote. If you specify a %infmt, then infile
skips the skipping-forward step and simply collects the specified
number of characters. If you specify a %S infmt, then infile does not
skip leading or trailing blanks. Nevertheless, the general logic is
(optionally) skip, collect, and reset.
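A short sketch of the quote handling described above; the file name
xmplquote.dct, the variable names, and the data are hypothetical. Because
no %infmt is given, infile skips forward to the first nonblank column
and, on meeting a quote, collects characters up to the matching quote, so
a name containing a blank can be read as one value.
--------------------- top of xmplquote.dct ---
dictionary {
    str12 name "Name"
    byte age "Age"
}
"Mary Jones" 41
Lee 29
--------------------- end of xmplquote.dct ---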
Examples: reading data with a dictionary
. infile using mydict
. infile using mydict, using(mydata)
. infile using mydict if b==1
. infile using mydict if runiform()<=.1
Examples: sample dictionaries
--------------------- top of xmpl1.dct ---
dictionary {
a
b
}
1 2
3 4
--------------------- end of xmpl1.dct ---
--------------------- top of xmpl2.dct ---
dictionary {
int t "day of year"
double amt "amount"
}
1 2.2
2 3.3
--------------------- end of xmpl2.dct ---
--------------------- top of xmpl3.dct ---
dictionary {
_lines(2)
_line(1)
int a
float b
_line(2)
float c
}
1 2.2
3.2
2 3.2
4.2
--------------------- end of xmpl3.dct ---
------------------------------- top of xmpl4.dct ---
dictionary {
long idnumb "Identification number"
str6 sex "Sex"
byte age "Age"
}
472921002 male 32
329193100 male 45
399938271 female 30
484873982 "female" 33
------------------------------- end of xmpl4.dct ---
------------------------------------------- top of xmpl5.dct ---
dictionary {
_column(5)
long idnumb %9f "Identification number"
str6 sex %6s "Sex"
int age %2f "Age"
_column(27)
float income %6f "Income"
}
    329193402male  32     42000
    472921002male  32     50000
    329193100male  45
    399938271female30     43000
    484873982female33     48000
------------------------------------------- end of xmpl5.dct ---
Example: dictionary and data in separate files
------------------------------------------- top of persons.dct ---
dictionary using persons.raw {
_column(5)
long idnumb %9f "Identification number"
str6 sex %6s "Sex"
int age %2f "Age"
_column(27)
float income %6f "Income"
}
------------------------------------------- end of persons.dct ---
---------------- top of persons.raw ---
    329193402male  32     42000
    472921002male  32     50000
    329193100male  45
    399938271female30     43000
    484873982female33     48000
---------------- end of persons.raw ---
Also see
Manual: [D] infile (fixed format)
Help: [D] infix, [D] outfile, [D] outsheet, [D] save