help infix
-------------------------------------------------------------------------------
Title
[D] infix (fixed format) -- Read ASCII (text) data in fixed format
Syntax
infix using dfilename [if] [in] [, using(filename2) clear]
infix specifications using filename [if] [in] [, clear]
where dfilename, if it exists, contains
--------------------- top of dictionary file ---
infix dictionary [using filename] {
* comments preceded by
* asterisk may appear freely
specifications
}
(your data might appear here)
--------------------- end of dictionary file ---
and where specifications is
# firstlineoffile
# lines
#:
/
[byte|int|float|long|double|str] varlist [#:]#[-#]
Menu
File > Import > ASCII data in fixed format
Description
infix reads into memory from disk a dataset that is not in Stata format.
infix requires that the data be in fixed-column format.
If dfilename is specified without an extension, .dct is assumed. If
filename is specified without an extension, .raw is assumed. If
dfilename contains embedded spaces, remember to enclose it in double
quotes.
In the first syntax, if using filename2 is not specified on the command
line and using filename is not specified in the dictionary, the data are
assumed to begin on the line following the closing brace.
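As a minimal sketch of the case just described, the dictionary file below names no data file, so infix would take the data from the lines after the closing brace. The filename highway_inline.dct and the three data lines are invented for illustration; the variable names and columns follow the highway examples in this help file.

```stata
infix dictionary {
* no "using filename" here, so the data follow the closing brace
* the three data lines below the brace are invented sample values
    rate  1-4
    speed 6-7
    acc   9-11
}
4.58 55 .46
2.86 60 4.4
1.61 50 2.2
```

Typing infix using highway_inline.dct should then read three observations on rate, speed, and acc.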
infile and insheet are alternatives to infix. infile can also read data
in fixed format -- see infile2 -- and it can read data in free format --
see infile1. Most people think that infix is easier to use for reading
fixed-format data, but infile has more features. If your data are not
fixed format, you can use insheet; see [D] insheet. If you are not
certain that infix will do what you are looking for, see infiling and [U]
21 Inputting data.
In its first syntax, infix reads the data in a two-step process. You
first create a disk file describing how the data are recorded. You tell
infix to read that file -- called a dictionary -- and from there, infix
reads the data. The data can be in the same file as the dictionary or in
a different file.
In its second syntax, you tell infix how to read the data right on the
command line with no intermediate file.
Options
+------+
----+ Main +-------------------------------------------------------------
using(filename2) specifies the name of a file containing the data. If
using() is not specified, the data are assumed to follow the
dictionary in dfilename, or if the dictionary specifies the name of
some other file, that file is assumed to contain the data. If
using(filename2) is specified, filename2 is used to obtain the data,
even if the dictionary says otherwise. If filename2 is specified
without an extension, .raw is assumed. If filename2 contains
embedded spaces, remember to enclose it in double quotes.
clear specifies that it is okay for the new data to replace what is
currently in memory. To ensure that you do not lose something
important, infix will refuse to read new data if data are already in
memory. clear allows infix to replace the data in memory. You can
also drop the data yourself by typing drop _all before reading new
data.
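A short sketch of how the two options combine, assuming a second data file named highway2.raw (a hypothetical name) laid out in the same columns that highway.dct describes:

```stata
* read the layout from highway.dct, but take the data from highway2.raw
* (hypothetical file) even though the dictionary names another file
infix using highway.dct, using(highway2.raw) clear

* alternative to clear: drop the data in memory yourself, then read
drop _all
infix using highway.dct, using(highway2.raw)
```

Both forms replace whatever is in memory, so save anything you need before running either one.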
Specifications
# firstlineoffile (abbreviation first) is rarely specified. It states
the line of the file at which the data begin. You need not specify
first when the data follow the dictionary; infix can figure that out
for itself. You can specify first when only the data appear in a
file and the first few lines of that file contain headers or other
markers.
first appears only once in the specifications.
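For illustration, suppose a data-only file named highway_hdr.raw (a hypothetical name) has two header lines before the data. A dictionary along these lines would tell infix to start reading at line 3:

```stata
infix dictionary using highway_hdr.raw {
* hypothetical data file: lines 1 and 2 are headers, data begin on line 3
    3 first
    rate  1-4
    speed 6-7
    acc   9-11
}
```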
# lines states the number of lines per observation in the file. Simple
datasets typically have "1 lines". Large datasets often have many
lines (sometimes called records) per observation. lines is optional,
even when there is more than one line per observation, because infix
can sometimes figure it out for itself. Still, if 1 lines is not
right for your data, it is best to specify the appropriate number of
lines.
lines appears only once in the specifications.
#: tells infix to jump to line # of the observation. Consider a file
with 4 lines, meaning four lines per observation. 2: says to jump to
the second line of the observation. 4: says to jump to the fourth
line of the observation. You may jump forward or backward: infix
does not care, and there is no inefficiency in going forward to 3:,
reading a few variables, jumping back to 1:, reading another
variable, and jumping back again to 3:.
You need not ensure that, at the end of your specification, you are
on the last line of the observation. infix knows how to get to the
next observation because it knows where you are and it knows lines,
the total number of lines per observation.
#: may appear many times in the specifications.
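The jumping described above can be sketched as a dictionary. The file survey.raw and the variables id, name, and income are hypothetical; the point is only that #: may move forward and backward within the four lines of each observation:

```stata
infix dictionary using survey.raw {
* survey.raw and all variable names are hypothetical
    4 lines
    3: id 1-6
    1: str name 1-30
    3: income 10-17
* no need to end on line 4; "4 lines" tells infix where the next observation starts
}
```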
/ is an alternative to #:. / goes forward one line. // goes forward two
lines. We do not recommend using / because #: is better. If you are
currently on line 2 of an observation and want to get to line 6, you
could type ////, but your meaning is clearer if you type 6:.
/ may appear many times in the specifications.
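To compare the two notations, here is a sketch using the two-line emp.raw layout from the examples in this help file. Under the rules stated above, the two commands should read the same variables from the same lines:

```stata
* using #: to name the line explicitly (the recommended style)
infix 2 lines 1: id 1-6 str name 7-36 2: age 1-2 sex 4 using emp.raw, clear

* same layout using / to step forward one line from line 1 to line 2
infix 2 lines id 1-6 str name 7-36 / age 1-2 sex 4 using emp.raw, clear
```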
[ byte | int | float | long | double | str ] varlist [#:]#[-#] instructs
infix to read a variable or, sometimes, more than one.
The simplest form of this is varname #, such as sex 20. In general,
varname # reads variable varname from column # of the current line;
here, sex is read from column 20, so sex is a one-digit number.
varname #-#, such as age 21-23, reads varname from the specified
column range; here, age is read from columns 21 through 23, so age
is a three-digit number.
You can prefix the variable with a storage type. str name 25-44
means to read the string variable name from columns 25 through 44.
If you do not specify str, the variable is assumed to be numeric.
You can specify the numeric subtype if you wish.
You can specify more than one variable, with or without a type. byte
q1-q5 51-55 means read variables q1, q2, q3, q4, and q5 from columns
51 through 55 and store the five variables as bytes.
Finally, you can specify the line on which the variable(s) appear.
age 2:21-23 says that age is to be obtained from the second line,
columns 21 through 23. Another way to do this is to put together the
#: directive with the input-variable directive: 2: age 21-23. There
is a difference, but not with respect to reading the variable age.
Let's consider two alternatives:
1: str name 25-44 age 2:21-23 q1-q5 51-55
1: str name 25-44 2: age 21-23 q1-q5 51-55
The difference is that the first directive says that variables q1
through q5 are on line 1, whereas the second says that they are on
line 2.
When the colon is put in front, it indicates the line on which
variables are to be found when we do not explicitly say otherwise.
When the colon is put inside, it applies only to the variable under
consideration.
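The two placements of the colon can be written out as complete commands. This is a sketch only: survey.raw is a hypothetical file with two lines per observation, and the byte storage type on q1-q5 is added to show a type prefix alongside the line directives:

```stata
* survey.raw is a hypothetical file with two lines per observation

* colon inside: only age comes from line 2; q1-q5 are read from line 1
infix 2 lines 1: str name 25-44 age 2:21-23 byte q1-q5 51-55 using survey.raw, clear

* colon in front: age and q1-q5 are both read from line 2
infix 2 lines 1: str name 25-44 2: age 21-23 byte q1-q5 51-55 using survey.raw, clear
```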
Examples of second syntax
. infix rate 1-4 speed 6-7 acc 9-11 using highway.raw
. infix rate 1-4 speed 6-7 acc 9-11 using highway.raw if rate>2
. infix rate 1-4 speed 6-7 acc 9-11 using highway.raw in 1/100
Examples of first syntax
. infix using highway.dct
. infix using highway.dct if rate>2
. infix using highway.dct in 1/100
where highway.dct contains
------------- top of highway.dct ---
infix dictionary using highway.raw {
rate 1-4
speed 6-7
acc 9-11
}
------------- end of highway.dct ---
Example reading string variables and multiple lines
. infix 2 lines 1: id 1-6 str name 7-36 2: age 1-2 sex 4 using emp.raw
or
. infix using emp.dct
where emp.dct contains
------------- top of emp.dct ---
infix dictionary using emp.raw {
2 lines
1:
id 1-6
str name 7-36
2:
age 1-2
sex 4
}
------------- end of emp.dct ---
Also see
Manual: [D] infix (fixed format)
Help: [D] infile (fixed format), [D] insheet, [D] outfile,
[D] outsheet, [D] save