Stata 15 help for infix

[D] infix (fixed format) -- Import text data in fixed format


infix using dfilename [if] [in] [, using(filename2) clear]

infix specifications using filename [if] [in] [, clear]

If dfilename is specified without an extension, .dct is assumed. If dfilename contains embedded spaces, remember to enclose it in double quotes. The dictionary file dfilename contains

--------------------- begin dictionary file ---
infix dictionary [using filename] {
        * comments preceded by asterisk may appear freely
        specifications
}
(your data might appear here)
----------------------- end dictionary file ---

If filename is specified without an extension, .raw is assumed. If filename contains embedded spaces, remember to enclose it in double quotes.

specifications is

# firstlineoffile
# lines
#:
/
[byte|int|float|long|double|str] varlist [#:]#[-#]


File > Import > Text data in fixed format with a dictionary


infix reads into memory from a disk dataset that is not in Stata format. infix requires that the data be in fixed-column format. Columns are byte based: a column number is a byte position within a line, so the number of columns in a line equals its number of bytes. The text file filename is treated as a stream of bytes; no encoding is assumed. String data encoded as ASCII or UTF-8 will be imported correctly, but note that in UTF-8 every non-ASCII character takes more than one byte and therefore more than one column.
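To make the byte-based column rule concrete, here is a small Python model. It is an illustration, not Stata code, and the helper name columns() is hypothetical. It slices a UTF-8 record by 1-based, inclusive byte columns, showing that a name containing "é" spans one more column than it has characters.

```python
# Hypothetical model of infix's byte-based columns; not part of Stata.

def columns(line: bytes, first: int, last: int) -> bytes:
    """Return the bytes in 1-based, inclusive columns first..last."""
    return line[first - 1:last]

text = "José  42"
record = text.encode("utf-8")  # "é" is encoded as 2 bytes

print(len(text), len(record))                 # 8 characters, 9 bytes
print(columns(record, 1, 5).decode("utf-8"))  # José: 4 characters, 5 columns
print(columns(record, 8, 9).decode("ascii"))  # 42
```

A layout written by counting characters would put the number in columns 7-8; counting bytes, it sits in columns 8-9.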

In its first syntax, infix reads the data in a two-step process. You first create a disk file, called a dictionary, that describes how the data are recorded. You then tell infix to read that file, and from there, infix reads the data. The data can be in the same file as the dictionary or in a different file. If using(filename2) is not specified on the command line and using filename is not specified in the dictionary, the data are assumed to begin on the line following the closing brace.
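The rule for locating data stored in the dictionary file itself can be sketched in Python as follows. split_dictionary() is a hypothetical helper, not a Stata command; it assumes the closing brace sits alone on its own line and does not model the case where the dictionary names another file.

```python
def split_dictionary(text: str):
    """Split a dictionary file into (dictionary lines, inline data lines).

    Hypothetical sketch: assumes the closing brace is alone on its own line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "}":
            # Data begin on the line following the closing brace.
            return lines[: i + 1], lines[i + 1:]
    raise ValueError("no closing brace found")

example = """infix dictionary {
* rate and speed of each highway segment
rate  1-4
speed 6-7
}
 2.1 55
 3.0 60
"""
dictionary, data = split_dictionary(example)
print(data)
```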

In its second syntax, you tell infix how to read the data right on the command line with no intermediate file.

infile and import delimited are alternatives to infix. infile can also read data in fixed format -- see infile2 -- and it can read data in free format -- see infile1. Most people think that infix is easier to use for reading fixed-format data, but infile has more features. If your data are not fixed format, you can use import delimited; see [D] import delimited. import delimited allows you to specify the source file's encoding and then performs a conversion to UTF-8 encoding during import. If you are not certain that infix will do what you are looking for, see [D] import and [U] 21 Entering and importing data.


Options (Main)

using(filename2) specifies the name of a file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename, or if the dictionary specifies the name of some other file, that file is assumed to contain the data. If using(filename2) is specified, filename2 is used to obtain the data, even if the dictionary says otherwise. If filename2 is specified without an extension, .raw is assumed. If filename2 contains embedded spaces, remember to enclose it in double quotes.
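The order in which infix looks for the data, together with the default extensions, can be summarized as a small Python function. This is a hypothetical sketch: data_source() and with_default_ext() are illustrative names, any suffix recognized by os.path.splitext is treated as an extension, and quoting of names with embedded spaces is not modeled.

```python
import os

def with_default_ext(name: str, ext: str) -> str:
    """Append ext when name has no extension of its own."""
    return name if os.path.splitext(name)[1] else name + ext

def data_source(dfilename, option_using=None, dictionary_using=None):
    """Return the file infix would read data from, in precedence order."""
    if option_using is not None:       # using(filename2) overrides the dictionary
        return with_default_ext(option_using, ".raw")
    if dictionary_using is not None:   # infix dictionary using filename { ... }
        return with_default_ext(dictionary_using, ".raw")
    # Otherwise the data follow the closing brace in the dictionary file itself.
    return with_default_ext(dfilename, ".dct")

print(data_source("highway", dictionary_using="highway"))   # highway.raw
print(data_source("highway.dct", option_using="june.txt"))  # june.txt
```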

clear specifies that it is okay for the new data to replace what is currently in memory. To ensure that you do not lose something important, infix will refuse to read new data if data are already in memory; specifying clear lifts that restriction. You can also drop the data yourself by typing drop _all before reading new data.


# firstlineoffile (abbreviation first) is rarely specified. It states the line of the file at which the data begin. You need not specify first when the data follow the dictionary; infix can figure that out for itself. You can specify first when only the data appear in a file and the first few lines of that file contain headers or other markers.

first appears only once in the specifications.

# lines states the number of lines per observation in the file. Simple datasets typically have "1 lines". Large datasets often have many lines (sometimes called records) per observation. lines is optional, even when there is more than one line per observation, because infix can sometimes figure it out for itself. Still, if 1 lines is not right for your data, it is best to specify the appropriate number of lines.

lines appears only once in the specifications.
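How first and lines divide a file into observations can be modeled in Python as below. observations() is a hypothetical helper; it assumes that a trailing group with fewer than nlines lines is dropped, which this help entry does not specify.

```python
def observations(lines, first=1, nlines=1):
    """Group file lines into observations of nlines lines each.

    first is the 1-based line at which the data begin; earlier lines
    (headers, markers) are skipped. Assumption: an incomplete trailing
    group is dropped.
    """
    data = lines[first - 1:]
    return [data[i:i + nlines]
            for i in range(0, len(data) - nlines + 1, nlines)]

file_lines = ["HEADER LINE", "rec1 line1", "rec1 line2", "rec2 line1", "rec2 line2"]
for obs in observations(file_lines, first=2, nlines=2):
    print(obs)
```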

#: tells infix to jump to line # of the observation. Consider a file read with 4 lines, meaning four lines per observation. 2: says to jump to the second line of the observation, and 4: says to jump to the fourth. You may jump forward or backward: infix does not care, and there is no inefficiency in going forward to 3:, reading a few variables, jumping back to 1:, reading another variable, and jumping back again to 3:.

You need not ensure that, at the end of your specification, you are on the last line of the observation. infix knows how to get to the next observation because it knows where you are and it knows lines, the total number of lines per observation.

#: may appear many times in the specifications.

/ is an alternative to #:. / goes forward one line. // goes forward two lines. We do not recommend using / because #: is better. If you are currently on line 2 of an observation and want to get to line 6, you could type ////, but your meaning is clearer if you type 6:.

/ may appear many times in the specifications.
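The movement rules for #: and / can be captured by a tiny cursor model. Cursor and observation_start() are hypothetical Python names, not Stata code; the sketch assumes each observation is read starting at line 1, and rejecting moves outside the observation is a choice made for the sketch.

```python
class Cursor:
    """Hypothetical model of infix's current line within one observation."""

    def __init__(self, nlines: int):
        self.nlines = nlines  # the "# lines" value
        self.line = 1         # assumption: reading starts on line 1

    def jump(self, k: int) -> None:
        """The "k:" directive: go to line k, forward or backward."""
        if not 1 <= k <= self.nlines:
            raise ValueError(f"line {k} is outside 1..{self.nlines}")
        self.line = k

    def slashes(self, token: str) -> None:
        """A run of "/" characters: move forward one line per slash."""
        self.jump(self.line + len(token))


def observation_start(index: int, nlines: int) -> int:
    """First data line (1-based) of observation `index` (0-based).

    Depends only on nlines, never on where the cursor ended.
    """
    return index * nlines + 1


c = Cursor(nlines=6)
c.jump(2)
c.slashes("////")  # same effect as typing 6:
print(c.line)      # 6
c.jump(1)          # jumping backward is allowed
print(observation_start(2, nlines=4))  # 9
```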

[ byte | int | float | long | double | str ] varlist [#:]#[-#] instructs infix to read a variable or, sometimes, more than one.

The simplest form of this is varname #, such as sex 20. It says that varname is to be read from column # of the current line; here, sex is read from column 20 and is therefore a one-digit number.

varname #-#, such as age 21-23, says that varname is to be read from the specified range of columns; here, age is read from columns 21 through 23, a field three digits wide.

You can prefix the variable with a storage type. str name 25-44 means to read the string variable name from columns 25 through 44. Note that the string variable name consists of 44-25+1 = 20 bytes. If you do not specify str, the variable is assumed to be numeric. You can specify the numeric subtype if you wish. If you specify str, infix will automatically assign the appropriate string variable type, str# or strL. Imported strings may be up to 100,000 bytes.
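The column arithmetic in these paragraphs can be checked with a small Python helper. parse_columns() and width() are hypothetical and handle only the # and #-# forms (no line prefix); the width reported is last - first + 1, matching the 44-25+1 = 20 computation for name.

```python
from typing import Tuple

def parse_columns(spec: str) -> Tuple[int, int]:
    """Turn "#" or "#-#" into (first, last): 1-based, inclusive columns."""
    first_text, _, last_text = spec.partition("-")
    first = int(first_text)
    last = int(last_text) if last_text else first  # "20" means 20-20
    if last < first:
        raise ValueError(f"bad column range {spec!r}")
    return first, last

def width(spec: str) -> int:
    """Number of bytes (columns) a field occupies."""
    first, last = parse_columns(spec)
    return last - first + 1

print(width("20"), width("21-23"), width("25-44"))  # 1 3 20
```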

You can specify more than one variable, with or without a type. byte q1-q5 51-55 means read variables q1, q2, q3, q4, and q5 from columns 51 through 55 and store the five variables as bytes.
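A Python sketch of how a numbered varlist such as q1-q5 pairs up with a column range. expand_varlist() and assign_columns() are hypothetical; the sketch supports only the one-column-per-variable case shown in the text and raises an error otherwise rather than guessing how other widths would be split.

```python
import re

def expand_varlist(varlist: str) -> list:
    """Expand a numbered range such as "q1-q5" into ["q1", ..., "q5"]."""
    m = re.fullmatch(r"([A-Za-z_]\w*?)(\d+)-\1(\d+)", varlist)
    if m is None:
        return varlist.split()  # plain names, e.g. "a b c"
    stub, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{stub}{i}" for i in range(lo, hi + 1)]

def assign_columns(varlist: str, first: int, last: int) -> dict:
    """Give each variable one column, in order, across first..last."""
    names = expand_varlist(varlist)
    if last - first + 1 != len(names):
        # Only the one-column-per-variable case is modeled here.
        raise ValueError("sketch handles one column per variable only")
    return {name: first + i for i, name in enumerate(names)}

print(assign_columns("q1-q5", 51, 55))
```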

Finally, you can specify the line on which the variable(s) appear. age 2:21-23 says that age is to be obtained from the second line, columns 21 through 23. Another way to do this is to combine the #: directive with the input-variable directive: 2: age 21-23. There is a difference, but not with respect to reading the variable age. Consider two alternatives:

1: str name 25-44 age 2:21-23 q1-q5 51-55

1: str name 25-44 2: age 21-23 q1-q5 51-55

The difference is that the first directive says that variables q1 through q5 are on line 1, whereas the second says that they are on line 2.

When the colon is put in front, it indicates the line on which variables are to be found when we do not explicitly say otherwise. When the colon is put inside, it applies only to the variable under consideration.
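The difference between a leading k: and an inline k: can be modeled by a small Python parser for one-line specifications. resolve_lines() is hypothetical: it understands only k:, storage types, variable names, and [k:]#[-#] column tokens (not lines, first, or /), and it assumes reading starts on line 1.

```python
import re

TYPES = {"byte", "int", "float", "long", "double", "str"}
JUMP = re.compile(r"(\d+):")                       # a "k:" directive
COLS = re.compile(r"(?:(\d+):)?(\d+)(?:-(\d+))?")  # a "[k:]#[-#]" column token

def resolve_lines(spec: str) -> dict:
    """Map each varlist in a specification to the line it is read from."""
    current = 1     # assumption: reading starts on line 1 of the observation
    pending = None  # varlist waiting for its column token
    where = {}
    for tok in spec.split():
        jump = JUMP.fullmatch(tok)
        cols = COLS.fullmatch(tok)
        if jump:
            current = int(jump.group(1))  # sticky: applies to later variables
        elif tok in TYPES:
            continue                      # a storage type does not change the line
        elif cols:
            if pending is None:
                raise ValueError(f"column token {tok!r} has no variable")
            # An inline "k:" applies to this variable only; current is untouched.
            where[pending] = int(cols.group(1)) if cols.group(1) else current
            pending = None
        else:
            pending = tok                 # a variable name or varlist
    return where

a = resolve_lines("1: str name 25-44 age 2:21-23 q1-q5 51-55")
b = resolve_lines("1: str name 25-44 2: age 21-23 q1-q5 51-55")
print(a["q1-q5"], b["q1-q5"])  # 1 2
```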

Examples of second syntax

. infix rate 1-4 speed 6-7 acc 9-11 using highway.raw
. infix rate 1-4 speed 6-7 acc 9-11 using highway.raw if rate>2
. infix rate 1-4 speed 6-7 acc 9-11 using highway.raw in 1/100

Examples of first syntax

. infix using highway.dct
. infix using highway.dct if rate>2
. infix using highway.dct in 1/100

where highway.dct contains

------------- begin highway.dct ----
infix dictionary using highway.raw {
        rate  1-4
        speed 6-7
        acc   9-11
}
------------- end highway.dct ------

Example reading string variables and multiple lines

. infix 2 lines 1: id 1-6 str name 7-36 2: age 1-2 sex 4 using emp.raw

or

. infix using emp.dct

where emp.dct contains

------------- begin emp.dct ----
infix dictionary using emp.raw {
        2 lines
        1:
                id        1-6
                str name  7-36
        2:
                age       1-2
                sex       4
}
------------- end emp.dct ------
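For comparison, here is a hypothetical Python reader that applies the emp.dct layout to two made-up observations. It is not Stata code: the layout is written out by hand, numbers are parsed with Python's float(), and trimming blanks from strings and treating blank numeric fields as missing (None) are assumptions of the sketch.

```python
# Hand-written layout mirroring: 2 lines 1: id 1-6 str name 7-36 2: age 1-2 sex 4
LAYOUT = [
    # (variable, line within observation, first column, last column, is string)
    ("id",   1, 1, 6,  False),
    ("name", 1, 7, 36, True),
    ("age",  2, 1, 2,  False),
    ("sex",  2, 4, 4,  False),
]

def read_fixed(raw: bytes, layout=LAYOUT, nlines=2):
    """Read fixed-column records, nlines lines per observation."""
    lines = raw.splitlines()
    rows = []
    for start in range(0, len(lines) - nlines + 1, nlines):
        obs = lines[start:start + nlines]
        row = {}
        for name, line, first, last, is_string in layout:
            field = obs[line - 1][first - 1:last]  # byte columns, inclusive
            if is_string:
                # Assumption: surrounding blanks are trimmed.
                row[name] = field.decode("utf-8", errors="replace").strip()
            else:
                text = field.decode("ascii").strip()
                # Assumption: a blank numeric field becomes missing (None).
                row[name] = float(text) if text else None
        rows.append(row)
    return rows

# Two made-up observations in the emp.raw layout.
SAMPLE = (b"000123" + b"Alice Smith".ljust(30) + b"\n" + b"34 1\n"
          + b"000124" + b"Bob Jones".ljust(30) + b"\n" + b"41 2\n")

for row in read_fixed(SAMPLE):
    print(row)
```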

© Copyright 1996–2018 StataCorp LLC