Stata 11 help for Infix

help infix
-------------------------------------------------------------------------------

Title

[D] infix (fixed format) -- Read ASCII (text) data in fixed format

Syntax

infix using dfilename [if] [in] [, using(filename2) clear]

infix specifications using filename [if] [in] [, clear]

where dfilename, if it exists, contains

    --------------------- top of dictionary file ---
    infix dictionary [using filename] {
            * comments preceded by asterisk may appear freely
            specifications
    }
    (your data might appear here)
    --------------------- end of dictionary file ---

and where specifications is

    # firstlineoffile
    # lines
    #:
    /
    [byte|int|float|long|double|str] varlist [#:]#[-#]

Menu

File > Import > ASCII data in fixed format

Description

infix reads into memory from a disk dataset that is not in Stata format. infix requires that the data be in fixed-column format.

If dfilename is specified without an extension, .dct is assumed. If filename is specified without an extension, .raw is assumed. If dfilename contains embedded spaces, remember to enclose it in double quotes.
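
The default-extension rule can be illustrated with a small Python sketch. This is not Stata code. The helper name default_extension is hypothetical, and it checks only whether the name already ends in an extension.

```python
import os


def default_extension(name: str, ext: str) -> str:
    """Return name with ext appended when name has no extension.

    Hypothetical helper mimicking the rule above: '.dct' for dictionary
    files and '.raw' for data files. Quoting names that contain spaces is
    a Stata command-line concern and is not modeled here.
    """
    _, current = os.path.splitext(name)
    return name if current else name + ext


print(default_extension("highway", ".dct"))      # highway.dct
print(default_extension("emp", ".raw"))          # emp.raw
print(default_extension("highway.txt", ".raw"))  # highway.txt
```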

In the first syntax, if using(filename2) is not specified on the command line and using filename is not specified in the dictionary, the data are assumed to begin on the line following the closing brace.
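
The following Python sketch (not Stata's implementation) shows what "the line following the closing brace" means for a dictionary file that carries its own data. The helper name and the sample data lines are made up. The sketch also assumes that the first line whose text starts with } ends the dictionary.

```python
def data_after_dictionary(text: str) -> list[str]:
    """Return the lines that follow the dictionary's closing brace.

    Simplified sketch: the first line whose stripped text starts with '}'
    is taken as the end of the dictionary.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("}"):
            return lines[i + 1:]
    raise ValueError("no closing brace found")


# Made-up dictionary with inline data (columns: rate 1-4, speed 6-7, acc 9-11)
dct = """infix dictionary {
    rate 1-4 speed 6-7 acc 9-11
}
4.58 55 4.6
2.86 60 4.4
"""
print(data_after_dictionary(dct))  # ['4.58 55 4.6', '2.86 60 4.4']
```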

infile and insheet are alternatives to infix. infile can also read data in fixed format (see [D] infile (fixed format)), and it can read data in free format (see [D] infile (free format)). Most people think that infix is easier to use for reading fixed-format data, but infile has more features. If your data are not in fixed format but are tab- or comma-separated, you can use insheet; see [D] insheet. If you are not certain that infix will do what you are looking for, see help infiling and [U] 21 Inputting data.

In its first syntax, infix reads the data in a two-step process. You first create a disk file describing how the data are recorded. You tell infix to read that file -- called a dictionary -- and from there, infix reads the data. The data can be in the same file as the dictionary or in a different file.

In its second syntax, you tell infix how to read the data right on the command line with no intermediate file.

Options

    +------+
----+ Main +-------------------------------------------------------------

using(filename2) specifies the name of a file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename or, if the dictionary specifies the name of some other file, to be in that file. If using(filename2) is specified, filename2 is used to obtain the data, even if the dictionary says otherwise. If filename2 is specified without an extension, .raw is assumed. If filename2 contains embedded spaces, remember to enclose it in double quotes.
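
The precedence just described can be written as a short Python sketch. It is not Stata code, and the function and parameter names are hypothetical. using(filename2) wins, then the dictionary's own using filename, and otherwise the data are taken from the dictionary file itself.

```python
from typing import Optional


def data_source(using_option: Optional[str],
                dictionary_using: Optional[str],
                dfilename: str) -> str:
    """Describe where infix takes the data from, in the precedence order
    given above (hypothetical helper, not Stata code)."""
    if using_option is not None:        # using(filename2) overrides everything
        return using_option
    if dictionary_using is not None:    # 'infix dictionary using filename {'
        return dictionary_using
    return f"lines after the closing brace in {dfilename}"


print(data_source("other.raw", "highway.raw", "highway.dct"))  # other.raw
print(data_source(None, "highway.raw", "highway.dct"))         # highway.raw
print(data_source(None, None, "highway.dct"))
```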

clear specifies that it is okay for the new data to replace what is currently in memory. To ensure that you do not lose something important, infix will refuse to read new data if data are already in memory. clear allows infix to replace the data in memory. You can also drop the data yourself by typing drop _all before reading new data.

Specifications

# firstlineoffile (abbreviation first) is rarely specified. It states the line of the file at which the data begin. You need not specify first when the data follow the dictionary; infix can figure that out for itself. You can specify first when only the data appear in a file and the first few lines of that file contain headers or other markers.

first appears only once in the specifications.

# lines states the number of lines per observation in the file. Simple datasets typically have "1 lines". Large datasets often have many lines (sometimes called records) per observation. lines is optional, even when there is more than one line per observation, because infix can sometimes figure it out for itself. Still, if 1 lines is not right for your data, it is best to specify the appropriate number of lines.

lines appears only once in the specifications.
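
To make first and lines concrete, here is a Python sketch (not Stata code) that skips header lines and then groups the remaining file lines into observations. It mirrors specifying something like 2 first and 2 lines. The helper name and the sample lines are made up. Keeping a trailing partial group is a simplification of this sketch, not documented infix behavior.

```python
def observations(file_lines: list[str], lines_per_obs: int = 1,
                 first: int = 1) -> list[list[str]]:
    """Group raw file lines into observations.

    first         -- 1-based line of the file at which the data begin
    lines_per_obs -- the '# lines' value (lines per observation)
    A trailing partial group is kept as-is here; that is a simplification.
    """
    data = file_lines[first - 1:]
    return [data[i:i + lines_per_obs]
            for i in range(0, len(data), lines_per_obs)]


raw = ["HEADER", "obs 1, line 1", "obs 1, line 2",
       "obs 2, line 1", "obs 2, line 2"]
print(observations(raw, lines_per_obs=2, first=2))
```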

#: tells infix to jump to line # of the observation. Consider a file with 4 lines, meaning four lines per observation. 2: says to jump to the second line of the observation. 4: says to jump to the fourth line of the observation. You may jump forward or backward: infix does not care, and there is no inefficiency in going forward to 3:, reading a few variables, jumping back to 1:, reading another variable, and jumping back again to 3:.

You need not ensure that, at the end of your specification, you are on the last line of the observation. infix knows how to get to the next observation because it knows where you are and it knows lines, the total number of lines per observation.

#: may appear many times in the specifications.

/ is an alternative to #:. / goes forward one line. // goes forward two lines. We do not recommend using / because #: is better. If you are currently on line 2 of an observation and want to get to line 6, you could type ////, but your meaning is clearer if you type 6:.

/ may appear many times in the specifications.
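
The relationship between #: and / can be sketched in Python as a cursor over the lines of one observation. This is an illustration, not Stata code, and the helper name move is hypothetical. It reproduces the example above, where //// from line 2 and 6: both land on line 6.

```python
def move(current: int, directive: str) -> int:
    """Return the new 1-based line within the observation.

    '#:' jumps to line # (forward or backward); each '/' moves forward
    one line. Hypothetical helper, not Stata code.
    """
    if directive.endswith(":") and directive[:-1].isdigit():
        return int(directive[:-1])
    if directive and set(directive) == {"/"}:
        return current + len(directive)
    raise ValueError(f"not a line directive: {directive!r}")


print(move(2, "////"))  # 6
print(move(2, "6:"))    # 6
print(move(3, "1:"))    # 1  (jumping backward is allowed)
```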

[ byte | int | float | long | double | str ] varlist [#:]#[-#] instructs infix to read a variable or, sometimes, more than one.

The simplest form of this is varname #, such as sex 20. This says that variable varname is to be read from column # of the current line. Here, variable sex is read from column 20, so sex is a one-digit number.

varname #-#, such as age 21-23, says that varname is to be read from the specified column range. Here, age is read from columns 21 through 23, so age is a three-digit number.
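
Column numbers in infix count from 1, and both ends of a range are included. The following Python sketch (not Stata code) maps such specifications onto string slices. The helper name field and the sample line are made up for illustration.

```python
from typing import Optional


def field(line: str, start: int, end: Optional[int] = None) -> str:
    """Return columns start..end of line, counting from 1 and including both ends.

    'sex 20' corresponds to field(line, 20); 'age 21-23' to field(line, 21, 23).
    """
    if end is None:
        end = start
    return line[start - 1:end]


line = " " * 19 + "1045"        # made-up line: column 20 is '1', 21-23 are '045'
print(field(line, 20))          # 1
print(field(line, 21, 23))      # 045
print(int(field(line, 21, 23))) # 45
```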

You can prefix the variable with a storage type. str name 25-44 means to read the string variable name from columns 25 through 44. If you do not specify str, the variable is assumed to be numeric. You can specify the numeric subtype if you wish.

You can specify more than one variable, with or without a type. byte q1-q5 51-55 means read variables q1, q2, q3, q4, and q5 from columns 51 through 55 and store the five variables as bytes.
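
A Python sketch (not Stata code) of how a varlist can share one column range follows. It assumes that the range is divided equally among the variables, as in the q1-q5 51-55 example where each of the five variables gets one column. The helper name is hypothetical, and the error raised for an uneven split is this sketch's own choice.

```python
def split_range(varnames: list[str], start: int, end: int) -> dict[str, tuple[int, int]]:
    """Divide columns start..end equally among varnames.

    Assumption of this sketch: the range must split evenly, as in
    'byte q1-q5 51-55', where each variable gets one column.
    """
    width, remainder = divmod(end - start + 1, len(varnames))
    if remainder:
        raise ValueError("column range does not divide evenly among the variables")
    return {name: (start + i * width, start + (i + 1) * width - 1)
            for i, name in enumerate(varnames)}


q = [f"q{i}" for i in range(1, 6)]   # q1-q5 expands to q1 q2 q3 q4 q5
print(split_range(q, 51, 55))
# {'q1': (51, 51), 'q2': (52, 52), 'q3': (53, 53), 'q4': (54, 54), 'q5': (55, 55)}
```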

Finally, you can specify the line on which the variable(s) appear. age 2:21-23 says that age is to be obtained from the second line, columns 21 through 23. Another way to do this is to combine the #: directive with the input-variable directive: 2: age 21-23. There is a difference, but not with respect to reading the variable age. Consider two alternatives:

1: str name 25-44 age 2:21-23 q1-q5 51-55

1: str name 25-44 2: age 21-23 q1-q5 51-55

The difference is that the first directive says that variables q1 through q5 are on line 1, whereas the second says that they are on line 2.

When the colon is put in front, it indicates the line on which variables are to be found when we do not explicitly say otherwise. When the colon is put inside, it applies only to the variable under consideration.
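
The rule can be checked with a small Python sketch (not Stata's parser). It walks a specification and reports the line from which each variable is read. The helper name is hypothetical. The sketch skips storage types, keeps a varlist such as q1-q5 as a single entry, and handles only the directives discussed above.

```python
import re

TYPES = {"byte", "int", "float", "long", "double", "str"}


def assign_lines(spec: str) -> dict[str, int]:
    """Report which line of the observation each variable is read from.

    Simplified sketch: a standalone '#:' token (or '/' run) changes the
    current line; an '#:' prefix on a column range applies only to that
    variable. Varlists such as q1-q5 are kept as a single entry.
    """
    current, result = 1, {}
    tokens = spec.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if re.fullmatch(r"\d+:", tok):       # '#:' jumps to line #
            current = int(tok[:-1])
            i += 1
        elif set(tok) == {"/"}:              # each '/' moves forward one line
            current += len(tok)
            i += 1
        elif tok in TYPES:                   # storage type: does not affect the line
            i += 1
        else:                                # varname(s) followed by columns
            m = re.fullmatch(r"(\d+):\S+", tokens[i + 1])
            result[tok] = int(m.group(1)) if m else current
            i += 2
    return result


print(assign_lines("1: str name 25-44 age 2:21-23 q1-q5 51-55"))
# {'name': 1, 'age': 2, 'q1-q5': 1}
print(assign_lines("1: str name 25-44 2: age 21-23 q1-q5 51-55"))
# {'name': 1, 'age': 2, 'q1-q5': 2}
```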

Examples of second syntax

    . infix rate 1-4 speed 6-7 acc 9-11 using highway.raw
    . infix rate 1-4 speed 6-7 acc 9-11 using highway.raw if rate>2
    . infix rate 1-4 speed 6-7 acc 9-11 using highway.raw in 1/100

Examples of first syntax

    . infix using highway.dct
    . infix using highway.dct if rate>2
    . infix using highway.dct in 1/100

where highway.dct contains

    ------------- top of highway.dct ---
    infix dictionary using highway.raw {
            rate  1-4
            speed 6-7
            acc   9-11
    }
    ------------- end of highway.dct ---
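
The following Python sketch (not Stata code) reads lines in the rate 1-4 speed 6-7 acc 9-11 layout and mimics the in and if qualifiers from the examples. The three data lines are made up and are not the contents of highway.raw. The helper name is hypothetical.

```python
def parse_highway(line: str) -> dict[str, float]:
    """Read one line using rate 1-4, speed 6-7, acc 9-11 (1-based, inclusive)."""
    return {
        "rate": float(line[0:4]),   # columns 1-4
        "speed": float(line[5:7]),  # columns 6-7
        "acc": float(line[8:11]),   # columns 9-11
    }


# Made-up lines in the layout described by highway.dct (not the real data)
raw = ["4.58 55 4.6", "2.86 60 4.4", "1.61 40 2.2"]
data = [parse_highway(row) for row in raw]

in_result = data[0:2]                            # like 'in 1/2': observations 1 through 2
if_result = [d for d in data if d["rate"] > 2]   # like 'if rate>2'
print([d["rate"] for d in if_result])            # [4.58, 2.86]
```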

Example reading string variables and multiple lines

    . infix 2 lines 1: id 1-6 str name 7-36 2: age 1-2 sex 4 using emp.raw

or

    . infix using emp.dct

where emp.dct contains

    ------------- top of emp.dct ---
    infix dictionary using emp.raw {
            2 lines
            1:  id       1-6
                str name 7-36
            2:  age      1-2
                sex      4
    }
    ------------- end of emp.dct ---
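
The same two-line layout can be traced in a Python sketch (not Stata code). Each observation occupies two lines: id and name come from line 1, and age and sex come from line 2. The records are made up. Stripping trailing blanks from name is a choice of this sketch, not a statement about how Stata stores strings. An incomplete final observation is ignored here.

```python
def read_emp(file_lines: list[str]) -> list[dict]:
    """Read observations laid out as in emp.dct: 2 lines per observation."""
    records = []
    for i in range(0, len(file_lines) - 1, 2):  # '2 lines'
        line1, line2 = file_lines[i], file_lines[i + 1]
        records.append({
            "id": int(line1[0:6]),           # 1: id 1-6
            "name": line1[6:36].rstrip(),    # str name 7-36 (padding stripped here)
            "age": int(line2[0:2]),          # 2: age 1-2
            "sex": int(line2[3:4]),          # sex 4
        })
    return records


# Made-up records in the emp.raw layout
raw = ["000001Jane Doe", "42 1", "000002John Smith", "37 0"]
print(read_emp(raw))
```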

Also see

Manual: [D] infix (fixed format)

Help: [D] infile (fixed format), [D] insheet, [D] outfile, [D] outsheet, [D] save


© Copyright 1996–2009 StataCorp LP