Stata 11 help for infile2

help infile2

Title

[D] infile (fixed format) -- Read ASCII (text) data in fixed format with a dictionary

Syntax

infile using dfilename [if] [in] [, options]

    options               description
    -------------------------------------------------------------------------
    Main
      using(filename)     ASCII dataset filename
      clear               replace data in memory

    Options
      automatic           create value labels from nonnumeric data
    -------------------------------------------------------------------------

The syntax for a dictionary (a file created with an editor or word processor outside Stata) is

-------------------------------------- top of dictionary file ---
[infile] dictionary [using filename] {
        * comments may be included freely
        _lrecl(#)
        _firstlineoffile(#)
        _lines(#)

        _line(#)
        _newline[(#)]

        _column(#)
        _skip[(#)]

        [type] varname [:lblname] [%infmt] ["variable label"]
}
(your data might appear here)
-------------------------------------- end of dictionary file ---

where %infmt is { %[#[.#]] {f|g|e} | %[#]s | %[#]S}

Menu

File > Import > ASCII data in fixed format with a dictionary

Description

infile using reads from disk a dataset that is not in Stata format. infile using does this by first reading dfilename -- a "dictionary" that describes the format of the data file -- and then reading the file containing the data. The dictionary is a file you create in an editor or word processor outside Stata. If dfilename is specified without an extension, .dct is assumed.

If using filename is not specified, the data are assumed to begin on the line following the closing brace. If using filename is specified, the data are assumed to be located in filename. If filename is specified without an extension, .raw is assumed.

Note for Stata for Mac and Stata for Windows users: If dfilename or filename contains embedded spaces, remember to enclose it in double quotes.

The data may be in the same file as the dictionary or in another file.

Another variation on infile omits the intermediate dictionary; see infile1. This variation is easier to use but will not read fixed-format files. On the other hand, although infile using will read free-format files, infile without a dictionary is even better at it.

An alternative to infile using for reading fixed-format files is infix; see [D] infix (fixed format). infix provides fewer features than infile using but is easier to use.

Stata has other commands for reading data. If you are not certain that infile using will do what you are looking for, see infiling and [U] 21 Inputting data.

Options

    +------+
----+ Main +-------------------------------------------------------------

using(filename) specifies the name of a file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename or, if the dictionary specifies the name of some other file, to be in that file. If using(filename) is specified, filename is used to obtain the data, even if the dictionary says otherwise. If filename is specified without an extension, .raw is assumed.

Note for Stata for Mac and Stata for Windows users: If filename contains embedded spaces, remember to enclose it in double quotes.
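The precedence among the three possible data locations can be modeled in a few lines of Python. This is a sketch of the rule stated above, not Stata code; resolve_data_source and its parameters are hypothetical names.

```python
import os


def resolve_data_source(dict_using=None, option_using=None):
    """Model where infile using looks for the data (illustration only).

    dict_using   -- hypothetical: the filename in `dictionary using filename`
    option_using -- hypothetical: the filename given in the using() option
    Returns a filename, or None meaning "data follow the closing brace".
    """
    # using() on the command line wins, even if the dictionary names a file
    name = option_using if option_using is not None else dict_using
    if name is None:
        return None  # data are read from the lines after the dictionary's "}"
    # .raw is assumed when the filename has no extension
    _, ext = os.path.splitext(name)
    return name if ext else name + ".raw"
```

For example, a dictionary that begins dictionary using persons.raw but is run with using(mydata) reads mydata.raw.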

clear specifies that it is okay for the new data to replace what is currently in memory. To ensure that you do not lose something important, infile using will refuse to read new data if other data are already in memory. clear allows infile using to replace the data in memory. You can also drop the data yourself by typing drop _all before reading new data.

    +---------+
----+ Options +----------------------------------------------------------

automatic causes Stata to create value labels from the nonnumeric data it reads. It also automatically widens the display format to fit the longest label.
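A Python sketch of the idea behind automatic: each distinct nonnumeric value gets an integer code, a value label maps the code back to the text, and the display width grows to fit the longest label. The function name is hypothetical, and numbering codes in order of first appearance is an assumption of this sketch, not documented Stata behavior.

```python
def auto_label(values):
    """Sketch of `automatic`: turn nonnumeric strings into integer codes.

    Assumption: codes 1, 2, 3, ... are given in order of first appearance;
    Stata's actual numbering is not specified in this entry.
    """
    seen = {}                      # label text -> code
    codes = []
    for text in values:
        if text not in seen:
            seen[text] = len(seen) + 1
        codes.append(seen[text])
    labels = {code: text for text, code in seen.items()}   # code -> label text
    width = max((len(text) for text in seen), default=0)   # longest label
    return codes, labels, width
```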

Dictionary directives

* marks comment lines. Wherever you wish to place a comment, begin the line with a *. Comments can appear many times in the same dictionary.

_lrecl(#) is used only for reading datasets that do not have end-of-line delimiters (carriage return, line feed, or some combination of these). Such files are often produced by mainframe computers and poorly translated from EBCDIC into ASCII. _lrecl() specifies the logical record length: it requests that infile act as if a line ends every # characters.

_lrecl() appears only once, and typically not at all, in a dictionary.
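What _lrecl(#) asks for amounts to cutting the byte stream into fixed-length logical lines. A minimal Python sketch; split_records is a hypothetical name, and keeping a short final record is this sketch's choice, since the entry does not say how infile treats one.

```python
def split_records(raw, lrecl):
    """Cut a file with no end-of-line characters into lines of `lrecl` bytes.

    Hypothetical helper modeling _lrecl(#). A short final record is kept as is.
    """
    if lrecl < 1:
        raise ValueError("_lrecl() needs a positive record length")
    return [raw[i:i + lrecl] for i in range(0, len(raw), lrecl)]
```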

_firstlineoffile(#) (abbreviation _first()) is also rarely specified. It states the line of the file where the data begin. You do not need to specify _first() when the data follow the dictionary; Stata can figure that out for itself. However, you might specify _first() when reading data from another file in which the first line does not contain data because of headers or other markers.

_first() appears only once, and typically not at all, in a dictionary.
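In effect, _first(#) discards the lines before line #. A Python sketch with a hypothetical helper name; the header lines in the usage below are made up for illustration.

```python
def lines_from(lines, first):
    """Drop header lines: keep physical lines from line `first` (1-based) on."""
    if first < 1:
        raise ValueError("_first() counts lines from 1")
    return lines[first - 1:]
```

With two header lines ahead of the data, _first(3) corresponds to lines_from(lines, 3).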

_lines(#) states the number of lines per observation in the file. Simple datasets typically have _lines(1). Large datasets often have many lines (sometimes called records) per observation. _lines() is optional, even when there is more than one line per observation, because infile can sometimes figure it out for itself. Still, if _lines(1) is not right for your data, it is best to specify the correct number through _lines(#).

_lines() appears only once in a dictionary.

_line(#) tells infile to jump to line # of the observation. _lines() is not the same as _line(). Consider a file with _lines(4), meaning four lines per observation. _line(2) says to jump to the second line of the observation. _line(4) says to jump to the fourth line of the observation. You may jump forward or backward. infile does not care, and there is no inefficiency in going forward to _line(3), reading a few variables, jumping back to _line(1), reading another variable, and jumping forward again to _line(3).

You need not ensure that, at the end of your dictionary, you are on the last line of the observation. infile knows how to get to the next observation because it knows where you are and it knows _lines(), the total number of lines per observation.

_line() may appear many times in a dictionary.
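Because infile knows _lines(), it can treat each observation as a block of lines and index into it, so jumping backward is just a different index. A Python sketch modeling the two-line layout of xmpl3.dct (_lines(2); a and b on _line(1); c on _line(2)); the function names are hypothetical, and splitting on whitespace is a simplification of infile's reading logic.

```python
def observations(lines, nlines):
    """Group physical lines into observations of `nlines` lines (_lines(#))."""
    return [lines[i:i + nlines] for i in range(0, len(lines), nlines)]


def read_two_line_obs(lines):
    """Layout of xmpl3.dct: _lines(2); a and b on _line(1); c on _line(2)."""
    rows = []
    for obs in observations(lines, 2):
        c = float(obs[1].split()[0])      # _line(2): read c first ...
        a, b = obs[0].split()[:2]         # ... then jump back to _line(1)
        rows.append((int(a), float(b), c))
    return rows
```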

_newline[(#)] is an alternative to _line(). _newline(1), which may be abbreviated _newline, goes forward one line. _newline(2) goes forward two lines. We do not recommend using _newline() because _line() is better. If you are currently on line 2 of an observation and want to get to line 6, you could type _newline(4), but your meaning is clearer if you type _line(6).

_newline() may appear many times in a dictionary.

_column(#) jumps to column # on the current line. You may jump forward or backward within a line. _column() may appear many times in a dictionary.

_skip[(#)] jumps forward # columns on the current line. _skip() is just an alternative to _column(). _skip() may appear many times in a dictionary.
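The four movement directives differ only in being absolute (_line(), _column()) or relative (_newline(), _skip()). A small Python model of the cursor makes the equivalences concrete; the class is hypothetical, and resetting to column 1 after a line move and defaulting _skip to 1 are assumptions of the sketch.

```python
class Cursor:
    """Hypothetical model of where infile is within one observation.

    Lines and columns are 1-based, as in the dictionary directives.
    Assumption: moving to another line puts the cursor at column 1.
    """

    def __init__(self):
        self.line, self.col = 1, 1   # infile starts on column 1 of line 1

    def line_to(self, n):            # _line(#): absolute, forward or backward
        self.line, self.col = n, 1

    def newline(self, n=1):          # _newline[(#)]: relative, forward only
        self.line, self.col = self.line + n, 1

    def column(self, n):             # _column(#): absolute, within the line
        self.col = n

    def skip(self, n=1):             # _skip[(#)]: relative; default 1 assumed
        self.col += n
```

With this model, _line(2) followed by _newline(4) lands where _line(6) does, and _column(5) followed by _skip(3) lands where _column(8) does.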

[type] varname [:lblname] [%infmt] ["variable label"] instructs infile to read a variable. The simplest form of this instruction is the variable name itself: varname.

At all times, infile is on some column of some line of an observation. infile starts on column 1 of line 1, so pretend that is where we are. Given the simplest directive, `varname', infile goes through the following logic:

If the current column is blank, it skips forward until there is a nonblank column (or until the end of the line). If it just skipped all the way to the end of the line, it stores a missing value in varname. If it skipped to a nonblank column, it begins collecting what is there until it comes to a blank column or the end of the line. These are the data for varname. Then it sets the current column to wherever it stopped.
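The skip, collect, and reset cycle for the simplest directive can be written out in Python. This illustrates the logic described above and is not Stata's implementation; columns are 0-based here rather than Stata's 1-based, quotes are ignored, and treating any whitespace character as blank is an assumption. read_free and read_line are hypothetical names.

```python
def read_free(line, col):
    """One free-format read starting at 0-based index `col`.

    Returns (value, new_col); value is None (missing) if only blanks remain.
    """
    n = len(line)
    while col < n and line[col].isspace():      # skip forward over blanks
        col += 1
    if col == n:
        return None, col                         # hit end of line: missing value
    start = col
    while col < n and not line[col].isspace():  # collect until blank or end
        col += 1
    return line[start:col], col                  # reset: column is where we stopped


def read_line(line, nvars):
    """Apply read_free once per variable, as a dictionary like xmpl1.dct would."""
    col, values = 0, []
    for _ in range(nvars):
        value, col = read_free(line, col)
        values.append(value)
    return values
```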

The actual logic is a bit more complicated. For instance, when skipping forward to find the data, infile might encounter a quote. If so, it then collects the characters for the data by skipping forward until it finds the matching quote. If you specify a %infmt, then infile skips the skipping-forward step and simply collects the specified number of characters. If you specify a %S infmt, then infile does not skip leading or trailing blanks. Nevertheless, the general logic is (optionally) skip, collect, and reset.
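The two refinements just described, quoted values and explicit widths, can be added to the basic cycle. A Python sketch with hypothetical names and 0-based columns; raising an error on an unmatched quote is this sketch's choice, and %s is modeled as trimming blanks because %S is described as not doing so.

```python
def read_token(line, col):
    """Free-format read that also honors double quotes (0-based `col`)."""
    n = len(line)
    while col < n and line[col].isspace():       # skip
        col += 1
    if col == n:
        return None, col                          # nothing left: missing
    if line[col] == '"':                          # collect up to the matching quote
        end = line.find('"', col + 1)
        if end == -1:
            raise ValueError("unmatched quote")   # this sketch's choice
        return line[col + 1:end], end + 1
    start = col
    while col < n and not line[col].isspace():   # collect
        col += 1
    return line[start:col], col                   # reset


def read_fixed(line, col, width, keep_blanks=False):
    """%#s / %#S read: take exactly `width` characters with no skip step.

    keep_blanks=False trims leading and trailing blanks (like %s);
    keep_blanks=True keeps them (like %S).
    """
    field = line[col:col + width]
    return (field if keep_blanks else field.strip()), col + width
```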

Examples: reading data with a dictionary

. infile using mydict
. infile using mydict, using(mydata)
. infile using mydict if b==1
. infile using mydict if runiform()<=.1

Examples: sample dictionaries

--------------------- top of xmpl1.dct ---
dictionary {
        a
        b
}
1 2
3 4
--------------------- end of xmpl1.dct ---

--------------------- top of xmpl2.dct ---
dictionary {
        int t "day of year"
        double amt "amount"
}
1 2.2
2 3.3
--------------------- end of xmpl2.dct ---

--------------------- top of xmpl3.dct ---
dictionary {
        _lines(2)
        _line(1)
        int a
        float b
        _line(2)
        float c
}
1 2.2
3.2
2 3.2
4.2
--------------------- end of xmpl3.dct ---

------------------------------- top of xmpl4.dct ---
dictionary {
        long idnumb "Identification number"
        str6 sex "Sex"
        byte age "Age"
}
472921002 male 32
329193100 male 45
399938271 female 30
484873982 "female" 33
------------------------------- end of xmpl4.dct ---

------------------------------------------- top of xmpl5.dct ---
dictionary {
        _column(5)   long idnumb  %9f  "Identification number"
                     str6 sex     %6s  "Sex"
                     int age      %2f  "Age"
        _column(27)  float income %6f  "Income"
}
    329193402male  32      42000
    472921002male  32      50000
    329193100male  45
    399938271female30      43000
    484873982female33      48000
------------------------------------------- end of xmpl5.dct ---
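The column arithmetic behind xmpl5.dct: each field starts where the previous one ends unless _column() moves the cursor, so the dictionary fixes a slice of every line. A Python sketch of that reading, assuming data lines laid out exactly as the dictionary describes (idnumb in columns 5 to 13, sex in 14 to 19, age in 20 to 21, income in 27 to 32); FIELDS and parse_fixed are hypothetical names, and a blank field becomes None to stand in for a missing value.

```python
# Layout implied by xmpl5.dct: (name, first column (1-based), width, converter)
FIELDS = [
    ("idnumb", 5, 9, int),      # _column(5) long idnumb %9f
    ("sex", 14, 6, str),        # str6 sex %6s, right after idnumb
    ("age", 20, 2, int),        # int age %2f, right after sex
    ("income", 27, 6, float),   # _column(27) float income %6f
]


def parse_fixed(line):
    """Slice one data line into fields; a blank field becomes None (missing)."""
    row = {}
    for name, first, width, convert in FIELDS:
        text = line[first - 1:first - 1 + width].strip()
        row[name] = convert(text) if text else None
    return row
```

The third data line stops after age, so its income slice is empty and comes back as None.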

Example: dictionary and data in separate files

------------------------------------------- top of persons.dct ---
dictionary using persons.raw {
        _column(5)   long idnumb  %9f  "Identification number"
                     str6 sex     %6s  "Sex"
                     int age      %2f  "Age"
        _column(27)  float income %6f  "Income"
}
------------------------------------------- end of persons.dct ---

---------------- top of persons.raw ---
    329193402male  32      42000
    472921002male  32      50000
    329193100male  45
    399938271female30      43000
    484873982female33      48000
---------------- end of persons.raw ---

Also see

Manual: [D] infile (fixed format)

Help: [D] infix, [D] outfile, [D] outsheet, [D] save


© Copyright 1996–2009 StataCorp LP