Stata 15 help for infile2

[D] infile (fixed format) -- Import text data in fixed format with a dictionary

Syntax

infile using dfilename [if] [in] [, options]

If dfilename is specified without an extension, .dct is assumed. If dfilename contains embedded spaces, remember to enclose it in double quotes.

    options              Description
    -------------------------------------------------------------------------
    Main
      using(filename)    text dataset filename
      clear              replace data in memory

    Options
      automatic          create value labels from nonnumeric data
      ebcdic             treat text dataset as EBCDIC
    -------------------------------------------------------------------------

A dictionary is a text file that is created with the Do-file Editor or an editor outside Stata. This file specifies how Stata should read fixed-format data from a text file. The syntax for a dictionary is

-------------------------------------- begin dictionary file ----
[infile] dictionary [using filename] {
    * comments may be included freely
    _lrecl(#)
    _firstlineoffile(#)
    _lines(#)

    _line(#)
    _newline[(#)]

    _column(#)
    _skip[(#)]

    [type] varname [:lblname] [%infmt] ["variable label"]
}
(your data might appear here)
-------------------------------------- end dictionary file ------

where %infmt is { %[#[.#]] {f|g|e} | %[#]s | %[#]S}

Menu

File > Import > Text data in fixed format with a dictionary

Description

infile using reads a dataset that is stored in text form. It does this by first reading dfilename -- a "dictionary" that describes the format of the data file -- and then reading the file containing the data. The dictionary is a file you create with the Do-file Editor or an editor outside Stata.

Strings containing plain ASCII or UTF-8 are imported correctly. Strings containing extended ASCII will not be imported (that is, displayed) correctly; you can use Stata's replace command with the ustrfrom() function to convert extended ASCII to UTF-8. If ebcdic is specified, the data will be converted from EBCDIC to ASCII as they are imported. The dictionary in all cases must be ASCII.
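The conversion mentioned above can be sketched as follows. This is a minimal illustration, not part of infile itself: it assumes the extended-ASCII text is encoded as Latin-1 (ISO-8859-1), and the variable name city is hypothetical. Substitute the encoding that matches your own data.

```stata
* Build a one-observation example holding the Latin-1 bytes for "Zürich";
* char(252) is the single byte 0xFC, which is u-umlaut in Latin-1.
clear
set obs 1
generate str10 city = "Z" + char(252) + "rich"

* Convert from the assumed source encoding (Latin-1) to UTF-8.
* Mode 1 substitutes the Unicode replacement character for invalid bytes.
replace city = ustrfrom(city, "latin1", 1)
```

After the conversion, the variable holds valid UTF-8 and should display correctly in the Data Editor and in list output.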

If using filename is not specified, the data are assumed to begin on the line following the closing brace. If using filename is specified, the data are assumed to be located in filename.

The data may be in the same file as the dictionary or in another file. infile with a dictionary can import both numeric and string data. Individual strings may be up to 100,000 bytes long. Strings longer than 2,045 bytes are imported as strLs (see [U] 12.4.8 strL).

Another variation on infile omits the intermediate dictionary; see infile1. This variation is easier to use but will not read fixed-format files. On the other hand, although infile with a dictionary will read free-format files, infile without a dictionary is even better at it.

An alternative to infile using for reading fixed-format files is infix; see [D] infix (fixed format). infix provides fewer features than infile using but is easier to use.

Stata has other commands for reading data. If you are not certain that infile using will do what you are looking for, see [D] import and [U] 21 Entering and importing data.

Options

        +------+
    ----+ Main +-------------------------------------------------------------

using(filename) specifies the name of a file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename, or if the dictionary specifies the name of some other file, that file is assumed to contain the data. If using(filename) is specified, filename is used to obtain the data, even if the dictionary says otherwise. If filename is specified without an extension, .raw is assumed.

If filename contains embedded spaces, remember to enclose it in double quotes.

clear specifies that it is okay for the new data to replace what is currently in memory. To ensure that you do not lose something important, infile using will refuse to read new data if other data are already in memory. clear allows infile using to replace the data in memory. You can also drop the data yourself by typing drop _all before reading new data.

        +---------+
    ----+ Options +----------------------------------------------------------

automatic causes Stata to create value labels from the nonnumeric data it reads. It also automatically widens the display format to fit the longest label.
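A small sketch of how automatic might be used with a dictionary. The file name auto_demo.dct, the label name sexlbl, and the data are all hypothetical. The variable sex is declared numeric, so the words in the data are candidates for value labels.

```stata
dictionary {
    * sex is declared numeric, but the data hold the words male and female
    long idnumb        "Identification number"
    byte sex:sexlbl    "Sex"
    byte age           "Age"
}
472921002 male   32
329193100 male   45
399938271 female 30
```

Assuming the text above is saved as auto_demo.dct, it would be read with

. infile using auto_demo, automatic clear

and the resulting value label can then be inspected with label list sexlbl.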

ebcdic specifies that the data are stored using EBCDIC character encoding rather than the default ASCII encoding and that they be converted from EBCDIC to ASCII as they are imported.

Dictionary directives

* marks comment lines. Wherever you wish to place a comment, begin the line with a *. Comments can appear many times in the same dictionary.

_lrecl(#) is used only for reading datasets that do not have end-of-line delimiters (carriage return, line feed, or some combination of these). Such files are often produced by mainframe computers and are either coded in EBCDIC or have been translated from EBCDIC into ASCII. _lrecl() specifies the logical record length. _lrecl() requests that infile act as if a line ends every # bytes.

_lrecl() appears only once, and typically not at all, in a dictionary.
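As an illustration, a dictionary for a file without end-of-line delimiters might look like the sketch below. The file name mainframe.raw, the record length of 80 bytes, and the variable layout are assumptions made for the example.

```stata
infile dictionary using mainframe.raw {
    * Assumption: each record is exactly 80 bytes with no end-of-line bytes
    _lrecl(80)
    _column(1)   long   id      %9f   "Identification number"
    _column(10)  str6   sex     %6s   "Sex"
    _column(16)  int    age     %2f   "Age"
}
```

With this dictionary saved as mainframe.dct, infile using mainframe would treat bytes 1-80 as the first line, bytes 81-160 as the second, and so on.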

_firstlineoffile(#) (abbreviation _first()) is also rarely specified. It states the line of the file where the data begin. You do not need to specify _first() when the data follow the dictionary; Stata can figure that out for itself. However, you might specify _first() when reading data from another file in which the first line does not contain data because of headers or other markers.

_first() appears only once, and typically not at all, in a dictionary.
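For instance, if a separate data file begins with two header lines, a dictionary along these lines could skip them. The file name survey.raw and the position of the first data line are assumptions for the sketch.

```stata
dictionary using survey.raw {
    * Assumption: lines 1 and 2 of survey.raw are headers; data begin on line 3
    _first(3)
    int    t     "day of year"
    double amt   "amount"
}
```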

_lines(#) states the number of lines per observation in the file. Simple datasets typically have _lines(1). Large datasets often have many lines (sometimes called records) per observation. _lines() is optional, even when there is more than one line per observation, because infile can sometimes figure it out for itself. Still, if _lines(1) is not right for your data, it is best to specify the correct number through _lines(#).

_lines() appears only once in a dictionary.

_line(#) tells infile to jump to line # of the observation. _line() is not the same as _lines(). Consider a file with _lines(4), meaning four lines per observation. _line(2) says to jump to the second line of the observation. _line(4) says to jump to the fourth line of the observation. You may jump forward or backward. infile does not care, and there is no inefficiency in going forward to _line(3), reading a few variables, jumping back to _line(1), reading another variable, and jumping forward again to _line(3).

You need not ensure that, at the end of your dictionary, you are on the last line of the observation. infile knows how to get to the next observation because it knows where you are and it knows _lines(), the total number of lines per observation.

_line() may appear many times in a dictionary.
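The sketch below illustrates jumping forward and backward among the lines of an observation. The file name jump_demo.dct, the variable names, and the data are hypothetical. Each observation occupies three lines, and the dictionary reads line 3 first, then line 1, then line 2.

```stata
dictionary {
    _lines(3)
    _line(3)
        float c
    _line(1)
        int   a
        float b
    _line(2)
        str8  name
}
1 2.2
alpha
3.2
2 3.2
beta
4.2
```

Saved as jump_demo.dct and read with infile using jump_demo, this should yield two observations. The dictionary ends on line 2 of the observation rather than the last line, which is acceptable because _lines(3) tells infile where the next observation starts.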

_newline[(#)] is an alternative to _line(). _newline(1), which may be abbreviated _newline, goes forward one line. _newline(2) goes forward two lines. We do not recommend using _newline() because _line() is better. If you are currently on line 2 of an observation and want to get to line 6, you could type _newline(4), but your meaning is clearer if you type _line(6).

_newline() may appear many times in a dictionary.

_column(#) jumps to column # (in bytes) of the current line. You may jump forward or backward within a line. _column() may appear many times in a dictionary.

_skip[(#)] jumps forward # columns on the current line. _skip() is just an alternative to _column(). _skip() may appear many times in a dictionary.
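To show how _column() and _skip() relate, the following sketch reads the same two fields twice, once by absolute column and once by a relative skip. The file name, variable names, and data are hypothetical.

```stata
dictionary {
    * Absolute positioning: a is in columns 1-2, b is in columns 6-7
    _column(1)   int   a   %2f
    _column(6)   int   b   %2f
    * Jump back to column 1, read columns 1-2 again, then skip 3 columns
    * forward from column 3 to reach column 6
    _column(1)   int   a2  %2f
    _skip(3)     int   b2  %2f
}
12   34
56   78
```

Under these assumptions, a2 and b2 should hold the same values as a and b, because _skip(3) from column 3 lands on the same column that _column(6) names directly.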

[type] varname [:lblname] [%infmt] ["variable label"] instructs infile to read a variable. The simplest form of this instruction is the variable name itself: varname.

At all times, infile is on some column of some line of an observation. infile starts on column 1 of line 1, so pretend that is where we are. Given the simplest directive, `varname', infile goes through the following logic:

If the current column is blank, it skips forward until there is a nonblank column (or until the end of the line). If it just skipped all the way to the end of the line, it stores a missing value in varname. If it skipped to a nonblank column, it begins collecting what is there until it comes to a blank column or the end of the line. These are the data for varname. Then it sets the current column to wherever it is.

The logic is a bit more complicated. For instance, when skipping forward to find the data, infile might encounter a quote. If so, it then collects the characters for the data by skipping forward until it finds the matching quote. If you specify a %infmt, then infile omits the skipping-forward step and simply collects the specified number of bytes. If you specify a %S format (that is, an %infmt of the form %[#]S), then infile does not skip leading or trailing blanks. Nevertheless, the general logic is (optionally) skip, collect, and reset.
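The difference between reading with and without a %infmt can be sketched with a small dictionary. The variable names and data here are made up for illustration.

```stata
dictionary {
    * With %infmt, infile collects exactly that many bytes, blank or not
    int  a  %2f
    int  b  %3f
    * Without %infmt, infile skips blanks, then collects up to the next blank
    int  c
}
12345   678
```

Following the logic described above, a would be read from columns 1-2 (12), b from columns 3-5 (345), and c by skipping the blanks that follow and collecting 678.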

Examples: reading data with a dictionary

. infile using mydict
. infile using mydict, using(mydata)
. infile using mydict if b==1
. infile using mydict if runiform()<=.1

Example: reading EBCDIC data with a dictionary

. infile using mydict, using(myebcdicdata) ebcdic

Examples: sample dictionaries

--------------------- begin xmpl1.dct ----
dictionary {
    a
    b
}
1 2
3 4
--------------------- end xmpl1.dct ------

--------------------- begin xmpl2.dct ----
dictionary {
    int t "day of year"
    double amt "amount"
}
1 2.2
2 3.3
--------------------- end xmpl2.dct ------

--------------------- begin xmpl3.dct ----
dictionary {
    _lines(2)
    _line(1)
        int a
        float b
    _line(2)
        float c
}
1 2.2
3.2
2 3.2
4.2
--------------------- end xmpl3.dct ------

------------------------------- begin xmpl4.dct ----
dictionary {
    long idnumb "Identification number"
    str6 sex "Sex"
    byte age "Age"
}
472921002 male 32
329193100 male 45
399938271 female 30
484873982 "female" 33
------------------------------- end xmpl4.dct ------

------------------------------------------- begin xmpl5.dct ----
dictionary {
    _column(5)  long  idnumb  %9f  "Identification number"
                str6  sex     %6s  "Sex"
                int   age     %2f  "Age"
    _column(27) float income  %6f  "Income"
}
    329193402male  32     42000
    472921002male  32     50000
    329193100male  45
    399938271female30     43000
    484873982female33     48000
------------------------------------------- end xmpl5.dct ------

Example: dictionary and data in separate files

------------------------------------------- begin persons.dct ----
dictionary using persons.raw {
    _column(5)  long  idnumb  %9f  "Identification number"
                str6  sex     %6s  "Sex"
                int   age     %2f  "Age"
    _column(27) float income  %6f  "Income"
}
------------------------------------------- end persons.dct ------

---------------- begin persons.raw ----
    329193402male  32     42000
    472921002male  32     50000
    329193100male  45
    399938271female30     43000
    484873982female33     48000
---------------- end persons.raw ------


© Copyright 1996–2018 StataCorp LLC