Stata | FAQ: Infile dictionary options


Title		Infile dictionary options
Author		James Hardin, StataCorp
Date		January 1996

If you are reading a dataset with a dictionary, Stata reads the data in record mode: it treats each row of the raw data file as a record and splits that record into variables. The dictionary gives you complete control over how that information is assigned to your variables in Stata, so it is worth learning how to use it well. A dictionary contains one line for each variable that you will read in. Each line is made up of the following directives:

An optional _column(#) directive stating where to begin reading. This tells Stata to move to the specified column in order to read in the data associated with this particular variable. By default, Stata will just move to the next column.
An optional _skip(#) directive stating how far to skip from the end of the last field that was processed. This tells Stata how many columns to skip before reading in the data associated with this particular variable. By default, Stata will just skip over one more column from the previous variable.
An optional data type for how to store the variable. Do not get this directive confused with the read format specifier. This directive only affects how the data is stored and not how it is read. Just because you specify that a variable is type str5 does not mean that Stata will read in 5 columns of data for this variable. In order to control how many columns are read for a particular variable, you must specify a read format. If you do not specify a type for the variable, it will default to float.
A required variable name. You must specify a variable name for each of the fields that you will read from the raw data records. There is no shortcut for this.
An optional label name to specify a value label for the variable. You can use this if you want to associate a value label name that you will apply to a numeric variable. (Value labels allow numeric variables to contain "strings", the strings being numerically encoded.) This is rarely used (the data is typically read into string variables) and may only be useful when you will be applying a great number of value labels to your variables, or when you are defining the value labels and the infile steps in a do-file.
An optional read format for reading this variable's values. Many people assume that once the type is specified, the read format does not also have to be specified. On the contrary, the read format is often even more important, because it is what tells Stata how many columns to read for a particular variable.
An optional variable label to apply to this variable. This is the descriptive label that the describe command prints to the right of the variable name. You do not have to specify a variable label, though for large datasets it can be helpful.
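
As a rough illustration of how these pieces fit together on one line, the sketch below shows a small dictionary. The file name survey.raw, the variable names, the column positions, and the value label name sexlbl are all hypothetical. The positioning directives are written with the leading underscore that Stata's dictionary syntax uses (_column and _skip).

```stata
dictionary using survey.raw {
    * Layout of each line: position, type, name and value label, read format, variable label
    * id is read from columns 1-4 and stored as an int
    _column(1)   int     id            %4f    "Respondent ID"
    * name is read from columns 6-15; str10 only controls storage, %10s controls the read width
    _column(6)   str10   name          %10s   "First name"
    * skip columns 16-17, then read one column and attach the value label sexlbl
    _skip(2)     byte    sex:sexlbl    %1f    "Sex"
    * no type given, so income is stored as float; it is read from the next 8 columns
                         income        %8f    "Annual income"
}
```

Only the variable name is required on each line. Here the storage type str10 does not by itself make Stata read ten columns for name; the read format %10s does.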

Sometimes you need to specify only a few of these directives; at other times you may need to specify many of them.

The above tools allow you to control, for each variable,

how it is read
where it is read from
how it is stored
what the variable is called
what the variable is labeled
how the values are labeled
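
To make the value-label case concrete, here is a minimal do-file sketch that defines a value label and then reads the data. It assumes a dictionary file named survey.dct (hypothetical) that names its raw data file on its first line and declares one variable as sex:sexlbl. The codes 1 and 2 and their labels are invented for illustration.

```stata
* Start from an empty dataset so the new data can be read in
clear

* Define the value label that the dictionary attaches to the variable sex
label define sexlbl 1 "male" 2 "female"

* Read the raw data as described by the dictionary file
infile using survey.dct

* Check storage types, value labels, and variable labels
describe
```

Keeping the label definitions and the infile step together in one do-file makes the whole read reproducible.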

This is enough for almost all datasets that you encounter. However, some datasets are organized in more complex ways. To handle those complexities, you may specify the following additional directives, which control how Stata processes the data file as a whole:

_lrecl(#) allows you to specify the logical record length for the file. In most files, a line break separates one record from the next. In other files, there are no line breaks, but each record is a fixed number of characters long. You use this directive to specify that length.
_newline(#) allows you to specify that the next field described begins # lines down in the file. Some datasets are organized in such a way that each record extends across multiple lines in the file.
Comments are allowed in the dictionary file and give you the opportunity to add notes to the dictionary files that you create for reading in your raw data files. Comments are among the most overlooked optional features of the dictionary file. You should use them, because they allow you to return to old dictionaries and remind yourself of how you solved problematic reads.
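
The two sketches below illustrate these file-level directives. They are written with the leading underscore that Stata's dictionary syntax uses, and comment lines begin with an asterisk. The file names, variable names, and column positions are hypothetical. The first sketch assumes a raw file with no line breaks in which every record is exactly 12 characters long.

```stata
dictionary using nobreaks.raw {
    * The raw file has no end-of-line characters.
    * Every record is exactly 12 characters long.
    _lrecl(12)
    _column(1)   int     id        %4f    "Respondent ID"
    _column(5)   float   income    %8f    "Annual income"
}
```

The second sketch assumes that each record occupies two lines of the raw file.

```stata
dictionary using twolines.raw {
    * Line 1 of each record holds id and age.
    _column(1)   int     id        %4f    "Respondent ID"
    _column(6)   byte    age       %2f    "Age in years"
    * Line 2 of each record holds income.
    _newline(1)
    _column(1)   float   income    %8f    "Annual income"
}
```

In both sketches the comment lines record why the layout looks the way it does, which is the kind of note that helps when you return to an old dictionary.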
