Re: st: data formatted on cards


From   David Kantor <dkantor@jhu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: data formatted on cards
Date   Tue, 01 Jul 2003 12:39:25 -0400

At 11:17 AM 7/1/2003 -0400, Patricia Pugliani wrote:
I have complicated raw data that is formatted on "cards" and I cannot
determine how to make a dataset from these data.
This is the information that I have concerning these data:
      DATASET 3
      PART NAME Merged 1976, 1981, and 1987 Data
      FILE STRUCTURE rectangular
      CASE COUNT 1,427
      VARIABLE COUNT 4,080
      LRECL 86
      RECORDS PER CASE 106

Can someone help me with this problem? I am a new Stata user and have only
used datasets up to this point, and not raw data. The data is in fixed
columns.
Thank you

Knowing that it is fixed columns is one important step. The other information you will need is what those columns are. For each variable that you are interested in, you need to know its name, starting position, length, and type -- plus which "line" it is on. That last item matters because there are 106 records ("lines") per case. That is, the data are in groups of 106 lines; some variables are on line 1 within each group, some are on line 2, and so forth.
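
As a quick sanity check on that layout, the figures in your documentation imply how many lines the raw file should contain, assuming every case really does have all 106 records present (that assumption is mine, not something the documentation excerpt states):

```stata
* 1,427 cases times 106 records per case
display 1427 * 106
* displays 151262
```

If the raw file has a different number of lines, some cases are probably missing records, and reading by line position within each group would go out of step.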

You will need to create a dictionary to describe how the variables are laid out in the raw data file. That is, you will need to do an -infile- with a dictionary, which you should see in the manual -- or type

help infile2
(or if you prefer,
view help infile2
)
at the Stata command window.
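
Once a dictionary file exists, reading the data might look roughly like the sketch below. The name mydata.dct is hypothetical, and the sketch assumes the dictionary itself names the raw data file on its first line; otherwise the using() option of -infile- can supply it.

```stata
* mydata.dct is a hypothetical dictionary file name
infile using mydata.dct, clear

* inspect what was read
describe

* the documentation reports 1,427 cases, so this should hold
* if the dictionary describes the layout correctly
assert _N == 1427
```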

Note that those variables on line 1 of every group of 106 should be preceded by
_line(1)
in your dictionary (but this is optional if they are at the beginning of the dictionary). Those that are on line 2 of every group of 106 should be preceded by
_line(2)
and so forth (these are not optional).

(Those _line directives need only be written once before each group of variables that are on the same line of the raw file and described together in the dictionary.)

It is best to put
_lines(106)
near the top of the dictionary.

(It is optional under certain circumstances, but best just to include it.)

I recommend placing
_column()
(fill in the starting column number) before every variable description.

The type of each variable should be specified in your dictionary. You will also need to create input formats; those are determined by a combination of type and length.
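
To pull these pieces together, here is a rough sketch of what such a dictionary might look like. The file name, variable names, column positions, widths, and types are all made up for illustration; the real ones must come from the codebook for your data.

```stata
dictionary using mydata.raw {
* hypothetical layout: replace every name, column, and format
* with the values from the codebook
    _lines(106)
    _line(1)
    _column(1)    long   caseid    %5f    "Case ID (hypothetical)"
    _column(6)    byte   sex       %1f    "Sex (hypothetical)"
    _line(2)
    _column(1)    int    income    %4f    "Income (hypothetical)"
    _column(5)    str8   region    %8s    "Region (hypothetical)"
}
```

Here %5f reads a number from 5 columns and %8s reads a string from 8 columns, so each input format pairs the variable's type with its length. Only the lines that hold variables you want need a _line() entry; _lines(106) tells Stata to skip past the rest of each group.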

Good luck.
-- David

David Kantor
Institute for Policy Studies
Johns Hopkins University
dkantor@jhu.edu
410-516-5404
