
From: David Kantor <dkantor@jhu.edu>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: data formatted on cards

Date: Tue, 01 Jul 2003 12:39:25 -0400

At 11:17 AM 7/1/2003 -0400, Patricia Pugliani wrote:

> I have complicated raw data that is formatted on "cards" and I cannot determine how to make a dataset from these data. This is the information that I have concerning these data:
>
> DATASET 3 PART NAME: Merged 1976, 1981, and 1987 Data
> FILE STRUCTURE: rectangular
> CASE COUNT: 1,427
> VARIABLE COUNT: 4,080
> LRECL: 86
> RECORDS PER CASE: 106
>
> Can someone help me with this problem. I am a new Stata user and have only used datasets up to this point, and not raw data. The data is in fixed columns. Thank you

Knowing that it is fixed columns is one important step. The other information you will need is what those columns are. For each variable that you are interested in, you need to know its name, starting position, length, and type -- plus which "line" it is on. The latter remark relates to the fact that there are 106 records ("lines") per case. That is, the data are in groups of 106 lines; some variables are on line 1 within each group, some are on line 2, and so forth.

You will need to create a dictionary to describe how the variables are laid out in the raw data file. That is, you will need to use -infile- with a dictionary, which is documented in the manual -- or type

help infile2

(or if you prefer,

view help infile2

)

at the Stata command window.

Note that the variables on line 1 of every group of 106 should be preceded by

_line(1)

in your dictionary (but this is optional if they are at the beginning of the dictionary). Those that are on line 2 of every group of 106 should be preceded by

_line(2)

and so forth (these are not optional).

(Those _line directives need only be written once before each group of variables that are on the same line of the raw file and described together in the dictionary.)

It is best to put

_lines(106)

near the top of the dictionary.

(It is optional under certain circumstances, but best just to include it.)

I recommend placing

_column()

(fill in the starting column number) before every variable description.

The type of each variable should be specified in your dictionary. You will also need to create input formats; those are determined by a combination of type and length.
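To put these pieces together, here is a minimal sketch of what such a dictionary might look like. Everything specific in it is made up for illustration: the file names (rawdata.txt, mydata.dct), the variable names, the column positions, and the lengths are all hypothetical, and you would replace them with the values from your codebook. Only the directives themselves (_lines, _line, _column) and the 106 records per case come from the discussion above.

```stata
infile dictionary using rawdata.txt {
* Hypothetical layout: names, columns, and widths are placeholders.
* Each case occupies 106 lines of the raw file.
    _lines(106)

* Variables found on line 1 of each group of 106
    _line(1)
    _column(1)   long   caseid   %5f   "Case ID (hypothetical)"
    _column(6)   byte   sex      %1f   "Sex (hypothetical)"

* Variables found on line 2 of each group of 106
    _line(2)
    _column(10)  int    income   %4f   "Income code (hypothetical)"
    _column(14)  str8   region   %8s   "Region (hypothetical)"
}
```

If this were saved as mydata.dct, it could be read with

infile using mydata.dct

Note how the input format follows from type and length: a numeric variable occupying 5 columns gets %5f, and a string occupying 8 columns gets %8s. You only need entries for the variables you actually want, so there is no need to describe all 4,080 at once.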

Good luck.

-- David

David Kantor

Institute for Policy Studies

Johns Hopkins University

dkantor@jhu.edu

410-516-5404
