Statalist The Stata Listserver



Re: st: Problem with Dictionary File


From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: Problem with Dictionary File
Date   Wed, 06 Jun 2007 15:00:01 -0400

At 02:10 PM 6/6/2007, Krishanu wrote:
Dear Stata User Friends,

I have recently encountered a peculiar problem while trying to extract
fixed format data using a dictionary file. Any light on the problem
will be welcome. The problem is as follows (I am setting things out
point by point):
1. As stated above I am trying to extract some fixed format data from
a .txt file.
2. There are 16 "levels" (or 16 different "sets of questions"). So I
have to use 16 different dictionary files to extract the data relevant
to each level, and drop all observations except those for the level
for which the current dictionary file was written (using keep if
level=="01" after using the dictionary file for level 1, and so on).
This gives me 16 .dta datasets, each pertaining to one "level".
3. Now each dataset has some number of variables, which generally
start with a few identifiers, then some actual data, and then some
more identifiers.
4. Now comes the problem. While writing the dictionary file I am
specifying the storage type of the identifiers as str#. Some of the
other variables have str# or numeric (byte, float, etc.) storage
types. Along with this I am also specifying the %infmt as %#s or %#f
accordingly, where # is the relevant integer, which may be 1, 2, or
any other integer. But the PROBLEM is that Stata is reading all the
variables with a # in the %infmt greater than or equal to 9, i.e. for
all the specifications (by me) of %1s, %2s, ..., %8s (or the same
with f), Stata is reading them as %9s or %9f, and sometimes even as
%9g. But for any # in %#s or %#f that I specified to be greater than
9, Stata is reading it as I have written. Why is Stata defying the
instructions for #<9?

Any help will be welcome.

Thanking you
Krishanu

Are you confusing (display) formats with infmts?

In a dictionary, you give the variable an infmt to tell -infile- something about the field -- the segment of the raw data line where the value is to be taken from. If the infmt includes an integer, then that is the width of the field.
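As an illustration only, here is a minimal sketch of what a dictionary for one level might look like. The file names (level01.dct, rawdata.txt), the variable names, and the column positions are all hypothetical, not taken from the actual data; the only point is that the integer in each infmt is the number of characters read from the raw line.

```stata
dictionary using rawdata.txt {
* Hypothetical layout: names, columns, and widths are assumptions
* %2s reads a 2-character string field starting in column 1
    _column(1)   str2   level   %2s   "Level code"
* %5s reads a 5-character string field starting in column 3
    _column(3)   str5   hhid    %5s   "Household identifier"
* %2f reads a 2-character numeric field starting in column 8
    _column(8)   byte   age     %2f   "Age in years"
}
```

Assuming the sketch is saved as level01.dct, it would be read with -infile using level01.dct, clear-, followed by -keep if level=="01"- as described in the question.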

Once the data are read into Stata, the variables are given display formats -- which look just like infmts, but have a very different purpose. These tell Stata how to display the values when listing or doing other actions that write the values to the screen or log. The display formats are what you see when you do -describe-. By default, the widths are typically 8, 9, 10, or 12 for numeric types (depending on the storage type), and strings shorter than 9 characters get %9s, which is why you see a 9 where you specified something smaller. You can change them with -format-, though there may be no real need to do so. Don't worry if they are different from the infmts; if you specified an infmt of %2s, you read a 2-character field.
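To see the difference between storage type and display format without touching the real data, a small made-up dataset is enough. The sketch below uses invented variable names and values; it should show that a str2 variable is stored as str2 yet is typically displayed with a wider default format, and that -format- changes only the display, not the stored value.

```stata
* Toy data: variable names and values are invented for illustration
clear
input str2 level byte age
"01" 34
"02"  7
end

* The "storage type" and "display format" columns are separate things
describe

* Optionally narrow the display formats; the stored values are unchanged
format level %2s
format age %2.0f
describe
list
```

Changing the display format is purely cosmetic, so it is safe to skip this step altogether.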

You may want to check some sample values against the raw data to verify that the correct values were read -- that the correct field widths were used.
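One possible way to do that check is sketched below. It assumes the data have already been read with a dictionary, and that the variables are called level, hhid, and age; those names are hypothetical and should be replaced with your own.

```stata
* Assumes a dataset read via -infile using ...- is in memory;
* level, hhid, and age are hypothetical variable names

* Show the first few observations to compare by eye with the raw file
list level hhid age in 1/5

* If the level code always fills its 2-character field, this should pass;
* a failure would suggest a wrong column position or field width
assert length(level) == 2

* Unexpected or blank codes show up quickly in a tabulation
tabulate level, missing
```

Comparing the listed values with the first few lines of the raw text file, opened in any editor, shows whether each field started and ended where you intended.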

HTH
--David
