Statalist The Stata Listserver



Re: st: binary format type str question


From   Mark Fisher <mark@markfisher.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: binary format type str question
Date   Tue, 13 Mar 2007 05:48:05 -0400

Thanks David. That's what I think I'm doing and it works for data_label and time_stamp, but it doesn't seem to work for the str types.

Here's an example. There are 6 variables with types 98, 136, 102, 105, 102, and 98. I read that as 6 str types with maximum lengths 98 bytes, 136 bytes, etc. There are 51 observations. But the remaining number of bytes is 1071. With 6 x 51 = 306 values, that is only 3.5 bytes per datum. There aren't enough bytes to go around if I assume fixed lengths! On the other hand, if I try to start another variable as soon as I hit a zero, I find there are multiple zeros in a row, which would seem to indicate no data for some variables. Hmm. Clearly I'm missing something.
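
The arithmetic behind that mismatch can be checked with a short sketch. It assumes, as in the reading above, that each type code is a fixed strN width in bytes; the variable names are only illustrative.

```python
# Type codes from the example, read here (as an assumption) as strN widths in bytes.
type_codes = [98, 136, 102, 105, 102, 98]
n_obs = 51
remaining_bytes = 1071

# Bytes one observation would occupy if every variable were a fixed-width string.
bytes_per_obs = sum(type_codes)

# Bytes the whole data section would need under that reading.
expected_bytes = bytes_per_obs * n_obs

# Average bytes actually available per stored value.
n_values = len(type_codes) * n_obs
avg_bytes_per_value = remaining_bytes / n_values

print("bytes per observation if fixed width:", bytes_per_obs)
print("bytes needed for all observations:   ", expected_bytes)
print("bytes actually remaining:            ", remaining_bytes)
print("average bytes per value:             ", avg_bytes_per_value)
```

Under the fixed-width reading the data section would need far more than the 1071 bytes that remain, which is the discrepancy described above.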

--Mark.

David Kantor wrote:

At 08:35 PM 3/12/2007, Mark Fisher wrote:
Hi. I'm writing a Mathematica program to read Stata "dta" files. I have the "Stata help for dta" page, which is quite useful. Everything seems to work fine as long as the data types are in the range 251 to 255 (byte, int, long, float, or double). But I can't figure out how to properly read the data when the data types are in the range 1 to 244 (str1, str2, ... str244). BTW, I have no trouble reading the "char" strings for the data_label and the time_stamp; I just read them in as a list of bytes, discard the bytes starting with the first zero, and convert the remaining bytes to ASCII. But the str types don't seem to be the same sort of beast. Any guidance would be appreciated. Thanks.

--Mark.
I'm not looking at the documentation for this, and I've never done any work like that, but I do recall reading that the string types are stored such that...
they have a 0-byte terminator if they are shorter than the maximal length of the type;
they have no terminator otherwise -- if they fill up the maximal length.

Thus, you need the type's nominal (maximal) length as a factor in reading the values.

For example, if the type is str20, then the values have a 0-byte terminator if they are shorter than 20, and no terminator if they are 20 characters long.
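
A minimal sketch of that rule, in Python and not taken from the dta documentation: it assumes every strN value occupies exactly N bytes in the record, that any bytes after the first zero byte are padding to be ignored, and that the text is plain ASCII. The function name `read_fixed_str` is hypothetical.

```python
from typing import Tuple


def read_fixed_str(buf: bytes, offset: int, width: int) -> Tuple[str, int]:
    """Decode one strN value stored in exactly `width` bytes at `offset`.

    Returns the decoded text and the offset of the next field.
    Assumption: the field always occupies `width` bytes, whether or not
    the value is shorter than that.
    """
    field = buf[offset:offset + width]
    if len(field) != width:
        raise ValueError("buffer ends before the field does")
    # Keep the bytes before the first zero byte. If there is no zero byte,
    # the value fills the whole field and nothing is cut off.
    text = field.split(b"\x00", 1)[0]
    return text.decode("ascii"), offset + width


# One made-up observation holding a str5 value followed by a str3 value.
record = b"ab\x00\x00\x00xyz"
first, pos = read_fixed_str(record, 0, 5)    # shorter than 5: zero-terminated
second, pos = read_fixed_str(record, pos, 3)  # exactly 3: no terminator
print(repr(first), repr(second), pos)
```

The point of returning the next offset is that the reader always advances by the nominal width of the type, never by the length of the decoded text.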

I hope this is correct and that it helps. Good luck.
--David

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


