
st: Stata's character encoding

From   Billy Schwartz <[email protected]>
To   [email protected]
Subject   st: Stata's character encoding
Date   Mon, 23 Jul 2012 12:03:00 -0400

I'm trying to automatically generate some Stata scripts from an
external program* that by default encodes all text files as UTF-8.
As best I can tell, Stata uses whatever character encoding is native
to the platform it runs on (e.g., Windows-1252 on Windows), which
means the only portable character encoding for reading do-files and
spreadsheet data is plain ASCII (no characters with code points above
127). Stata also seems flexible about whether line endings are LF or
CRLF, but they must be consistent within a given file -- I've had
problems loading spreadsheet data that were CRLF-terminated but had
stray CRs scattered through the data, which made Stata think there
were line endings where there weren't.
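One way to guard against the stray-CR problem described above is to normalize the data file before handing it to Stata. This is a minimal sketch, not anything Stata provides: the sample bytes are invented, and it assumes (as in the situation above) that lone CRs are noise inside CRLF-terminated data rather than legitimate old-Mac line endings.

```python
def normalize_newlines(raw: bytes) -> bytes:
    """Collapse CRLF to LF, then drop any leftover stray CRs.

    Assumes lone CRs are noise within the data, not CR-only line
    endings; the result uses a single consistent terminator (LF).
    """
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"")

# Illustrative input: CRLF-terminated lines with a stray CR
# embedded mid-field, which Stata would misread as a line ending.
raw = b"id,name\r\n1,al\rice\r\n"
print(normalize_newlines(raw))  # b'id,name\n1,alice\n'
```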

Is this a correct characterization of how Stata reads text files? If
not, what is the most portable way for me to encode text for both
do-files and spreadsheet data?

*I'm writing Python scripts to write Stata scripts because I expect my
input data to change several times and I don't want to rewrite the
Stata code by hand each time the underlying data changes. I find
examining directory structures and reading non-tabular data (in this
case, the record layouts for the data I'm working with) easier to
express in Python than in Stata. I'm open to suggestions on better
ways to handle this, but since I've got it mostly written, that's not
the main point of this email.
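For the generation side, the portability concern above can be enforced at write time. This is a minimal sketch, assuming plain ASCII and LF-only endings are the safe common denominator; the filename and the Stata commands are placeholders, not from any real project.

```python
# Illustrative Stata commands; any non-ASCII text here would be
# rejected by the encode check below.
lines = [
    'clear',
    'import delimited using "data.csv"',
    'summarize',
]

script = "\n".join(lines) + "\n"

# Fail fast if any character has a code point above 127, since plain
# ASCII is the only encoding assumed portable across Stata platforms.
script.encode("ascii")  # raises UnicodeEncodeError on non-ASCII text

# newline="\n" keeps LF endings as-is, so the file is consistent
# regardless of the platform the generator runs on.
with open("generated.do", "w", encoding="ascii", newline="\n") as f:
    f.write(script)
```

Opening the output with `encoding="ascii"` makes Python itself refuse any character that would not survive the trip, rather than silently writing UTF-8 that Stata might misinterpret.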