
st: Stata's character encoding

From   Billy Schwartz <[email protected]>
To   [email protected]
Subject   st: Stata's character encoding
Date   Mon, 23 Jul 2012 12:03:00 -0400

I'm trying to automatically generate some Stata scripts from an
external program* that by default encodes all text files as UTF-8.
As best I can tell, Stata uses whatever character encoding is native
to the platform it runs on (e.g., Windows-1252 on Windows), which
means the only portable character encoding for reading do-files and
spreadsheet data is plain ASCII (no characters with code points above
127). Stata also seems flexible about whether line endings are LF or
CRLF, but they must be consistent within a given file -- I've had
problems loading spreadsheet data that were CRLF-terminated but had
stray CRs scattered through the data, which made Stata think there
were line endings where there weren't.
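One way to guard against the stray-CR problem described above is to normalize the data file before handing it to Stata. This is a minimal sketch, not anything Stata provides: the sample bytes are invented, and it assumes (as in the situation above) that lone CRs are noise inside CRLF-terminated data rather than legitimate old-Mac line endings.

```python
def normalize_newlines(raw: bytes) -> bytes:
    """Collapse CRLF to LF, then drop any leftover stray CRs.

    Assumes lone CRs are noise within the data, not CR-only line
    endings; the result uses a single consistent terminator (LF).
    """
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"")

# Illustrative input: CRLF-terminated lines with a stray CR
# embedded mid-field, which Stata would misread as a line ending.
raw = b"id,name\r\n1,al\rice\r\n"
print(normalize_newlines(raw))  # b'id,name\n1,alice\n'
```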

Is this a correct characterization of how Stata reads text files? If
not, what is the most portable way for me to encode text for both
do-files and spreadsheet data?

*I'm writing Python scripts to write Stata scripts because I expect my
input data to change several times and I don't want to rewrite the
Stata code by hand each time the underlying data changes. I find
examining directory structures and reading non-tabular data (in this
case, the record layouts for the data I'm working with) easier to
express in Python than in Stata. I'm open to suggestions on better
ways to handle this, but since I've got it mostly written, that's not
the main point of this email.
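For the generation side, the portability concern above can be enforced at write time. This is a minimal sketch, assuming plain ASCII and LF-only endings are the safe common denominator; the filename and the Stata commands are placeholders, not from any real project.

```python
# Illustrative Stata commands; any non-ASCII text here would be
# rejected by the encode check below.
lines = [
    'clear',
    'import delimited using "data.csv"',
    'summarize',
]

script = "\n".join(lines) + "\n"

# Fail fast if any character has a code point above 127, since plain
# ASCII is the only encoding assumed portable across Stata platforms.
script.encode("ascii")  # raises UnicodeEncodeError on non-ASCII text

# newline="\n" keeps LF endings as-is, so the file is consistent
# regardless of the platform the generator runs on.
with open("generated.do", "w", encoding="ascii", newline="\n") as f:
    f.write(script)
```

Opening the output with `encoding="ascii"` makes Python itself refuse any character that would not survive the trip, rather than silently writing UTF-8 that Stata might misinterpret.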