
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Fwd: Import win1251-encoded data in Stata


From   Sergiy Radyakin <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Fwd: Import win1251-encoded data in Stata
Date   Wed, 27 Nov 2013 21:42:31 -0500

Pavel asks how to work with Cyrillic text in Stata datasets. As per
the Statalist FAQ (section 6), such questions should be posted to the
list.

Pavel should first check whether the dataset is saved in Unicode
(multiple bytes per character) or in a code-page format (1 byte per
character). If the dataset is in Unicode, Pavel might need to convert
it to a particular code page first. ANSI 1251 (Windows-1251) is
commonly used and recommended, but there are many Russian code pages.
The data provider should be able to tell which settings were used. If
the dataset is true Unicode with multiple languages (e.g. Russian and
Greek), there is no way to see the characters of both languages in
Stata, and the user will have to decide which one to retain.
Once the dataset is confirmed to be in ANSI 1251, Stata will display
Cyrillic characters if the font 'script' is properly selected; see the
last slide here:
http://www.stata.com/meeting/uk13/abstracts/materials/uk13_radyakin.pdf
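
The check and the conversion described above can be done outside
Stata. What follows is only a rough sketch in Python, not something
from the slides: the function names are hypothetical, and it assumes
the Unicode file is UTF-8 (the data provider may have used something
else). The UTF-8 test is a heuristic, since a file containing only
plain ASCII passes it under either encoding.

```python
from typing import List, Tuple


def looks_like_utf8(raw: bytes) -> bool:
    """Heuristic: True if the bytes decode cleanly as UTF-8.

    Cyrillic text stored in a 1-byte code page such as cp1251
    almost never forms valid UTF-8 byte sequences.
    """
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fits_cp1251(ch: str) -> bool:
    """True if the single character exists in Windows-1251."""
    try:
        ch.encode("cp1251")
        return True
    except UnicodeEncodeError:
        return False


def to_cp1251(raw: bytes) -> Tuple[bytes, List[str]]:
    """Convert UTF-8 bytes to Windows-1251.

    Returns the converted bytes and the sorted list of characters
    that cp1251 cannot represent (e.g. Greek letters); those are
    written as '?' in the output.
    """
    text = raw.decode("utf-8-sig")  # also drops a leading BOM, if any
    lost = sorted({ch for ch in text if not fits_cp1251(ch)})
    return text.encode("cp1251", errors="replace"), lost


if __name__ == "__main__":
    sample = "Привет αβ".encode("utf-8")
    print(looks_like_utf8(sample))
    converted, lost = to_cp1251(sample)
    print(lost)
```

A non-empty list of lost characters is the mixed-language situation
mentioned above: something has to be given up before the file can be
stored in a single code page.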

Users of platforms other than Windows might confirm whether this
facility exists in their versions of Stata. I don't have this
information. The choices of scripts depend on the installed fonts (as
not all fonts implement all characters). Installing additional fonts
might allow selecting the script not available by default. Google
might be helpful in finding some freely available fonts if necessary.

Hope this helps.
Best, Sergiy Radyakin





---------- Forwarded message ----------
From: Pavel Izhutov <[email protected]>
Date: Wed, Nov 27, 2013 at 6:32 PM
Subject: Import win1251-encoded data in Stata
To: [email protected]


Dear Sergiy,

I am Pavel, a PhD student at Stanford working with data of Russian origin.

I noticed some of your posts on the web w.r.t. importing files with
non-standard encoding into Stata.

Could you please tell me what encoding I should use in the .csv file
in order to work with and browse my data seamlessly?

I presume this might depend on the system too (I work on Linux and OS
X). Is there any way to tailor the encoding for the particular system
or, perhaps, tell Stata to use a certain encoding for data display?

Many thanks,
Pavel

--
________________________________________________________
Stanford University | Graduate School of Business | PhD Candidate

Think Clearly, Communicate Effectively, Be an Athlete,
Use Machines Wisely, Live Simply