
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: foreign language symbols not recognized in string variables


From   Audinga Baltrunaite <audinga.baltrunaite@ne.su.se>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: foreign language symbols not recognized in string variables
Date   Tue, 30 Apr 2013 16:45:29 -0400

Hi,
thanks a lot for your replies and explanations.

Unfortunately, I have not solved my problem yet and need further
advice. My data are originally encoded in UTF-8 and are displayed
"seemingly correctly" in MySQL, from which I export them into a .CSV
file. However, I have not found an encoding that shows the "correct"
symbols in Stata. Windows-1257 seems to be the right encoding for the
Lithuanian language, but even after I changed the encoding of the .CSV
file, the symbols become "something else" in Stata. I have also tried
other encodings for Baltic languages, such as Baltic DOS/OS2-775,
Baltic ISO-8859-4, and Eastern European (Apple Macintosh), but none of
them worked.
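
For reference, the re-encoding step described above can also be done outside a text editor. The following is only a minimal Python sketch, not something from the thread: the file names are hypothetical, and it assumes the MySQL export really is UTF-8 and that Windows-1257 (Python codec name "cp1257") is the intended target.

```python
import os
import tempfile


def reencode_csv(src_path, dst_path, src_enc="utf-8", dst_enc="cp1257"):
    """Read src_path as src_enc and write the same text to dst_path as dst_enc."""
    # newline="" leaves the original line endings untouched.
    # If the export starts with a byte-order mark, "utf-8-sig" strips it.
    with open(src_path, "r", encoding=src_enc, newline="") as src:
        text = src.read()
    # With the default strict error handling, a character that has no slot
    # in the target code page raises UnicodeEncodeError instead of being
    # silently replaced.
    with open(dst_path, "w", encoding=dst_enc, newline="") as dst:
        dst.write(text)


# Small demonstration with hypothetical file names and made-up rows.
tmpdir = tempfile.mkdtemp()
src_file = os.path.join(tmpdir, "export_utf8.csv")
dst_file = os.path.join(tmpdir, "export_cp1257.csv")
sample = "id,name\n1,Žemaitė\n2,Šiauliai\n"

with open(src_file, "w", encoding="utf-8", newline="") as f:
    f.write(sample)

reencode_csv(src_file, dst_file)

with open(dst_file, "rb") as f:
    converted = f.read()

# One byte per character after conversion.
print(len(sample), len(converted))
```

Whether Stata then displays these bytes as the intended letters is a separate question that depends on the platform and font, which this sketch does not address.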

I have not mentioned earlier that I am using Stata on a Mac computer.
Could this potentially create problems? Moreover, could it be that my
Stata fonts are not set up correctly? How can I check this and, if
necessary, change it? Sorry for these perhaps basic questions, but I
have never encountered similar issues.

Many many thanks!
Best,
Audinga




On Fri, Apr 26, 2013 at 7:15 PM, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
> Dear Audinga,
>
> Most modern software (OS and applications) works with Unicode. Stata
> does not work with Unicode. Unicode encodings such as UTF-8 store
> non-ASCII characters in 2 or more bytes. In Stata each character must
> be 1 byte only. You need to make sure the input CSV file is encoded in
> a codepage proper for your region, presumably 1252.
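
The byte-count point can be checked directly. This is an illustrative Python sketch (not part of the original message) that assumes the data use the nine Lithuanian diacritic letters listed below and compares their size in UTF-8 with their size in the single-byte code page Windows-1257.

```python
# Lowercase Lithuanian letters that fall outside ASCII.
letters = "ąčęėįšųūž"

for ch in letters:
    utf8 = ch.encode("utf-8")
    single = ch.encode("cp1257")
    # Print the letter, its UTF-8 bytes, and its Windows-1257 byte in hex.
    print(ch, utf8.hex(), single.hex())

utf8_lengths = [len(ch.encode("utf-8")) for ch in letters]
cp1257_lengths = [len(ch.encode("cp1257")) for ch in letters]
```

Each of these letters takes two bytes in UTF-8 and one byte in Windows-1257, which is why a program that treats every byte as one character shows two symbols per letter when it is given UTF-8 input.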
>
> See more here:
> http://en.wikipedia.org/wiki/Windows-1252
>
> Make sure your text editor does NOT save the text in Unicode. Some
> editors do this automatically when they detect non-ASCII characters.
>
> Note that you can use only one codepage for the whole file (characters
> absent from both ASCII and the current ANSI codepage will usually be
> replaced with question marks).
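
To see which characters a given codepage cannot hold before converting a file, a check along these lines may help. It is a hedged Python sketch: the helper name `unencodable` is made up for illustration, and the replacement with question marks shown here is Python's `errors="replace"` behaviour, which may differ in detail from what a particular editor does.

```python
def unencodable(text, codepage):
    """Return the distinct characters of text that codepage cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


lithuanian = "ąčęėįšųūž"

# Windows-1252 (Western European) versus Windows-1257 (Baltic).
missing_1252 = unencodable(lithuanian, "cp1252")
missing_1257 = unencodable(lithuanian, "cp1257")
print("not in cp1252:", missing_1252)
print("not in cp1257:", missing_1257)

# With errors="replace", characters outside the codepage become "?".
replaced = "ačiū".encode("cp1252", errors="replace")
print(replaced)
```

In this check Windows-1252 covers only š and ž of the Lithuanian letters, while Windows-1257 covers all of them.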
>
> Note that Stata still may not be displaying these characters correctly
> (you also need to set up the fonts in Stata correctly), but may be
> processing the data correctly.  Use the -hexdump- command of Stata to
> see the hex codes of characters in your file.
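
The same kind of byte-level inspection can be done outside Stata. The sketch below is a rough Python analogue written for illustration; it is not Stata's -hexdump- and its output layout is an arbitrary choice. It dumps one word encoded two ways so the difference in the bytes is visible.

```python
def hexdump_lines(data, width=16):
    """Format bytes as lines of: offset, hex codes, printable ASCII."""
    pad = width * 3 - 1  # width of the hex column for a full line
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside printable ASCII are shown as dots.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {ascii_part}")
    return lines


word = "Žemė"
utf8_dump = hexdump_lines(word.encode("utf-8"))
cp1257_dump = hexdump_lines(word.encode("cp1257"))

print("\n".join(utf8_dump))
print("\n".join(cp1257_dump))
```

If the dump of the .CSV file shows two bytes for each accented letter, the file is most likely still UTF-8 rather than a single-byte codepage.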
>
> I have worked with Cyrillic, Vietnamese and other alphabets in Stata,
> so this is definitely possible. It just requires some adjustment of
> settings.
>
> Hope this helps.
>
> Best, Sergiy
>
>
>
> On Fri, Apr 26, 2013 at 5:45 PM, Audinga Baltrunaite
> <audinga.baltrunaite@ne.su.se> wrote:
>> Hello,
>> I am importing data from a .CSV file to Stata using an "insheet"
>> command. The data set contains several string variables in the
>> Lithuanian language (it uses an extended version of the Latin
>> alphabet). Even though "Lithuanian" letters are viewed correctly in
>> the .CSV file, Stata substitutes them with other symbols (special
>> symbols, combinations of a few letters, etc). Moreover, even when I
>> manually input the correct letters from the keyboard using the Data
>> Editor, changes are not saved - the old symbol is deleted, but the
>> new symbol does not appear.
>>
>> The only solution I have in mind now is to eliminate Lithuanian
>> letters from the .CSV file, transforming them into (more or less)
>> equivalent English ones. Since some words may lose their sense, it is
>> far from an ideal one.
>>
>> Has anybody encountered similar problems? Any solutions?
>>
>> Thanks a lot!
>> Best,
>> Audinga
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

