Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Audinga Baltrunaite <audinga.baltrunaite@ne.su.se>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: foreign language symbols not recognized in string variables
Date: Tue, 30 Apr 2013 16:45:29 -0400

Hi,

thanks a lot for your replies and explanations. Unfortunately, I have not solved my problem yet and need further advice.

My data are originally encoded in UTF-8 and are displayed "seemingly correctly" in MySQL, from where I export them into a .CSV file. However, I have not found an encoding that shows the "correct" symbols in Stata. Windows-1257 seems to be the right encoding for the Lithuanian language, but even after I changed the encoding of the .CSV file, the symbols become "something else" in Stata. I have also tried other encodings for Baltic languages, such as Baltic DOS/OS2-775, Baltic ISO-8859-4, and Eastern European (Apple Macintosh), but none of them worked.

I did not mention earlier that I am using Stata on a Mac. Could this create problems? Moreover, could it be that my Stata fonts are not set up correctly? How can I check this and, if necessary, change it?

Sorry for these perhaps basic questions, but I have never encountered similar issues.

Many, many thanks!
Best,
Audinga

On Fri, Apr 26, 2013 at 7:15 PM, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
> Dear Audinga,
>
> Most modern software (OS and applications) works with Unicode. Stata
> does not work with Unicode. Unicode encodings typically use 2 or more
> bytes for non-ASCII characters. In Stata each character must be 1 byte
> only. You need to make sure the input CSV file is encoded in a codepage
> proper for your region, presumably 1252.
>
> See more here:
> http://en.wikipedia.org/wiki/Windows-1252
>
> Make sure your text editor does NOT save the text in Unicode. Some
> editors do this automatically when they detect non-ASCII characters.
>
> Note that you can use only one codepage for the whole file (characters
> absent from ASCII and the current ANSI page will usually be replaced
> with question marks).
>
> Note that Stata still may not be displaying these characters correctly
> (you also need to set up the fonts in Stata correctly), but may be
> processing the data correctly. Use the -hexdump- command of Stata to
> see the hex codes of the characters in your file.
>
> I have worked with Cyrillic, Vietnamese and other alphabets in Stata,
> so this is definitely possible. It just requires some adjustment of
> settings.
>
> Hope this helps.
>
> Best, Sergiy
>
> On Fri, Apr 26, 2013 at 5:45 PM, Audinga Baltrunaite
> <audinga.baltrunaite@ne.su.se> wrote:
>> Hello,
>> I am importing data from a .CSV file into Stata using the "insheet"
>> command. The data set contains several string variables in the
>> Lithuanian language (it uses an extended version of the Latin
>> alphabet). Even though the "Lithuanian" letters are displayed
>> correctly in the .CSV file, Stata substitutes them with other symbols
>> (special symbols, combinations of a few letters, etc.). Moreover, even
>> when I manually input the correct letters from the keyboard using the
>> Data Editor, the changes are not saved - the old symbol is deleted,
>> but the new symbol does not appear.
>>
>> The only solution I have in mind now is to eliminate the Lithuanian
>> letters from the .CSV file by transforming them into (more or less)
>> equivalent English ones. Since some words may lose their meaning, this
>> is far from ideal.
>>
>> Has anybody encountered similar problems? Any solutions?
>>
>> Thanks a lot!
>> Best,
>> Audinga

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
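
The re-encoding step discussed in this thread (a UTF-8 export converted to a single-byte code page before it is read into Stata) could be sketched outside Stata roughly as follows. This is only a sketch under stated assumptions, not something posted in the thread: Python is used for illustration, the function names `unencodable` and `reencode_csv` are hypothetical, and `cp1257` is Python's codec name for Windows-1257. It does not address how Stata on a Mac displays the resulting bytes, which depends on the platform and font settings.

```python
from pathlib import Path


def unencodable(text, encoding="cp1257"):
    """Return the sorted distinct characters of `text` that `encoding` cannot represent."""
    bad = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return sorted(bad)


def reencode_csv(src, dst, src_encoding="utf-8-sig", dst_encoding="cp1257"):
    """Rewrite a text file from `src_encoding` to a single-byte `dst_encoding`.

    "utf-8-sig" reads plain UTF-8 and also drops a leading byte-order mark
    if one is present. Encoding is strict: a character missing from the
    target code page raises UnicodeEncodeError instead of silently turning
    into "?".
    """
    text = Path(src).read_text(encoding=src_encoding)
    data = text.encode(dst_encoding)
    Path(dst).write_bytes(data)
    return len(data)  # number of bytes written


if __name__ == "__main__":
    letters = "ąčęėįšųūž"
    # Which Lithuanian letters are missing from each candidate code page?
    print("cp1257 cannot encode:", unencodable(letters, "cp1257"))
    print("cp1252 cannot encode:", unencodable(letters, "cp1252"))
```

Checking the candidate code pages first shows why the choice matters: Windows-1257 covers the Lithuanian alphabet, while Windows-1252 lacks several of its letters, so a file forced into 1252 cannot round-trip. After converting, the -hexdump- command mentioned above can confirm that each accented letter occupies a single byte in the file that Stata reads.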
