From: Sergiy Radyakin <serjradyakin@gmail.com>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: Re: st: foreign language symbols not recognized in string variables

Date: Fri, 26 Apr 2013 19:15:30 -0400

Dear Audinga,

Most modern software (operating systems and applications) works with Unicode. Stata does not. In Unicode encodings such as UTF-8, every character outside ASCII takes 2 or more bytes, whereas in Stata each character must be exactly 1 byte.

You need to make sure the input CSV file is saved in a single-byte codepage appropriate for your region. For Lithuanian that is presumably Windows-1257 (Baltic); Windows-1252 (Western European) lacks most Lithuanian letters. For background on Windows codepages, see: http://en.wikipedia.org/wiki/Windows-1252

Make sure your text editor does NOT save the text in Unicode; some editors switch to Unicode automatically when they detect non-ASCII characters. Note that you can use only one codepage for the whole file: characters that are in neither ASCII nor the chosen ANSI codepage will usually be replaced with question marks.

Note also that Stata may still display these characters incorrectly (you also need to set up the fonts in Stata correctly) while processing the data correctly. Use Stata's -hexdump- command to see the hex codes of the characters in your file.

I have worked with Cyrillic, Vietnamese, and other alphabets in Stata, so this is definitely possible; it just requires some adjustment of settings.

Hope this helps.

Best,
Sergiy

On Fri, Apr 26, 2013 at 5:45 PM, Audinga Baltrunaite <audinga.baltrunaite@ne.su.se> wrote:
> Hello,
>
> I am importing data from a .CSV file into Stata using the "insheet" command. The data set contains several string variables in Lithuanian (which uses an extended version of the Latin alphabet). Even though the Lithuanian letters display correctly in the .CSV file, Stata substitutes them with other symbols (special symbols, combinations of a few letters, etc.). Moreover, even when I manually enter the correct letters from the keyboard using the Data Editor, the changes are not saved: the old symbol is deleted, but the new symbol does not appear.
>
> The only solution I have in mind now is to eliminate Lithuanian letters from the .CSV file, transforming them into (more or less) equivalent English ones. Since some words may lose their meaning, this is far from ideal.
>
> Has anybody encountered similar problems? Any solutions?
>
> Thanks a lot!
> Best,
> Audinga

* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
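The codepage advice above can be checked, and applied, outside Stata. The Python script below is a minimal sketch that is not part of the original thread. It assumes the CSV was saved as UTF-8 and that Windows-1257 (Python codec `cp1257`, the Baltic codepage) is the right target for Lithuanian text. It shows the byte counts behind the "1 byte per character" point, the `?` substitution for letters a codepage lacks, and a hypothetical helper `utf8_to_codepage` (along with `fits_codepage`) that re-encodes the file before it is read with `insheet`. All function and file names are illustrative only.

```python
"""Re-encode a UTF-8 CSV into a single-byte Windows codepage so that a
pre-Unicode Stata can read it with -insheet-.

Sketch only. Assumes the input is UTF-8 and that Windows-1257
("cp1257", the Baltic codepage) is the target for Lithuanian text.
"""
import tempfile
from pathlib import Path
from typing import Union

PathArg = Union[str, Path]


def fits_codepage(ch: str, codepage: str) -> bool:
    """Return True if character `ch` exists in `codepage`."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def utf8_to_codepage(src: PathArg, dst: PathArg, codepage: str = "cp1257") -> int:
    """Hypothetical helper: write `src` (UTF-8) to `dst` in `codepage`.

    Characters the codepage lacks are written as '?', much as a text
    editor would do. Returns the number of characters replaced this way.
    """
    # Decode the raw bytes so line endings pass through unchanged;
    # "utf-8-sig" also drops a byte-order mark if an editor added one.
    text = Path(src).read_bytes().decode("utf-8-sig")
    Path(dst).write_bytes(text.encode(codepage, errors="replace"))
    return sum(1 for ch in text if not fits_codepage(ch, codepage))


if __name__ == "__main__":
    word = "Šiaulių"  # 7 characters; Š and ų are non-ASCII
    # Raw UTF-8 bytes in hex: Š and ų each take 2 bytes ...
    print(word.encode("utf-8").hex(" "))  # c5 a0 69 61 75 6c 69 c5 b3
    # ... while Windows-1257 stores every character in exactly 1 byte.
    print(len(word.encode("cp1257")))  # 7
    # Windows-1252 has no ų, so encoding with errors="replace" yields '?'.
    print(word.encode("cp1252", errors="replace"))

    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "in.csv"), Path(tmp, "out.csv")
        src.write_text("id,city\n1,Šiaulių\n", encoding="utf-8")
        print("replaced:", utf8_to_codepage(src, dst))  # replaced: 0
        print(dst.read_bytes().decode("cp1257"))
```

A nonzero return value warns that some characters were lost to `?`, which is the single-codepage limitation mentioned above. After conversion, the re-encoded file can be read with `insheet`; whether the letters also display correctly still depends on Stata's font settings. This workaround targets Stata versions before 14, which introduced Unicode support.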

**References**:
- **st: foreign language symbols not recognized in string variables**, *From:* Audinga Baltrunaite <audinga.baltrunaite@ne.su.se>
