
From: "Dan Weitzenfeld" <dan.weitzenfeld@emsense.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Insheeting Japanese
Date: Tue, 23 Sep 2008 11:19:15 -0700

I've been informed that the files are written in Unicode, UTF-16. Can Stata read this?

On Tue, Sep 23, 2008 at 11:08 AM, Dan Weitzenfeld
<dan.weitzenfeld@emsense.com> wrote:
> Thanks Sergiy, I did not know about that command. Below is a line
> from my hexdump:
>
> 130 | 304b ff1f 002c 0031 002c 0032 000d 000a | 0K...,.1.,.2....
>
> I also noticed this when I ran it with option analyze:
>
> Line-end characters
>   \r\n (Windows)        0
>   \r by itself (Mac)    5
>   \n by itself (Unix)   5
>
> which looks suspicious to me. I'll talk to the tech guys who made this file.
> Thanks again Sergiy.
>
> On Tue, Sep 23, 2008 at 10:51 AM, Sergiy Radyakin
> <serjradyakin@gmail.com> wrote:
>> Dear Dan,
>>
>> How data "looks" depends on which software "looks" at it. From
>> what I see in your message, there is double-byte encoding of letters,
>> which may cause a problem.
>>
>> I suggest you first "look" at your data byte by byte to find the
>> pattern you need, then filter your data based on that pattern.
>> Use
>>   -hexdump- filename
>> to see how your data is structured. Check that you are using the correct
>> separator ("comma" and not "tab"), that the "comma" in your file is indeed a
>> standard ASCII "comma" and not some weird two-byte comma, that a
>> "comma" byte (44) is not used for encoding other characters, etc.
>>
>> Perhaps you could post a portion of the hexdump output here if this
>> does not contradict any rules of the list.
>>
>> Regards, Sergiy Radyakin
>>
>> On Tue, Sep 23, 2008 at 1:09 PM, Dan Weitzenfeld
>> <dan.weitzenfeld@emsense.com> wrote:
>>> Hi All,
>>> Quick but strange question. I'm trying to insheet a comma-delimited
>>> file with Japanese in it. For example, the first line looks like:
>>>
>>> あなたはこのＣＭが好きですか？,0,とても好き
>>> [In English: "Do you like this commercial?", 0, "I like it very much"]
>>>
>>> The only information I need is the second variable, the 0, which will
>>> always be numeric.
>>>
>>> However, when I insheet the file, I get nonsense:
>>>
>>> þÿ0B0j0_0o0S0nÿ#ÿ-0LY}0M0g0Y0Kÿ 0h0f0‚Y}0M
>>>
>>> which would be okay, except that the second variable always comes in as blank.
>>>
>>> Does anyone know of a solution for this?
>>>
>>> Thanks in advance,
>>> Dan
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
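The posted hexdump row and the garbled -insheet- output are both consistent with the file being UTF-16 big-endian with a byte-order mark (BOM). The Python sketch below (Python is used only for illustration; the thread itself contains no code) decodes the posted bytes under that assumption. It shows that each ASCII character, including the comma, CR, and LF, is stored as a 0x00 byte followed by its usual byte. That would explain why the line-end analysis reports lone \r and \n but no \r\n pairs. It also shows that reading UTF-16BE bytes one byte at a time reproduces the start of the garbage Dan saw.

```python
# Sketch: decode the bytes from the posted -hexdump- row as UTF-16 big-endian.
# Assumption: the file is UTF-16BE with a byte-order mark (BOM).

# The 16 bytes shown in the hexdump row "304b ff1f 002c 0031 002c 0032 000d 000a".
row = bytes.fromhex("304b ff1f 002c 0031 002c 0032 000d 000a")

text = row.decode("utf-16-be")
print(repr(text))  # 'か？,1,2\r\n'

# Every ASCII character (comma, digits, CR, LF) is stored as 0x00 plus its byte,
# so CR (0x0D) and LF (0x0A) are never adjacent in the raw bytes.
print(row.find(b"\r\n"))  # -1

# Reading UTF-16BE bytes one byte at a time as Latin-1 reproduces the garbage:
# the BOM FE FF shows up as "þÿ", and あなたは becomes "0B0j0_0o".
print(b"\xfe\xff".decode("latin-1"))                     # þÿ
print("あなたは".encode("utf-16-be").decode("latin-1"))  # 0B0j0_0o
```

The hexdump row thus decodes to the tail of a Japanese question (か？) followed by the ordinary fields `,1,2` and a Windows line ending, once each two-byte unit is read as one character.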

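One possible workaround, assuming the file can be pre-processed outside Stata, is to re-encode it and keep only the second field, which Dan says is always numeric, so that -insheet- only ever sees plain ASCII. This is a hedged sketch, not a recipe from the thread. The helper functions, the file names, and the `answer` header are hypothetical. It also assumes the file begins with a BOM (so Python's `utf-16` codec can detect the byte order) and that no field contains an embedded comma.

```python
import csv
import os
import tempfile


def extract_second_field(src_path, dst_path):
    """Copy the second comma-separated field of each row into an ASCII CSV.

    Returns the number of data rows written.
    """
    count = 0
    # The "utf-16" codec reads the BOM to pick the byte order and strips it;
    # newline="" lets the csv module handle \r\n line endings itself.
    with open(src_path, encoding="utf-16", newline="") as src, \
            open(dst_path, "w", encoding="ascii", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["answer"])  # hypothetical header / variable name
        for row in csv.reader(src):
            if len(row) >= 2:  # skip blank or malformed lines
                writer.writerow([row[1].strip()])
                count += 1
    return count


def demo():
    """Build a small UTF-16BE sample file (with BOM) and convert it."""
    sample = "あなたはこのＣＭが好きですか？,0,とても好き\r\nか？,1,2\r\n"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "survey_utf16.csv")  # placeholder name
        dst = os.path.join(tmp, "survey_ascii.csv")  # placeholder name
        with open(src, "wb") as f:
            f.write(b"\xfe\xff" + sample.encode("utf-16-be"))
        n = extract_second_field(src, dst)
        with open(dst, encoding="ascii", newline="") as f:
            return f.read(), n


if __name__ == "__main__":
    out, n = demo()
    print(n)          # 2
    print(repr(out))  # 'answer\r\n0\r\n1\r\n'
```

The resulting ASCII file could then be read in Stata with -insheet using- in the usual way. Writing the output as ASCII makes the script fail loudly if a second field ever contains a non-ASCII character, rather than silently passing it through.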
**Follow-Ups**:
- **Re: st: Insheeting Japanese**, *From:* Steven Samuels <sjhsamuels@earthlink.net>

**References**:
- **st: Insheeting Japanese**, *From:* "Dan Weitzenfeld" <dan.weitzenfeld@emsense.com>
- **Re: st: Insheeting Japanese**, *From:* "Sergiy Radyakin" <serjradyakin@gmail.com>
- **Re: st: Insheeting Japanese**, *From:* "Dan Weitzenfeld" <dan.weitzenfeld@emsense.com>


© Copyright 1996–2015 StataCorp LP