Austin Nichols

statalist@hsphsun2.harvard.edu

Re: st: Insheeting Japanese

Tue, 23 Sep 2008 14:35:05 -0400

Dan Weitzenfeld:
Stata's -file- command can deal with this file; see -help file- for
examples of writing a loop to process a file. But converting in
another program, then using -infile- or -insheet-, is likely easier.
The best approach depends on how often you will face this
situation again in the future...

On Tue, Sep 23, 2008 at 2:28 PM, Steven Samuels <sjhsamuels@earthlink.net> wrote:
> Dan, I don't know if Stata can read unicode. The -help- for -insheet-
> states it is for ASCII text. One possibility: use a text editor to add
> double quotes (") at the beginning and end of lines and on either side of
> the commas. This may read everything in as character. Then convert
> back to real only the variable you want.
>
> -Steve
>
> On Sep 23, 2008, at 2:19 PM, Dan Weitzenfeld wrote:
>
>> I've been informed that the files are written in unicode, utf-16. Can
>> Stata read this?
>>
>> On Tue, Sep 23, 2008 at 11:08 AM, Dan Weitzenfeld
>> <dan.weitzenfeld@emsense.com> wrote:
>>>
>>> Thanks Sergiy, I did not know about that command. Below is a line
>>> from my hexdump:
>>>
>>> 130 | 304b ff1f 002c 0031 002c 0032 000d 000a | 0K...,.1.,.2....
>>>
>>> I also noticed this when I ran with option Analyze:
>>>
>>> Line-end characters
>>>   \r\n         (Windows)   0
>>>   \r by itself (Mac)       5
>>>   \n by itself (Unix)      5
>>>
>>> which looks suspicious to me. I'll talk to the tech guys who made this
>>> file.
>>> Thanks again Sergiy.
>>>
>>> On Tue, Sep 23, 2008 at 10:51 AM, Sergiy Radyakin
>>> <serjradyakin@gmail.com> wrote:
>>>>
>>>> Dear Dan,
>>>>
>>>> how data "looks" depends on which software "looks" at it. From
>>>> what I see in your message, there is double-byte encoding of letters,
>>>> which may cause a problem.
>>>>
>>>> I suggest you first "look" at your data byte by byte to find the
>>>> pattern you need, then filter your data based on that pattern.
>>>> Use
>>>> -hexdump- filename
>>>> to see how your data is structured. Check that you are using the correct
>>>> separator, "comma" and not "tab"; that the "comma" in your file is indeed a
>>>> standard ASCII "comma" and not some weird two-byte comma; that the
>>>> "comma" byte (44) is not used for encoding other characters; etc.
>>>>
>>>> Perhaps you could post a portion of the output from hexdump here if this
>>>> does not contradict any rules of the list.
>>>>
>>>> Regards, Sergiy Radyakin
>>>>
>>>> On Tue, Sep 23, 2008 at 1:09 PM, Dan Weitzenfeld
>>>> <dan.weitzenfeld@emsense.com> wrote:
>>>>>
>>>>> Hi All,
>>>>> Quick but strange question. I'm trying to insheet a comma-delimited
>>>>> file with Japanese in it. For example, the first line looks like:
>>>>>
>>>>> あなたはこのＣＭが好きですか？,0,とても好き
>>>>>
>>>>> The only information I need is the second variable, the 0, which will
>>>>> always be numeric.
>>>>>
>>>>> However, when I insheet the file, I get nonsense:
>>>>>
>>>>> þÿ0B0j0_0o0S0nÿ#ÿ-0LY}0M0g0Y0Kÿ 0h0f0‚Y}0M
>>>>>
>>>>> which would be okay, except that the second variable always comes in as
>>>>> blank.
>>>>>
>>>>> Does anyone know of a solution for this?
>>>>>
>>>>> Thanks in advance,
>>>>> Dan
>>>>>
>>>>> *
>>>>> * For searches and help try:
>>>>> * http://www.stata.com/help.cgi?search
>>>>> * http://www.stata.com/support/statalist/faq
>>>>> * http://www.ats.ucla.edu/stat/stata/
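
For the "converting in another program" route mentioned in the reply, here is a minimal sketch in Python. It is not from the thread: the function names and the file names in the usage note are hypothetical, and it assumes the file is UTF-16 with a byte-order mark (the leading þÿ in the garbled output and the hexdump line suggest big-endian UTF-16 with a BOM, but that should be checked against the real file). It pulls out only the second comma-separated field, which is the one Dan says he needs, and writes it as plain ASCII so that -insheet- or -infile- can read it.

```python
import csv
import io


def extract_second_column(raw_bytes):
    """Decode UTF-16 bytes and return the second comma-separated field of each row."""
    # The "utf-16" codec reads the byte-order mark to pick the byte order
    # and strips it from the decoded text. Assumes a BOM is present.
    text = raw_bytes.decode("utf-16")
    # newline="" leaves line endings alone so the csv module can handle \r\n itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    # Skip blank or short rows that have no second field.
    return [row[1] for row in reader if len(row) > 1]


def write_ascii_column(src_path, dst_path):
    """Write the second field of each row in src_path to dst_path, one value per line."""
    with open(src_path, "rb") as src:
        values = extract_second_column(src.read())
    # encoding="ascii" raises an error if a value is not plain ASCII,
    # which is a useful check given the field is expected to be numeric.
    with open(dst_path, "w", encoding="ascii", newline="\n") as dst:
        dst.write("\n".join(values) + "\n")
```

A call such as write_ascii_column("survey_utf16.csv", "survey_ascii.txt") (both names made up here) would leave a one-column ASCII file that Stata can then read in the usual way. If this comes up often, a loop with -file- inside Stata may be worth the extra effort, as noted above.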

