
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Insheeting Japanese
Date: Tue, 23 Sep 2008 14:28:51 -0400

Dan, I don't know whether Stata can read Unicode. The -help- for -insheet- states that it is for ASCII text. One possibility: use a text editor to add double quotes (") at the beginning and end of each line and on either side of the commas. This may read every variable in as a string. Then convert only the variable you want back to numeric (for example, with -destring-).

-Steve
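Because the file is UTF-16, the quotes would have to be added after the file has been re-encoded to an 8-bit form. A minimal Python sketch of Steve's idea done outside Stata, assuming the file starts with a byte-order mark and follows ordinary CSV quoting rules; `requote_utf16` and `convert_file` are hypothetical names, and nothing here was tried in the thread.

```python
import csv
import io


def requote_utf16(raw: bytes) -> str:
    """Decode UTF-16 bytes and return CSV text with every field double-quoted."""
    # The "utf-16" codec uses the byte-order mark (FE FF or FF FE) to pick the
    # byte order and drops the mark from the decoded text.
    text = raw.decode("utf-16")
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, as Steve suggests doing by hand.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def convert_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite a UTF-16 CSV file as quoted UTF-8 CSV."""
    with open(src_path, "rb") as src:
        quoted = requote_utf16(src.read())
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(quoted)
```

In the UTF-8 output the Japanese text is still multibyte, but the numeric column becomes plain ASCII digits inside quotes; whether -insheet- then reads every column as a string is Steve's expectation, not something verified here.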

On Sep 23, 2008, at 2:19 PM, Dan Weitzenfeld wrote:

I've been informed that the files are written in Unicode (UTF-16). Can Stata read this?

On Tue, Sep 23, 2008 at 11:08 AM, Dan Weitzenfeld <dan.weitzenfeld@emsense.com> wrote:

Thanks Sergiy, I did not know about that command. Below is a line from my hexdump:

130 | 304b ff1f 002c 0031 002c 0032 000d 000a | 0K...,. 1.,.2....
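Read two bytes at a time, high byte first (big-endian UTF-16, an inference from the 00 2C pairs where the commas sit), this line is the tail of a row: the end of a Japanese question, then ",1,2" and a CR/LF. A quick Python check of that reading:

```python
# Bytes from the hexdump line above (offset 130); bytes.fromhex ignores the spaces.
line_bytes = bytes.fromhex("304b ff1f 002c 0031 002c 0032 000d 000a")

# Big-endian UTF-16: 304B is HIRAGANA LETTER KA, FF1F is FULLWIDTH QUESTION MARK.
decoded = line_bytes.decode("utf-16-be")
print(repr(decoded))  # 'か？,1,2\r\n'

# After the Japanese text, the rest is an ordinary comma-separated tail.
fields = decoded.rstrip("\r\n").split(",")
print(fields)  # ['か？', '1', '2']
```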

I also noticed this when I ran it with the -analyze- option:

Line-end characters
    \r\n (Windows)          0
    \r by itself (Mac)      5
    \n by itself (Unix)     5

which looks suspicious to me. I'll talk to the tech guys who made this file.
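Those counts are consistent with UTF-16: every CR and LF is stored with an extra 00 byte (00 0D 00 0A), so a byte-by-byte scan never finds 0D immediately followed by 0A. The Python sketch below imitates such a byte-level count on five made-up rows; it is an illustration, not the code behind -hexdump, analyze-.

```python
def count_line_ends(raw: bytes) -> dict:
    """Count CR+LF pairs, lone CRs, and lone LFs by scanning raw bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr_only": raw.count(b"\r") - crlf,  # every CRLF pair contains one CR
        "lf_only": raw.count(b"\n") - crlf,  # ...and one LF
    }


# Five hypothetical rows ending in CRLF, with a byte-order mark.
text = "\ufeff" + "".join(f"row{i},{i}\r\n" for i in range(5))

# In UTF-16BE each CR/LF becomes 00 0D / 00 0A, so 0D is never directly followed by 0A.
print(count_line_ends(text.encode("utf-16-be")))
# → {'crlf': 0, 'cr_only': 5, 'lf_only': 5}

# The same rows as ASCII (byte-order mark removed) show ordinary Windows line ends.
print(count_line_ends(text.lstrip("\ufeff").encode("ascii")))
# → {'crlf': 5, 'cr_only': 0, 'lf_only': 0}
```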

Thanks again, Sergiy.

On Tue, Sep 23, 2008 at 10:51 AM, Sergiy Radyakin <serjradyakin@gmail.com> wrote:

Dear Dan,

How data "looks" depends on which software is looking at it. From what I see in your message, the letters use a double-byte encoding, which may cause a problem.

I suggest you first "look" at your data byte by byte to find the pattern you need, and then filter your data based on that pattern.

Use

-hexdump- filename

to see how your data is structured. Check that you are using the correct separator (comma, not tab), that the comma in your file is indeed a standard ASCII comma and not some unusual two-byte comma, that the comma byte (44) is not also used to encode other characters, etc.

Perhaps you could post a portion of the -hexdump- output here, if this does not contradict any rules of the list.

Regards, Sergiy Radyakin
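Sergiy's last check (that byte 44 is not used to encode other characters) can be automated by comparing the number of raw 0x2C bytes with the number of commas left after decoding. In UTF-16 the two can differ, because 0x2C can also be one half of an unrelated character. A minimal Python sketch; the encoding name is an assumption the user has to supply, and `comma_byte_report` is a hypothetical name.

```python
def comma_byte_report(raw: bytes, encoding: str) -> tuple[int, int]:
    """Return (raw 0x2C bytes, decoded ',' characters) for a byte string."""
    raw_commas = raw.count(0x2C)  # byte value 44, the ASCII comma
    decoded_commas = raw.decode(encoding).count(",")
    return raw_commas, decoded_commas


# The hexdump line from this thread: both counts agree.
print(comma_byte_report(bytes.fromhex("304bff1f002c0031002c0032000d000a"), "utf-16-be"))  # (2, 2)

# U+4E2C, a CJK ideograph, is stored as 4E 2C: one 0x2C byte but no comma.
print(comma_byte_report(chr(0x4E2C).encode("utf-16-be"), "utf-16-be"))  # (1, 0)
```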

On Tue, Sep 23, 2008 at 1:09 PM, Dan Weitzenfeld <dan.weitzenfeld@emsense.com> wrote:

Hi All,

Quick but strange question. I'm trying to -insheet- a comma-delimited file with Japanese in it. For example, the first line looks like:

あなたはこのＣＭが好きですか？,0,とても好き

(Roughly: "Do you like this commercial?", 0, "I like it very much".)

The only information I need is the second variable, the 0, which will always be numeric.

However, when I insheet the file, I get nonsense:

þÿ0B0j0_0o0S0nÿ#ÿ-0LY}0M0g0Y0Kÿ 0h0f0‚Y}0M

which would be okay, except that the second variable always comes in as blank.
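The "nonsense" is what big-endian UTF-16 bytes look like when each byte is shown as a single Latin-1 (or Windows-1252) character: the byte-order mark FE FF appears as "þÿ", あ (U+3042) as "0B", な (U+306A) as "0j", and so on. Each ASCII character, including the digit 0, also gets a NUL byte in front of it, which plausibly explains why the numeric field came in blank (a guess, not confirmed in the thread). A Python sketch reproducing the effect:

```python
# A byte-order mark followed by the first three characters of the sample line.
sample = "\ufeffあなた"
as_bytes = sample.encode("utf-16-be")

# Viewing each byte as one Latin-1 character gives the same pattern as the post.
print(as_bytes.decode("latin-1"))  # þÿ0B0j0_

# The digit "0" becomes the two bytes 00 30: a NUL followed by the ASCII digit.
print("0".encode("utf-16-be"))  # b'\x000'
```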

Does anyone know of a solution for this?

Thanks in advance,

Dan
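Since only the numeric second field is needed, another option is to pre-extract that column outside Stata and -insheet- the result. A minimal Python sketch, assuming the file is UTF-16 with a byte-order mark and that the second field is always an integer; `second_field_values` and `write_ascii_column` are hypothetical names, not part of any tool mentioned in the thread.

```python
import csv
import io


def second_field_values(raw: bytes) -> list[int]:
    """Decode UTF-16 CSV bytes and return the second column as integers."""
    # Assumes a byte-order mark, so the "utf-16" codec can detect the byte order.
    text = raw.decode("utf-16")
    reader = csv.reader(io.StringIO(text, newline=""))
    # Skip blank or one-field rows; int() raises ValueError on non-numeric values.
    return [int(row[1]) for row in reader if len(row) > 1]


def write_ascii_column(values: list[int], path: str) -> None:
    """Hypothetical helper: write one integer per line as plain ASCII."""
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.writelines(f"{v}\n" for v in values)


raw = "\ufeffあなたはこのＣＭが好きですか？,0,とても好き\r\n".encode("utf-16-be")
print(second_field_values(raw))  # [0]
```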

*

* For searches and help try:

* http://www.stata.com/help.cgi?search

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/


**Follow-Ups**:
- **Re: st: Insheeting Japanese**, *From:* "Austin Nichols" <austinnichols@gmail.com>

**References**:
- **st: Insheeting Japanese**, *From:* "Dan Weitzenfeld" <dan.weitzenfeld@emsense.com>
- **Re: st: Insheeting Japanese**, *From:* "Sergiy Radyakin" <serjradyakin@gmail.com>
- **Re: st: Insheeting Japanese**, *From:* "Dan Weitzenfeld" <dan.weitzenfeld@emsense.com>
- **Re: st: Insheeting Japanese**, *From:* "Dan Weitzenfeld" <dan.weitzenfeld@emsense.com>


© Copyright 1996–2014 StataCorp LP