Statalist



Re: st: gettoken -- bug?


From   "Sergiy Radyakin" <[email protected]>
To   [email protected]
Subject   Re: st: gettoken -- bug?
Date   Mon, 8 Sep 2008 17:50:57 -0400

Hello Johannes and Ben,


I think it is in the manual: "..., if `left' begins with a quotation
mark, the result will not be what you expect". But it does not work if
there is another token preceding it either. I think it is because of
the recursive definition of tokenize/gettoken.
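
A minimal sketch of the -quotes- and -qed()- options that Ben mentions in
his reply quoted below, applied to the string from Johannes's message. The
local name q is only illustrative, and no output is shown because the
result has not been checked here; per the gettoken documentation, -quotes-
should keep the outer quotes on the returned token and -qed()- should
record whether that token was quoted.

```stata
* Johannes's example string from the quoted message below
local list `" "a b" c , d "'

* quotes : do not strip the outer quotes from the returned token
* qed(q) : local q is filled in with 1 if the token was quoted, else 0
gettoken one two : list, parse(",") quotes qed(q)

* Show what ended up in each local (compound quotes protect inner quotes)
display `"one    = [`one']"'
display "quoted = `q'"
display `"two    = [`two']"'
```

If q comes back as 1, the calling code can decide whether to glue the
quoted token back together with whatever follows it.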

But this exception raises a question: what if quotes were used as a
separator? (A very weird case, without a doubt.)
And also, how does insheet parse data with quotes? In my
understanding it uses tokenize/gettoken inside, so it must be prone to
the same problem. I don't see how this data:

1"2"3"4
5"6"7"8

can be read by

. insheet using "X:\quotesdata.", double delimiter(`"""') clear nonames

as

. list
     +----+
     | v1 |
     |----|
  1. | 78 |
     +----+

How did 7 and 8 end up as 78? And why didn't 3 and 4? Aren't
observations independent of one another?

Not that I have ever seen any dataset where quotes were used as
separators, but what if?

Also (if talking about bugs) when dialogs are used, filenames must be
passed exactly. Otherwise some commands (like insheet) will try to add
their default extensions and possibly create a problem. E.g. when run
via a dialog, the above command receives a filename without an
extension, x:\quotesdata, which is precisely what the file name is:
it has no extension. This is what I _click_ in the file open dialog.
Later this name is passed to insheet and the .raw extension is added, and
since quotesdata.raw does not exist (if we are lucky), an error
message appears (if that file existed, the wrong dataset would have been
opened without the user getting any warning or notification).

This is very different from what other programs do, e.g. if I click a
file in the open dialog of Microsoft Word, it opens this file
regardless of .doc extension presence because this is what I tell it
to do: "open this file".

Even though .raw does not appear to be a popular extension, this can
still be a source of a hard-to-track error. It is written in the
documentation that .raw will be added if no extension is specified,
but when I select a file with the mouse, the filename should not be
changed later by any command.

In the meantime, to be on the safe side, one always has to add a dot
after the filename if there is no extension.
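
A minimal sketch of that workaround, assuming the extensionless file from
the example above (X:\quotesdata). The local name fname and the
-confirm file- guard are additions for illustration and are not part of
the original command.

```stata
* Path from the example above; the file itself has no extension
local fname "X:\quotesdata"

* Stop with an error if the file, exactly as named, does not exist
confirm file "`fname'"

* Trailing dot after the name: insheet should then not append .raw
insheet using "`fname'.", clear
```

The guard only confirms that the named file exists; it is the trailing
dot that keeps insheet from looking for quotesdata.raw instead.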

Regards,
   Sergiy Radyakin




On Mon, Sep 8, 2008 at 3:35 PM, Ben Jann <[email protected]> wrote:
> Double quotes always mark off a token, as far as I know. Example:
>
> . local list `""a"b"'
>
> . macro dir _list
> _list:          "a"b
>
> . gettoken one two : list
>
> . macro dir _one _two
> _one:           a
> _two:           b
>
> If this is not documented, I guess it should be.
>
> If you want to prevent this, you can use the -quotes- and -qed()-
> options and then do some checks and put stuff together again if
> necessary.
> ben
>
>
> On Mon, Sep 8, 2008 at 9:10 PM, Johannes Schmieder <[email protected]> wrote:
>> Hi,
>> Can someone tell me whether this is intended behavior or a bug in
>> Stata (V10, SE Intercooled):
>>
>> . local list `" "a b" c , d "'
>> . macro dir  // omitting the system macros
>> _list:           "a b" c , d
>> _one:           `"hello.dta"'
>>
>> . gettoken one two : list , p(",")
>> . macro dir
>> _two:            c , d
>> _one:           a b
>> _list:           "a b" c , d
>>
>> Shouldn't the local "one" contain:  "a b" c
>> rather than just: a b
>>
>> If this is intended behavior this is not well documented in the help
>> file or the manual. I realize the first technical note in the manual
>> recommends to include " " as a parsing character, but this still seems
>> to be strange and not in line with the documentation.
>>
>> Johannes Schmieder
>> *
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


