
st: RE: Egen ends behavior apparently violates rules in documentation


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Egen ends behavior apparently violates rules in documentation
Date   Wed, 15 Jan 2003 19:36:03 -0000

Devra Golbe
>
>   The following code fragment suggests that egen ends does not operate as
> the documentation (pasted at the bottom of my message) suggests.
> Specifically, with the tail option, if no punct(pchars) appears, the tail
> should be empty. However, my output suggests that under these circumstances
> (no appearance of pchars) the tail contains the entire "word".
>
> Thanks in advance for your insights.
>
> Devra
>
> *******
> . which egen
> C:\STATA7\ado\updates\e\egen.ado
> *! version 3.3.2  07may2001
>
>
> . egen consid1=ends(consid), punct(*) head;
>
> . egen considt=ends(consid), punct(*) tail;
>
> . egen consid2=ends(considt), punct(*) head;
>
> . egen considt2=ends(considt), punct(*) tail;
>
> . list consid  consid1 consid2 considt2 in 1/10;
>
>                consid    consid1    consid2   considt2
>    1.   CASH*NOTE*LIA       CASH       NOTE        LIA
>    2.            CASH       CASH       CASH       CASH
>    3.            CASH       CASH       CASH       CASH
>
> *******
>
>
>
> ends(strvar) [, punct(pchars) trim [head|tail|last]]
>      may not be combined with by.  It gives the first "word" or head (with
>      the head option), last "word" (with the last option), or the remainder
>      or tail (with the tail option) from string variable strvar.
>
>      head, last and tail are determined by the occurrence of pchars, which
>      is by default a single space " ".
>
>      [snip]
>
>      The remainder or tail is whatever follows the first occurrence of
>      pchars, which will be the empty string "" if it does not occur.  The
>      tail of "frog toad newt" is "toad newt" and of "frog" is "".
>      With punct(,), the tail of "frog,toad" is "toad".

I think Devra is right.

The function -egen, ends() tail- has a history.

There was (and is) a user-written function,
-egen, tail()-, published in STB-50 in 1999, and indeed
earlier as a function within the -egenodd- package
from SSC. I was the author.

In Stata 7, -tail()- was implemented as a part
of a new -egen- function, -ends()-, which Devra
is using.

However, the behavior was changed. I don't recall
any discussion of this. In any case, the documentation
of this option for the function was copied
essentially unchanged from the STB.

I agree with Devra that the documentation does not
match the behaviour.

In a nutshell, here is the different behaviour:

          mystr      tail7    tailSTB
  1.       frog       frog
  2.  frog toad       toad       toad

The code implementation of "tail" in Stata 7
is

substr(<strvar>,`index'+`plen',.)

where `index' gives the position
of the first occurrence of pchars,
by default a space, and `plen' is
the length of pchars, which will be 1
for a single space. Thus whenever
pchars does not occur within the
string variable <strvar>, `index'
is 0 and, with `plen' equal to 1,
this evaluates as

substr(<strvar>,1,.)

as Devra observed.
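
To make the difference concrete, here is a minimal sketch that builds
both tails by hand for the default pchars of a single space. It is not
the code of -egen, ends()- itself: the variable names idx, tail7 and
tailSTB are made up for illustration, and I am assuming a Stata recent
enough to have strpos(), which as far as I know is the current name of
the index() function.

```stata
* Sketch only: two hand-built definitions of "tail" for pchars = " "
clear
input str20 mystr
"frog"
"frog toad"
end

* position of the first space; 0 when there is none
generate int idx = strpos(mystr, " ")

* Stata 7 style: always substr() from idx + plen (plen = 1 here),
* so idx == 0 means "start at position 1", i.e. the whole string
generate str20 tail7 = substr(mystr, idx + 1, .)

* documented (STB) style: empty string when pchars does not occur
generate str20 tailSTB = cond(idx > 0, substr(mystr, idx + 1, .), "")

list mystr tail7 tailSTB
```

If the sketch is right, the listing should agree with the small table
above: the two definitions differ only where pchars is absent.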

I regard

substr(<strvar>,`index'+`plen',.)

as a nice definition of a tail, except
that

1. it is not matched by the manual
documentation

2. it does not match the original
intent that "head" and "tail" are
disjoint, that head and tail
put together make up the string,
apart from the punctuation which
acts as a kind of neck.

2 is just a matter of history,
except that the terminology I used, which
was adopted within official Stata,
was meant to be vivid and helpful
in indicating what the functions
do.
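
The head, neck and tail idea can be checked with a small sketch, again
for pchars of a single space. The names head, tailSTB and tail7 are
illustrative only, strpos() is assumed to be available, and the head
is taken to be the whole string when pchars is absent, as in Devra's
listing.

```stata
* Sketch only: head + neck + tail should rebuild the original string
clear
input str20 mystr
"frog"
"frog toad newt"
end

generate int idx = strpos(mystr, " ")

* head: everything before the first space, or the whole string if none
generate str20 head = cond(idx > 0, substr(mystr, 1, idx - 1), mystr)

* tail as documented (STB): empty when there is no space
generate str20 tailSTB = cond(idx > 0, substr(mystr, idx + 1, .), "")

* tail as computed in Stata 7: whole string when there is no space
generate str20 tail7 = substr(mystr, idx + 1, .)

* with the documented tail, head + neck + tail gives back mystr
assert mystr == head + cond(idx > 0, " ", "") + tailSTB

* with the Stata 7 tail, head and tail coincide when pchars is absent
list mystr head tail7 if idx == 0
```

The assert passes silently when the pieces fit together; the final
list shows the observations where head and tail are no longer
disjoint.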

Nick
[email protected]



