Statalist



R: st: (Il)Legal variable/macro names?


From   <giancarlo.marra@bancaditalia.it>
To   <statalist@hsphsun2.harvard.edu>
Subject   R: st: (Il)Legal variable/macro names?
Date   Fri, 26 Oct 2007 11:24:26 +0200

Unfortunately, using extended ASCII characters (i.e. those with codes above 127) is not always a good idea.
If you work on an architecture more complex than Windows, e.g. on Unix/Linux servers through terminal emulators
with locale settings involving Unicode, you can easily get something like:

. display strlen("")
2
 
That is perfectly logical, because in a Unicode encoding such as UTF-8 an accented character is coded in two octets, and Stata's strlen does not take
the underlying character coding into account (as far as I can guess).
But it is highly counter-intuitive for users who are not familiar with such topics and who think that
strlen should mean string length (as claimed), not the number of octets needed to represent the string (as it seems to be).
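
The difference between counting characters and counting octets is easy to reproduce outside Stata. The following Python sketch is only an illustration and says nothing about how Stata implements strlen; the letter "é" is an arbitrary choice for the example, not necessarily the character used in the session above.

```python
# Illustration only: compare the character count of a string with the
# number of octets needed to store it in two common encodings.
text = "\u00e9"  # "é", an arbitrary accented letter chosen for this sketch

n_chars = len(text)                     # number of Unicode code points
n_utf8 = len(text.encode("utf-8"))      # octets when encoded as UTF-8
n_latin1 = len(text.encode("latin-1"))  # octets in 8-bit ISO 8859-1

print(n_chars, n_utf8, n_latin1)  # prints: 1 2 1
```

A function that counts octets would report 2 for this string under a UTF-8 locale and 1 under an 8-bit locale, which is consistent with the result shown above.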

I think you should regard being allowed to create a permanent variable name with
accented characters as purely incidental; in my view you should not do it. Nor should you use them in any part of a Stata program.
Even using them in a string might cause problems if you ever need to test the length of the string.
In any case, it is a further limitation on the portability of programs.
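
One way to follow this advice is to scan program text for characters above code 127 before running it. The sketch below is a hypothetical helper written in Python, not a Stata feature; the name `non_ascii_positions` and the sample line (with "ü" as an arbitrary accented letter) are assumptions made for illustration.

```python
def non_ascii_positions(line):
    """Return (column, character) pairs for characters with code above 127.

    Hypothetical helper for this sketch; columns are counted from 1.
    """
    return [(col, ch) for col, ch in enumerate(line, start=1) if ord(ch) > 127]


# Sample line resembling a Stata command with an accented variable name.
sample = "gen \u00fc = mpg"
for col, ch in non_ascii_positions(sample):
    print(f"column {col}: {ch!r} (U+{ord(ch):04X})")
```

Running the same check over each line of a do-file would flag every place where portability across locales might be at risk.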

The alternative would be for Stata to fully support Unicode character coding: is it worth it?
Or for one to be sure that none of the layers between his/her keyboard and the Stata executable will translate
the characters into a Unicode coding.

Characters and keyboards are ugly beasts: think what a mess it is for an Italian Stata user not to have the
Stata-ubiquitous left-quote character on his/her keyboard!

Giancarlo Marra


 

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Schaffer, Mark E
Sent: Thursday, 25 October 2007 20:52
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: (Il)Legal variable/macro names?

Since I'm the bitee, I'll comment:

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu 
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of n j cox
> Sent: 25 October 2007 18:11
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: (Il)Legal variable/macro names?
> 
> This is mostly for StataCorp, but I'll comment.
> 
> I think Mark has been bitten by a bug; the question is where 
> is the bug.
> 
> 1. Is it that -tempvar- allows a name that is really illegal?
> 
> 2. Is it whatever caused the statement that failed to 
> recognise a legal macro name? (Apparently, a parser limitation.)

I think it's either

3.  Is it that Stata's rules for what is/isn't a legal name, which should be the same for all objects - variables, scalars, matrices, macros - actually vary across objects?

or

4.  Is it that -tempvar- doesn't properly handle a name that is really legal?

or

5.  Is it that the macro substitution in -gen `u' = mpg- fails?

My guess is (5).  Here's the same example, but this time I list all macros after the call to -tempvar-.  Note that the macro u has been created!

. do "C:\DOCUME~1\MARKSC~1\LOCALS~1\Temp\STD0l000000.tmp"

. sysuse auto, clear
(1978 Automobile Data)

.
. capture program drop legalnames

. program define legalnames
  1. gen u = mpg
  2. sum u
  3. tempvar u1 u u2
  4. macro dir
  5. gen `u' = mpg
  6. sum `u'
  7. end

.
. set trace on

. legalnames
  ----------------------------------------------- begin legalnames ---
  - gen u = mpg
  - sum u

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          u |        74     21.2973    5.785503         12         41
  - tempvar u1 u u2
  - macro dir
S_FNDATE:       13 Apr 2005 17:45
S_FN:           C:\Stata9\ado\base/a/auto.dta
<snip>
_u2:            __000002
_u:            __000001
_u1:            __000000
  - gen `u' = mpg
  = gen  = mpg
too few variables specified
  ------------------------------------------------ end legalnames ---
r(102);

end of do-file
r(102);


Soooo ... looks like a macro expansion bug.

--Mark

> StataCorp will decide which it is. A wild guess is that it 
> will be much easier to fix -tempvar- and
> -tempname- to disallow names like Mark's than to ensure that 
> names like his work everywhere they might be used -- on all 
> versions of Stata on all platforms in all circumstances.
> 
> Either way, there is now a small mystery on exactly what 
> characters are really allowed within names.
> 
> I make a distinction:
> 
> 1. As a Stata user, I want StataCorp to do the maximum 
> possible to let me use whatever characters I need for 
> _labelling output_. Typically, I try hard to use correct 
> spelling, including accents, wherever appropriate in variable 
> labels, value labels and graph annotation (not to mention the 
> old question of mathematical symbols and Greek characters). I 
> trust that is not controversial or disagreeable. I am much 
> less fussed about characters in (permanent) variable names. 
> That may well, naturally, be much more important to people 
> using languages more accented than English.
> 
> 2. As a Stata programmer, I am happy to accept a very limited 
> character set A-Z a-z 0-9 _ for macro names. It would be 
> interesting to hear arguments to the opposite effect in 
> addition to Mark's want.
> 
> Nick
> n.j.cox@durham.ac.uk
> 
> Schaffer, Mark E
> 
> I've just been bitten by an odd inconsistency between what 
> constitutes a legal name for a variable and a legal name for 
> a macro.  8-bit ascii characters are apparently legal in 
> variable names, but when used in a macro name, no macro is created.
> 
> Here's an example using the auto dataset.  The first part 
> shows that the variable name u is legal.  The second part 
> shows that when I try to use
> -tempvar- to create a macro called "u", nothing is created - 
> when Stata gets to the next line, macro substitution means 
> `u' becomes ... nothing.
> 
> . do "C:\DOCUME~1\MARKSC~1\LOCALS~1\Temp\STD0l000000.tmp"
> 
> . sysuse auto, clear
> (1978 Automobile Data)
> 
> .
> . capture program drop legalnames
> 
> . program define legalnames
>    1. gen u = mpg
>    2. sum u
>    3. tempvar u
>    4. gen `u' = mpg
>    5. sum `u'
>    6. end
> 
> .
> . set trace on
> 
> . legalnames
>    ----------------------------------------------- begin legalnames ---
>    - gen u = mpg
>    - sum u
> 
>      Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>            u |        74     21.2973    5.785503         12         41
>    - tempvar u
>    - gen `u' = mpg
>    = gen  = mpg
> too few variables specified
>    ------------------------------------------------- end legalnames ---
> r(102);
> 
> end of do-file
> r(102);
> 
> 
> I can't find anything about this in the manuals, but the behavior of
> -tempvar- does look bug-like - if an illegal macro name is 
> used, shouldn't -tempvar- complain?
> 
> In programs I sometimes generate macro names based on 
> variable names, so if the naming rules are actually different 
> for variable names and macro names, this is not a good strategy.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 





