RE: st: another data cleaning question
Thanks Nick. The notes on string vs numeric were very insightful. About
the specific question, I looked it up in the manual and all is good (I
hadn't figured out what the ,1,3 meant).
--- Nick Cox <firstname.lastname@example.org> wrote:
> Babigumira Ronnie
> > Thanks for the help. The code produces the same result as what Hakon
> > suggested. However, you make a comment, "More generally, whenever codes
> > are pseudo-numeric, there are several advantages to holding them as
> > strings", which has generated interest in me, so I would like to pursue it.
> > You suggest
> > list if string(cropcode) != substr(string(varcode),1,3)
> > Now that the code works, I would like to know the underlying logic.
> > Please throw some more light on it, especially on the right-hand side of
> > the comparison.
> My general remark merely echoes a comment often made on
> Statalist. I will write down what springs
> to mind. Others should feel very free as usual
> to amplify and correct. Perhaps this is an FAQ
> in embryonic form. Also, as mentioned before
> on Statalist, the topic of numeric and string
> variables will be the subject of the next
> "Speaking Stata" column in the Stata Journal, so
> extra comments will be gratefully received.
> 1. Identifiers which are all numeric often cause small
> problems. U.S. social security numbers appear
> to be the most common example mentioned on
> Statalist. To hold such identifiers without
> precision problems (i.e. every digit held
> exactly) may require the use of a -long- variable
> rather than the default -float-. Along with that,
> displaying such a variable may require changing its
> display format, so that most digits are not lost
> whenever identifiers are presented in scientific
> notation. These are small and soluble problems,
> but they frequently puzzle Stata users.
> Holding such identifiers as strings, even though every
> character is numeric, solves those problems, with
> no apparent downside.
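A minimal Stata sketch of point 1, using hypothetical variable names and the value 123456789 as a stand-in for a 9-digit identifier such as a social security number. It assumes Stata's documented storage-type limits: -float- holds integers exactly only up to 16,777,216, while -long- holds them exactly up to 2,147,483,620.

```stata
* Point 1: one 9-digit identifier in three storage types (hypothetical names)
clear
set obs 1
generate float id_float = 123456789   // float: integers exact only up to 16,777,216
generate long  id_long  = 123456789   // long: integers exact up to 2,147,483,620
generate str9  id_str   = "123456789" // string: every digit kept as a character
describe                               // shows each storage type and display format
format id_float id_long %12.0f         // fixed format: no scientific notation
list                                   // id_float now reveals its rounded value
```

With the fixed format, the -float- copy should display as 123456792, not 123456789, because the nearest representable float is stored; the -long- and string copies keep every digit.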
> 2. Categorical codes which are multi-digit numbers
> are often constructed hierarchically: that is, successive
> digits take you to finer detail within some
> classification system. When such codes are held
> as numbers, working from fine to coarse categories, or
> vice versa, can be done via tricks with -int()- and
> occasionally -mod()-. These tricks strike many users as neat when they
> are familiar but indirect or obscure when they are not.
> However, the corresponding operations on such codes held as strings
> can be done via -substr()- and occasionally -index()-
> and these operations are often more transparent to users.
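A sketch of point 2, assuming a hypothetical 5-digit code whose first 3 digits give a coarse group and whose last 2 digits give finer detail. It puts the -int()- and -mod()- arithmetic beside the -substr()- equivalent on the same codes held as strings.

```stata
* Point 2: the same hypothetical 5-digit codes held as numbers and as strings
clear
input long code_num str5 code_str
12345 "12345"
12399 "12399"
67801 "67801"
end
* numeric route: arithmetic tricks
generate group_num  = int(code_num / 100)    // first 3 digits: 123, 123, 678
generate detail_num = mod(code_num, 100)     // last 2 digits: 45, 99, 1
* string route: pick characters by position
generate group_str  = substr(code_str, 1, 3) // "123", "123", "678"
generate detail_str = substr(code_str, 4, 2) // "45", "99", "01"
list
```

Both routes give the same groups, but the string version says directly "characters 1 to 3", and it keeps the leading zero of the detail code "01", which the numeric -mod()- result of 1 does not show.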
> 3. A more elementary error is to forget, especially
> for statistical rather than data management commands,
> that a variable may be numeric to Stata without being
> a variable which may fairly be included in a statistical
> model as is. It is arguable that a habit of holding arbitrary
> numeric codes as strings provides some protection against
> foolish statistics of this kind.
> P.S. I am not clear on Ronnie's specific question. The -string()-
> function converts a number to a string, whereafter -substr(s,1,3)-
> extracts 3 characters of the string s, starting at the first character.
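A small sketch of the command discussed in this thread, using hypothetical values. It assumes cropcode is a 3-digit crop code and varcode a longer code whose first 3 digits should repeat the crop code.

```stata
* Hypothetical data: varcode should begin with the 3 digits of cropcode
clear
input cropcode varcode
101 10105
101 10207
205 20512
end
* string(varcode) gives e.g. "10207"; substr(s, 1, 3) then takes
* 3 characters of s starting at position 1, i.e. "102"
generate varprefix = substr(string(varcode), 1, 3)
* list observations whose crop code disagrees with the varcode prefix
list if string(cropcode) != substr(string(varcode), 1, 3)
```

Only the second observation should be listed, since its prefix "102" does not match the crop code 101.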
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/