Statalist The Stata Listserver



st: RE: help cleaning string variable


From   "Marino, Jennifer" <[email protected]>
To   <[email protected]>
Subject   st: RE: help cleaning string variable
Date   Mon, 27 Feb 2006 16:48:03 -0800

I don't know if it's necessary in Stata 9 (it might have been put into the
official egen package if it was used enough), but for Stata 8 the
fabulous ado package -egenmore-, by Dr. Cox, has a tailor-made egen
function called "sieve":

Excerpt from the helpfile:

sieve(strvar) , { keep(classes) | char(chars) | omit(chars) }
    selects characters from strvar according to a specified criterion
    and generates a new string variable containing only those characters.
    This may be done in three ways. First, characters are classified using
    the keywords alphabetic (any of a-z or A-Z), numeric (any of 0-9),
    space or other. keep() specifies one or more of those classes:
    keywords may be abbreviated by as little as one letter. Thus keep(a n)
    selects alphabetic and numeric characters and omits spaces and other
    characters. Note that keywords must be separated by spaces.
    Alternatively, char() specifies each character to be selected or omit()
    specifies each character to be omitted. Thus char(0123456789.) selects
    numeric characters and the stop (presumably as decimal point);
    omit(" ") strips spaces and omit(`"""') strips double quotes.
    (Stata 7 required.)
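
A minimal sketch of how this might look in a do-file, assuming the firm names sit in a string variable called name (a hypothetical name, as are name_sieved and name_clean, and the three example firms are made up). Note that sieve() with omit() drops the listed characters wherever they occur, not only at the start of the name, so the second half of the sketch shows one possible way to strip leading characters only, using just built-in string functions. The egen line assumes -egenmore- is already installed.

```stata
* Made-up example data: firm names with unwanted leading characters
clear
input str20 name
"%Acme Corp"
"^^Beta & Co"
"Gamma 100% Ltd"
end

* Option 1: sieve() from -egenmore- (assumed installed).
* omit() removes % and ^ everywhere in the string, including
* the % inside "Gamma 100% Ltd".
egen name_sieved = sieve(name), omit(%^)

* Option 2: strip only LEADING %, ^ or double-quote characters.
* char(34) is the double-quote character.
gen name_clean = name
count if inlist(substr(name_clean, 1, 1), "%", "^", char(34))
while r(N) > 0 {
    * Drop the first character wherever it is one of the unwanted ones
    replace name_clean = substr(name_clean, 2, .) ///
        if inlist(substr(name_clean, 1, 1), "%", "^", char(34))
    * Recount so the loop stops once no name starts with such a character
    count if inlist(substr(name_clean, 1, 1), "%", "^", char(34))
}

* Remove any blanks left at either end
replace name_clean = trim(name_clean)

list name name_sieved name_clean
```

The loop repeats because a name may start with several unwanted characters in a row; each pass removes one of them.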

Hope that helps.
Jen


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Mario Macis
Sent: Monday, February 27, 2006 1:44 PM
To: [email protected]
Subject: st: help cleaning string variable


Dear statalist users,
I need to clean a string variable containing the names of a large number
of firms (over 30,000). In many cases these names contain extra
characters that I would like to eliminate, such as % or " or ^. These
characters always come at the beginning of the name. I know that Stata
has a command (trim) that eliminates leading and trailing blank spaces
from string variables. Is there a similar command to eliminate leading
"undesired" characters? Thank you so much for your help. Best, Mario

--
Mario Macis
PhD Candidate
Department of Economics
University of Chicago
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



