
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Removing quotation marks in string variables


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Removing quotation marks in string variables
Date   Fri, 5 Jul 2013 18:40:46 +0100

What I recommend

1. Replace instances of

","

with

" "

i.e. replace the commas between fields with spaces, leaving commas
within fields untouched.

A pattern here is

. gen FOO = subinstr(foo, char(34) + "," + char(34), char(34) + " " + char(34), .)

2. Extract what is left as "words" of the string. See the functions
-wordcount()- and -word()-, noting Stata's rule that double quotes bind
more tightly than spaces separate, i.e. "Stata rules OK" and "Queen rules UK"
are each single words to Stata.
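A minimal Stata sketch of the two steps, run on one made-up observation. It assumes every field is wrapped in double quotes (in Adam's sample the first field is unquoted, so its separator is `, "` rather than `","` and the first word would keep a trailing comma). The variable names `tags`, `work` and `field1`, `field2`, ... are illustrative only. Stripping `char(34)` from each word afterwards gives the same result whether or not -word()- keeps the enclosing quotes.

```stata
* Sketch only: hypothetical data, every field assumed to be double-quoted
clear
set obs 1
generate tags = `""ATUS00001213","ATUS00001256","Jackson,Collin""'

* Step 1: quote-comma-quote (the separator between fields) becomes
* quote-space-quote; the comma inside "Jackson,Collin" is untouched
generate work = subinstr(tags, char(34) + "," + char(34), char(34) + " " + char(34), .)

* Step 2: take each word in turn and strip any double-quote characters
local n = wordcount(work[1])
forvalues i = 1/`n' {
    generate field`i' = subinstr(word(work, `i'), char(34), "", .)
}

list field*, noobs
```

If a field contained a space, such as "Jackson, Collin", it is the quote-binding rule above that would keep it together as one word.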

Nick
njcoxstata@gmail.com


On 5 July 2013 18:24, Nick Cox <njcoxstata@gmail.com> wrote:
> The logic is this.
>
> 1. You want to look for " as a literal character, not a delimiter.
>
> 2. Therefore compound double quotes `" "' must be used as delimiters.
>
> replace tags = subinstr(tags, `"""',  "", .)
>
> The awkward argument is
>
> `"    left compound double quote
> "    double quote to be taken literally
> "'   right compound double quote
>
> A trick to avoid all this is to note that " is char(34) [see
> -asciiplot- from SSC for a cheat sheet] so that
>
> replace tags = subinstr(tags, char(34), "", .)
>
> But hang on: "Jackson, Collin" is recognisably one field: if you strip
> the " you will lose that. So, this may not be your best strategy.
>
> Note also -split-.
>
>
> Nick
> njcoxstata@gmail.com
>
>
> On 5 July 2013 17:57, John Adam Roberts <roberts.john.adam@gmail.com> wrote:
>> Hi,
>>
>> I have a dataset that is full of strings that look like this:
>>
>> ATUS00001213, "ATUS00001256","Jackson,Collin"
>>
>> I'm going to separate these values into different variables and I'm
>> wondering how to remove the "s from the dataset.
>>
>> For other characters, like commas, I would use the following command:
>> replace tags = subinstr(tags,",","",.)
>>
>> but
>> replace tags = subinstr(tags,""","",.)
>> replace tags = subinstr(tags,'"',"",.)
>> replace tags = subinstr(tags,""",.,.)
>> replace tags = subinstr(tags,'"',.,.)
>> don't work.
>>
>> Thanks in advance for the help!
>>
>> -Adam
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
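A small sketch comparing the two quote-removal forms from Nick's first reply, applied to Adam's sample line. The variable names `a` and `b` are illustrative only. Both calls should delete every literal double quote; as Nick warns, the result no longer marks "Jackson,Collin" as a single field.

```stata
* Compare two ways of deleting literal double quotes from a string
clear
set obs 1
* Adam's sample line, typed inside compound double quotes
generate tags = `"ATUS00001213, "ATUS00001256","Jackson,Collin""'

* Form 1: compound double quotes around one literal quote character
generate a = subinstr(tags, `"""', "", .)

* Form 2: char(34) is the double-quote character
generate b = subinstr(tags, char(34), "", .)

* Both forms should give the same text
assert a == b
list a, noobs
```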


© Copyright 1996–2018 StataCorp LLC