Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: loop until "0 real changes made"


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: loop until "0 real changes made"
Date   Mon, 29 Jul 2013 23:24:53 +0100

The plus "+" character is taken literally by -subinstr()-, which is
totally unaware of any meaning it has in regular expressions. That is
why I said "if needed" as I have no idea how Turkish characters are
represented.

Nick
[email protected]


On 29 July 2013 23:08, Haluk Vahaboglu <[email protected]> wrote:
> Thank you Sergiy, Nick and Robert,
>
> The code below worked and successfully replaced all Turkish characters in several highly complex str variables at once.
> I do not know why but subinstr do not work with this ("`=char(195)'+`=char(135)'")  type expression.
> Sergiy, please give it to my inexperience in Stata; I could not load a font with Turkish characters and so with the available ones the copy and paste from results window produces awkward  characters in lets say openoffice calc.
> Thanks again
>
>  foreach v of var AD* {
>       local more 1
>       while `more' {
>       clonevar stemp = `v'
>     replace `v'=regexr(`v', "`=char(195)'+`=char(135)'","C")
>     replace `v'=regexr(`v', "`=char(196)'+`=char(176)'","I")
>     replace `v'=regexr(`v', "`=char(195)'+`=char(167)'","c")
>     replace `v'=regexr(`v', "`=char(195)'+`=char(182)'","o")
>     replace `v'=regexr(`v', "`=char(196)'+`=char(177)'","i")
>     replace `v'=regexr(`v', "`=char(196)'+`=char(158)'","G")
>     replace `v'=regexr(`v', "`=char(196)'+`=char(159)'","g")
>     replace `v'=regexr(`v', "`=char(195)'+`=char(156)'","U")
>     replace `v'=regexr(`v', "`=char(195)'+`=char(188)'","u")
>     replace `v'=regexr(`v', "`=char(197)'+`=char(158)'","S")
>     replace `v'=regexr(`v', "`=char(195)'+`=char(150)'","O")
>     replace `v'=regexr(`v', "`=char(197)'+`=char(159)'","s")
>     count if `v' != stemp
>     local more = r(N)
>     drop stemp
>       }
>  }
> list in 1/5
> Haluk Vahaboğlu
>
>
>
>
> ----------------------------------------
>> Date: Mon, 29 Jul 2013 15:42:09 -0400
>> Subject: Re: st: loop until "0 real changes made"
>> From: [email protected]
>> To: [email protected]
>>
>> Perhaps an example will explain why...
>>
>> * --------------- begin example ---------------------------
>> clear
>> set obs 1000000
>>
>> set rms on
>>
>> gen s = string(uniform(),"%21x")
>> clonevar s2 = s
>> gen s3 = s
>>
>> gen `:type s' s4 = s
>>
>> * --------------- end example -----------------------------
>>
>>
>>
>> On Mon, Jul 29, 2013 at 3:34 PM, Sergiy Radyakin <[email protected]> wrote:
>>> On Mon, Jul 29, 2013 at 3:19 PM, Robert Picard <[email protected]> wrote:
>>>> Here's a more complete example of how to continue making substitutions
>>>> until there are no more changes. I'm with Nick on using -clonevar-
>>>> when making an exact copy of a variable, it is faster than -generate-
>>>
>>> Pardon my ignorance, but how is -clonevar- (implemented as an ado
>>> program) possibly faster than -generate- (built-in), if it is using
>>> -generate- inside and on top of that does some other things?? (like
>>> copying labels, formats, etc, which are not necessary for this
>>> exercise).
>>>
>>> From clonevar.ado ( 1.0.1 13oct2004):
>>> gen `type' `newvar' = `varname' `if' `in'
>>>
>>> Sergiy
>>>
>>> .
>>>> Also, avoid -regexr()- in Stata 13, it's slow as molasses.
>>>>
>>>> * --------------- begin example ---------------------------
>>>> clear
>>>> set obs 100000
>>>>
>>>> gen AD1 = string(uniform(),"%21x")
>>>> gen AD2 = string(uniform(),"%21x")
>>>> list in 1/5
>>>>
>>>> foreach v of var AD* {
>>>> local more 1
>>>> while `more' {
>>>> clonevar stemp = `v'
>>>> replace `v' = subinstr(`v',"0X-","X-",.)
>>>> count if `v' != stemp
>>>> local more = r(N)
>>>> drop stemp
>>>> }
>>>> }
>>>> list in 1/5
>>>> * --------------- end example -----------------------------
>>>>
>>>>
>>>> On Mon, Jul 29, 2013 at 12:34 PM, Sergiy Radyakin
>>>> <[email protected]> wrote:
>>>>> Nick's solution with two variables is the most generic approach that
>>>>> is useful in situations where it is difficult to predict if any
>>>>> changes are going to happen as a result of your code. It certainly is
>>>>> going to work here as well (I would only use a tempvar instead of AD2
>>>>> and generate instead of clonevar).
>>>>>
>>>>> However, why would you do this recoding to non-Turkish characters?
>>>>> Stata works with Turkish characters like with any other for which a
>>>>> corresponding ANSI page is available and proper font is installed:
>>>>>
>>>>> http://radyakin.org/statalist/2013072901/turkish.png
>>>>> http://radyakin.org/statalist/2013072901/turkish.do
>>>>>
>>>>> The ANSI page for Turkish is 1254. And I would try e.g.:
>>>>> replace `v'=regexr(`v', "`=char(196)'+`=char(158)'","`=char(208)'")
>>>>> instead of
>>>>> replace `v'=regexr(`v', "`=char(196)'+`=char(158)'","G")
>>>>>
>>>>>
>>>>> Best, Sergiy Radyakin
>>>>>
>>>>> On Mon, Jul 29, 2013 at 10:06 AM, Nick Cox <[email protected]> wrote:
>>>>>> Plus the "+" if needed.
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On 29 July 2013 15:05, Nick Cox <[email protected]> wrote:
>>>>>>> One answer is not to use regular expressions here at all. Use
>>>>>>> -subinstr()- with statements like
>>>>>>>
>>>>>>> replace `v' = subinstr(`v', "`=char(195)'`=char(135)'","C", .)
>>>>>>>
>>>>>>> Another answer is to set up a count of changes and stop when you hit zero.
>>>>>>>
>>>>>>> clonevar AD2 = AD
>>>>>>>
>>>>>>> foreach v of var AD {
>>>>>>> replace AD2 = AD
>>>>>>> <work with AD>
>>>>>>> count if AD2 != AD
>>>>>>> if r(N) == 0 continue, break
>>>>>>> }
>>>>>>>
>>>>>>> Nick
>>>>>>> [email protected]
>>>>>>>
>>>>>>> On 29 July 2013 14:48, Haluk Vahaboglu <[email protected]> wrote:
>>>>>>>
>>>>>>>> I am using Stata 12.1 for Linux-64 bit and dealing with Turkish characters in string variables. I convert these Turkish characters (ı, ş, ü etc) to readable equivalents (i, s, u etc). Doing this with the code below:
>>>>>>>>
>>>>>>>> foreach v of var AD {
>>>>>>>> replace `v'=regexr(`v', "`=char(195)'+`=char(135)'","C")
>>>>>>>> replace `v'=regexr(`v', "`=char(196)'+`=char(176)'","I")
>>>>>>>> replace `v'=regexr(`v', "`=char(195)'+`=char(167)'","c")
>>>>>>>> replace `v'=regexr(`v', "`=char(195)'+`=char(182)'","o")
>>>>>>>> replace `v'=regexr(`v', "`=char(196)'+`=char(177)'","i")
>>>>>>>> replace `v'=regexr(`v', "`=char(196)'+`=char(158)'","G")
>>>>>>>> replace `v'=regexr(`v', "`=char(196)'+`=char(159)'","g")
>>>>>>>> replace `v'=regexr(`v', "`=char(195)'+`=char(156)'","U")
>>>>>>>> replace `v'=regexr(`v', "`=char(195)'+`=char(188)'","u")
>>>>>>>> replace `v'=regexr(`v', "`=char(197)'+`=char(158)'","S")
>>>>>>>> replace `v'=regexr(`v', "`=char(195)'+`=char(150)'","O")
>>>>>>>> replace `v'=regexr(`v', "`=char(197)'+`=char(159)'","s")
>>>>>>>> }
>>>>>>>>
>>>>>>>> However, this code cannot accomplish the conversion at the first time. Therefore, I have to do it 5 to 10 times to get a (0 real changes made) message.
>>>>>>>> My question is: can I make this loop run automatically until I get the (0 real changes made) message which indicates that all characters are converted.
>>>>>>
>>>>>> *
>>>>>> * For searches and help try:
>>>>>> * http://www.stata.com/help.cgi?search
>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>>
>>>>> *
>>>>> * For searches and help try:
>>>>> * http://www.stata.com/help.cgi?search
>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index