Statalist


RE: st: -word()- with non space separator


From   "Martin Weiss" <[email protected]>
To   <[email protected]>
Subject   RE: st: -word()- with non space separator
Date   Wed, 23 Sep 2009 19:46:17 +0200

Two hours ago I posted code that works out the maximum for itself...


HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, 23 September 2009 19:43
To: [email protected]
Subject: RE: st: -word()- with non space separator

Not knowing the highest value in advance would bite equally hard with
the method in your previous post, which works from 1 upwards to a
specified maximum, so that objection seems unconvincing to me. 

Nick 
[email protected] 

Jeph Herrin

Thanks. I also thought of something like this, but
didn't want to pursue it, if that makes sense. For
one thing, I have literally thousands of variables and
don't know ahead of time what the highest number I
need is.
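
One way to avoid hard-coding the highest number is to read it from the data first. What follows is a minimal sketch, not necessarily the code referred to elsewhere in this thread. It assumes the variable is called myvar, as in the original question, and that it holds only colon-separated positive integers. The names tmp_ and tmp_max are hypothetical scratch names.

```stata
* Sketch only: assumes myvar holds colon-separated positive integers.
* tmp_ and tmp_max are hypothetical scratch names.
split myvar, parse(":") generate(tmp_) destring
egen tmp_max = rowmax(tmp_*)
summarize tmp_max, meanonly
local max = r(max)
drop tmp_*          // also drops tmp_max

* Wrapping both sides in colons makes ":1:" distinct from ":11:".
forvalues i = 1/`max' {
    generate byte myvar_`i' = strpos(":" + myvar + ":", ":`i':") > 0
}
```

Wrapping the string in colons before searching also covers a response that holds a single number, and it needs no working copy of the variable.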

As for the structure, it may not be the worst, but it
is surely not the best.

Nick Cox wrote:

> Another way to do it: 
> 
> clonevar work = myvar 
> 
> qui forval i = 29(-1)1 { 
> 	gen myvar_`i' = strpos(work, "`i'") > 0 
> 	replace work = subinstr(work, "`i'", "", .) 
> } 
> 
> Here 29 is in general whatever highest number you need. 
> 
> In words, in addition to the -strpos()- logic, 
> 
> 1. Work on a copy, because we're going to change it. 
> 
> 2. Work downwards, from high values down to 1. 
> 
> 3. Once you've checked for a longer string, zap it so that it doesn't
> later confuse the search for shorter strings. 
> 
> Incidentally, don't knock the format (or structure). When Uli Kohler
> and I wrote up the tricks we knew for multiple responses (in this
> sense), it was pretty clear to us that all such formats or structures
> have some big advantages and disadvantages. Our efforts are
> accessible at 
> 
> FAQ     . . . . . . . . . . . . . .  Dealing with multiple responses
>         . . . . . . . . . . . . . . . . .  N. J. Cox and U. Kohler
>         4/05    How do I deal with multiple responses?
>                 http://www.stata.com/support/faqs/data/multresp.html
> 
> SJ-3-1  pr0008   Speaking Stata: On structure & shape: the case of mult. resp.
>         . . . . . . . . . . . . . . . . . .  N. J. Cox & U. Kohler
>         Q1/03   SJ 3(1):81--99                        (no commands)
>         discussion of data manipulations for multiple response data
> 
> Nick 
> [email protected] 
> 
> Jeph Herrin
> 
> Solved - this does it:
> 
>      * match `i' at the start, in the middle, at the end, or on its own
>      forv i=1/9 {
>           gen byte myvar_`i'= regexm(myvar,"^`i':|:`i':|:`i'$|^`i'$")
>      }
> 
> 
> Jeph Herrin wrote:
> 
>> I have a dataset in which many variables are in
>> the most useless format imaginable. If a question
>> has multiple checkboxes as possible answers, the
>> response is stored as a string, with a number indicating
>> each box checked and these numbers separated by colons.
>> Thus:
>>
>>                 myvar
>>       1:2:3:5:6:7:8:9
>>               1:2:3:6
>>       1:2:3:4:5:7:8:9
>>           1:2:3:5:7:9
>>         1:2:3:5:7:8:9
>>             2:3:4:6:9
>>       1:2:3:5:6:7:8:9
>>             1:2:7:8:9
>>                   7:9
>>
>> This variable takes 9 values, so I want to split into 9
>> different indicator variables, myvar_1-myvar_9, each
>> indicating whether that number was selected. -split-
>> does not work, because of the differing number of values
>> per string. That is, it produces myvar_1 which equals "7"
>> for the last obs.
>>
>> So I am looking for a way to check whether a given string
>> contains a given integer, which would allow me to
>>
>>    forv i=1/9 {
>>     gen byte myvar_`i'= [`i' is in myvar list]
>>    }
>>
>> As long as there are just 9 values, I can use -strpos()-
>> to check for the presence of the digit, but some of my variables
>> run into tens and twenties, in which case, e.g., searching for "1"
>> returns true even if there is only "11".
>>
>> The only solutions I see are to first -split- and
>> then check all the new indicators, or run through a series of
>> checks such as (matches "1:" but not ":1").  I don't like
>> either: is there a direct way to check whether a given integer
>> is in the list?
>>
>> I think there may be a regex solution, but my Perl programming
>> days are so far behind me that I've not been able to come up
>> with one.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 

© Copyright 1996–2024 StataCorp LLC