Statalist



Re: st: -word()- with non space separator


From   Jeph Herrin <jeph.herrin@yale.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: -word()- with non space separator
Date   Wed, 23 Sep 2009 20:58:01 -0400

You are right. As it happens, I do know the maximum: the number of
possible values is contained in information that is attached
to each variable. So both my method and yours work fine, and my
objection was totally unjustified.

Jeph



Nick Cox wrote:
Not knowing the highest value in advance would bite equally hard with
the method in your previous post, which works from 1 upwards to a
specified maximum, so that objection seems unconvincing to me.

Nick
n.j.cox@durham.ac.uk
Jeph Herrin

Thanks. I also thought of something like this, but
didn't want to pursue it, if that makes sense. For
one thing, I have literally thousands of variables and
don't know ahead of time what the highest number I
need is.

As for the structure, it may not be the worst, but it
is surely not the best.
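When the highest value is not recorded anywhere, it can be found from the data before running either loop. This is a minimal sketch that does not come from the thread: it assumes each list holds only whole numbers separated by single colons, works on one variable at a time, and uses a hypothetical stub `v` and helper variable `rowmax` chosen for illustration.

```stata
clear
input str20 myvar
"1:11:21"
"7"
"2:12"
end

* Split each list into numeric pieces v1, v2, ... (hypothetical stub "v")
split myvar, parse(:) generate(v) destring

* Largest value in each observation, then across all observations
egen rowmax = rowmax(v*)
summarize rowmax, meanonly
local max = r(max)
display "highest value found: `max'"

* Tidy up the helper variables
drop v* rowmax
```

With thousands of such variables the same steps could be wrapped in a loop over variable names, at the cost of one -split- per variable.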

Nick Cox wrote:

Another way to do it:

     clonevar work = myvar
     qui forval i = 29(-1)1 {
          gen myvar_`i' = strpos(work, "`i'") > 0
          replace work = subinstr(work, "`i'", "", .)
     }

Here 29 is in general whatever highest number you need. In words, in
addition to the -strpos()- logic,

1. Work on a copy, because we're going to change it.
2. Work downwards, from high values down to 1.
3. Once you've checked for a longer string, zap it so that it doesn't
later confuse the search for shorter strings.
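To see why working downwards matters, here is the same loop run on a few made-up lists that mix one- and two-digit values. The data are invented for illustration and 29 is kept as the assumed maximum.

```stata
clear
input str20 myvar
"1:11:21"
"2:12"
"11"
end

* Work on a copy, since matched values are deleted from it as we go
clonevar work = myvar
quietly forvalues i = 29(-1)1 {
    * Flag the value if it occurs anywhere in what is left of the list
    generate byte myvar_`i' = strpos(work, "`i'") > 0
    * Remove it so that, e.g., "11" cannot later be mistaken for "1"
    replace work = subinstr(work, "`i'", "", .)
}

list myvar myvar_1 myvar_2 myvar_11 myvar_12 myvar_21
```

Because 21, 12 and 11 are checked and removed before 2 and 1, the third observation ("11") ends up with myvar_1 = 0. Running the loop upwards instead would flag it as containing 1.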
Incidentally, don't knock the format (or structure). When Uli Kohler and
I wrote up the tricks we knew for multiple responses (in this sense), it
was pretty clear to us that all such formats or structures have some big
advantages and disadvantages. Our efforts are accessible at
FAQ: Dealing with multiple responses. N. J. Cox and U. Kohler, 4/05.
        How do I deal with multiple responses?
        http://www.stata.com/support/faqs/data/multresp.html

SJ-3-1 pr0008: Speaking Stata: On structure & shape: the case of mult. resp.
        N. J. Cox and U. Kohler, Q1/03. SJ 3(1): 81–99 (no commands).
        Discussion of data manipulations for multiple response data.

Nick n.j.cox@durham.ac.uk
Jeph Herrin

Solved - this does it:

     forv i=1/9 {
          * match i alone, first, in the middle, or last in the list
          gen byte myvar_`i'= regexm(myvar,"^`i'$|^`i':|:`i':|:`i'$")
     }
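The same indicators can be had without regular expressions by padding each list with colons, so that every value, including the first and the last, is delimited on both sides. This is a different technique from the -regexm()- call above, sketched under the assumption that the values are whole numbers separated by single colons. The local `max` is a hypothetical placeholder for the highest value.

```stata
clear
input str20 myvar
"1:11:21"
"7"
"2:12"
end

* Hypothetical highest value; take it from whatever records it
local max = 21

* Once the list is wrapped in colons, ":i:" can only match a complete item,
* so "1" is not found inside "11" or "21", and a lone "7" is still found
forvalues i = 1/`max' {
    generate byte myvar_`i' = strpos(":" + myvar + ":", ":`i':") > 0
}

list myvar myvar_1 myvar_2 myvar_7 myvar_11 myvar_12 myvar_21
```

Because no ordering or deletion is involved, the loop can run upwards, and each indicator depends only on the original string.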


Jeph Herrin wrote:

I have a dataset in which many variables are in
the most useless format imaginable. If a question
has multiple checkboxes as possible answers, the
response is stored as a string, with a number indicating
each box checked and these numbers separated by colons.
Thus:

                myvar
      1:2:3:5:6:7:8:9
              1:2:3:6
      1:2:3:4:5:7:8:9
          1:2:3:5:7:9
        1:2:3:5:7:8:9
            2:3:4:6:9
      1:2:3:5:6:7:8:9
            1:2:7:8:9
                  7:9

This variable takes 9 values, so I want to split it into 9
different indicator variables, myvar_1-myvar_9, each
indicating whether that number was selected. -split-
does not work, because the number of values differs
across strings. That is, it produces a myvar_1 that equals "7"
for the last obs.
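A minimal sketch that reproduces the listing above and shows the problem: -split- numbers its new variables by position in the string, so the first one holds whichever box was checked first, not box 1. The stub `piece` is an arbitrary name chosen here.

```stata
clear
input str20 myvar
"1:2:3:5:6:7:8:9"
"1:2:3:6"
"1:2:3:4:5:7:8:9"
"1:2:3:5:7:9"
"1:2:3:5:7:8:9"
"2:3:4:6:9"
"1:2:3:5:6:7:8:9"
"1:2:7:8:9"
"7:9"
end

* Pieces are numbered by position in the string, not by box number
split myvar, parse(:) generate(piece)

* In the last observation piece1 is "7", because box 7 was the first checked
list myvar piece1 piece2 in 9
```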

So I am looking for a way to check whether a given string
contains a given integer, which would allow me to

   forv i=1/9 {
    gen byte myvar_`i'= [`i' is in myvar list]
   }

As long as there are just 9 values, I can use -strpos()-
to check for the presence of the digit, but some of my variables
run into the tens and twenties, in which case, e.g., searching for "1"
returns true even if there is only "11".
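The false positive is easy to reproduce with -display-. The last two lines show one common workaround, padding both the list and the target with colons; this is an addition here, not something proposed in the thread.

```stata
* -strpos()- finds substrings, so "1" is found inside "11:12" (displays 1)
display strpos("11:12", "1") > 0

* Wrapping list and target in colons demands a whole-item match
* (displays 0: there is no item equal to 1)
display strpos(":" + "11:12" + ":", ":1:") > 0
* (displays 1: 11 is an item)
display strpos(":" + "11:12" + ":", ":11:") > 0
```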

The only solutions I see are to first -split- and
then check all the new indicators, or to run through a series of
checks such as (matches "1:" but not ":1").  I don't like
either. Is there a direct way to check whether a given integer
is in the list?

I think there may be a regex solution, but my Perl programming
days are so far behind me that I've not been able to come up
with one.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP