Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: working with a 24-character string variable consisting of 0s and 1s

- From: Nick Cox
- To: "statalist@hsphsun2.harvard.edu"
- Subject: Re: st: working with a 24-character string variable consisting of 0s and 1s
- Date: Tue, 11 Feb 2014 13:31:10 +0000

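```
There is a better way in this case, as removing "0"s is the complement
of removing "1"s. Every character is either "0" or "1", so once the
"0"s are gone, the length of what is left is just the number of 1s: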

gen firstyear = length(subinstr(substr(myvar,1,12), "0", "", .))

The more general trick remains: count what you want by removing
instances of it and seeing what difference that makes to the length.
As here, don't remove anything from the original variable; just get
Stata to do the same calculation on the fly.

Nick
njcoxstata@gmail.com

On 11 February 2014 10:12, Nick Cox <njcoxstata@gmail.com> wrote:
> Regular expressions are great, just too often considered when there
> are more direct methods of getting what you want.
>
> Consider
>
> gen firstyear = 12 - length(subinstr(substr(myvar,1,12), "1", "", .))
>
> Let's split the recipe into steps:
>
> substr(myvar, 1, 12) is the first 12 characters.
>
> subinstr(substr(myvar, 1, 12), "1", "", .)
>
> removes each "1", replacing it with "", an empty string. The final
> argument, ., means "replace all occurrences".
>
> length() gives you the length of what's left. 12 minus that is the
> length of what we removed, and so the number of 1s in the substring.
>
> The second year is then
>
> gen secondyear = 12 - length(subinstr(substr(myvar,13,12), "1", "", .))
>
> Once understood, the flavour is "Yes, of course", but the trick is spelled out in
>
> http://www.stata-journal.com/article.html?article=dm0056
>
> Nick
> njcoxstata@gmail.com
>
>
> On 11 February 2014 02:46, Lisa Cook <hlthsrvcsphd@gmail.com> wrote:
>> Hi,
>>
>> I need help working with a cumbersome string variable. I'm using Stata/MP 13.0.
>>
>> I've inherited a dataset that includes several variables indicating
>> the number of months each person had specific kinds of health
>> insurance (Medicaid, Medicare, private, etc.).
>>
>> The variables are 24 characters long in string format. Each character
>> is either a 0 or 1, and represents whether the person had coverage in
>> that month. So, if one of these variables equals
>> "000000000000000000000000", the person had no coverage in any month of
>> that type, while if it equals "111111111111111111111111", they were
>> covered in every month by that kind of insurance. If the variable
>> equals, say, "101111111111111111111111", the person had 23 months of
>> coverage, but no coverage in the 2nd month.
>>
>> I would like to use these variables to generate, for each kind of
>> insurance, the total in year 1, the total in year 2, and the total
>> number of months of coverage in both years.
>>
>> I've used regexm before, but I can't figure out how to apply that code
>> to my situation. I'd be very grateful if anyone could suggest some
>> options.
```
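A minimal, self-contained sketch that puts the recipes from this thread together, run on the three example strings given in the question. The variable name `coverage` is hypothetical because the question does not name its variables. The `_alt` variables exist only to check that the alternative formulas (removing "0"s instead of "1"s, and counting over the whole 24-character string at once) agree with the main ones.

```stata
* Sketch only: -coverage- is a hypothetical variable name; the three
* values are the example strings from the question.
clear
input str24 coverage
"000000000000000000000000"
"111111111111111111111111"
"101111111111111111111111"
end

* Months covered in year 1 (characters 1-12) and year 2 (characters 13-24):
* delete every "1" and see how much shorter each 12-character block gets.
gen year1 = 12 - length(subinstr(substr(coverage, 1, 12), "1", "", .))
gen year2 = 12 - length(subinstr(substr(coverage, 13, 12), "1", "", .))
gen total = year1 + year2

* Complement version: delete the "0"s, and what remains is exactly the 1s.
gen year1_alt = length(subinstr(substr(coverage, 1, 12), "0", "", .))

* The two-year total can also be counted over the whole string in one go.
gen total_alt = 24 - length(subinstr(coverage, "1", "", .))

list coverage year1 year2 total
```

With these inputs `total` comes out as 0, 24 and 23. The last value matches the question's description of the third string: 23 months of coverage, missing only month 2.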
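The question mentions several such variables, one per kind of insurance. The same recipe can be wrapped in a `foreach` loop. The sketch below assumes two hypothetical variables, `medicaid` and `private`, filled with made-up example values; substitute the dataset's real variable names. Each pass of the loop creates, for example, `medicaid_y1`, `medicaid_y2` and `medicaid_total`.

```stata
* Sketch only: -medicaid- and -private- are hypothetical names standing in
* for the several 24-character coverage strings described in the question.
clear
input str24 medicaid str24 private
"111111111111000000000000" "000000000000111111111111"
"101111111111111111111111" "000000000000000000000000"
end

foreach v of varlist medicaid private {
    * Count the 1s in months 1-12 and 13-24 without altering the original string
    gen `v'_y1 = 12 - length(subinstr(substr(`v', 1, 12), "1", "", .))
    gen `v'_y2 = 12 - length(subinstr(substr(`v', 13, 12), "1", "", .))
    gen `v'_total = `v'_y1 + `v'_y2
}

list *_y1 *_y2 *_total
```

The counts are stored as new variables and the original strings are left unchanged. This follows the advice in the thread not to remove characters from the original variable.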