
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: working with a 24-character string variable consisting of 0s and 1s


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: working with a 24-character string variable consisting of 0s and 1s
Date   Tue, 11 Feb 2014 13:31:10 +0000

There is a better way in this case, as removing "0"s is the complement
of removing "1"s:

gen firstyear = length(subinstr(substr(myvar,1,12), "0", "", .))

The more general trick remains: count what you want by removing
instances and seeing what difference that makes to the length. As
here, don't remove them from the original variable; just get Stata to
do the calculation on the fly inside the expression.
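
A minimal sketch of that trick on made-up data, assuming the variable is
called myvar as in this thread. The three example strings are the ones
given in the original question; the names bothyears and check are
illustrative only and do not come from the earlier posts.

```stata
clear
input str24 myvar
"000000000000000000000000"
"111111111111111111111111"
"101111111111111111111111"
end

* Count "1"s by deleting the "0"s and measuring what is left.
gen firstyear  = length(subinstr(substr(myvar, 1, 12), "0", "", .))
gen secondyear = length(subinstr(substr(myvar, 13, 12), "0", "", .))

* Same idea on the whole string gives the total over both years.
gen bothyears  = length(subinstr(myvar, "0", "", .))

* General form: length before minus length after removing the target.
gen check = length(myvar) - length(subinstr(myvar, "1", "", .))

* The two routes must agree, and the yearly counts must sum to the total.
assert bothyears == firstyear + secondyear
assert bothyears == check

list
```

The original string is never modified: subinstr() returns a new string
inside the expression, so myvar stays intact for the other counts.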

Nick
[email protected]


On 11 February 2014 10:12, Nick Cox <[email protected]> wrote:
> Regular expressions are great, just too often considered when there
> are more direct methods of getting what you want.
>
> Consider
>
> gen firstyear = 12  - length(subinstr(substr(myvar,1,12), "1", "", .))
>
> Let's split the recipe into steps:
>
> substr(myvar, 1, 12) is the first 12 characters.
>
> subinstr(substr(myvar, 1, 12), "1", "", .)
>
> blanks out each "1", replacing it with "", an empty string.
>
> length() gives you the length of what's left. 12 minus that is the
> length of what we removed, and so the number of 1s in the substring.
>
> The second year is then
>
> gen secondyear = 12  - length(subinstr(substr(myvar,13,12), "1", "", .))
>
> Once understood, the flavour is "Yes, of course", but it was spelled out within
>
> http://www.stata-journal.com/article.html?article=dm0056
>
> Nick
> [email protected]
>
>
> On 11 February 2014 02:46, Lisa Cook <[email protected]> wrote:
>> Hi,
>>
>> I need help working with a cumbersome string variable. I'm using Stata/MP 13.0.
>>
>> I've inherited a dataset that includes several variables indicating
>> the number of months each person had specific kinds of health
>> insurance (Medicaid, Medicare, private, etc.).
>>
>> The variables are 24 characters long in string format. Each character
>> is either a 0 or 1, and represents whether the person had coverage in
>> that month. So, if one of these variables equals
>> "000000000000000000000000", the person had no coverage in any month of
>> that type, while if it equals "111111111111111111111111", they were
>> covered in every month by that kind of insurance. If the variable
>> equals, say, "101111111111111111111111", the person had 23 months of
>> coverage, but no coverage in the 2nd month.
>>
>> I would like to use these variables to generate, for each kind of
>> insurance, the total in year 1, the total in year 2, and the total
>> number of months of coverage in both years.
>>
>> I've used regexm before, but I can't figure out how to apply that code
>> to my situation. I'd be very grateful if anyone could suggest some
>> options.

