Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From:    Lisa Cook <hlthsrvcsphd@gmail.com>
To:      statalist@hsphsun2.harvard.edu
Subject: Re: st: working with a 24-character string variable consisting of 0s and 1s
Date:    Sat, 15 Feb 2014 14:27:11 -0500
Apologies for the delayed reply. Thanks very much to Nick and Eduardo for the assist!

On Tue, Feb 11, 2014 at 8:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> There is a better way in this case, as removing "0"s is the complement
> of removing "1"s:
>
> gen firstyear = length(subinstr(substr(myvar,1,12), "0", "", .))
>
> The more general trick remains to count what you want by removing
> instances and seeing what difference that makes to the length. As
> here, don't remove it in the original variable, but just get Stata to
> do the same calculation.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 11 February 2014 10:12, Nick Cox <njcoxstata@gmail.com> wrote:
>> Regular expressions are great, just too often considered when there
>> are more direct methods of getting what you want.
>>
>> Consider
>>
>> gen firstyear = 12 - length(subinstr(substr(myvar,1,12), "1", "", .))
>>
>> Let's split the recipe into steps:
>>
>> substr(myvar, 1, 12) is the first 12 characters.
>>
>> subinstr(substr(myvar, 1, 12), "1", "", .)
>>
>> blanks out each "1", replacing it with "", an empty string.
>>
>> length() gives you the length of what's left. 12 minus that is the
>> length of what we removed, and so the number of 1s in the substring.
>>
>> The second year is then
>>
>> gen secondyear = 12 - length(subinstr(substr(myvar,13,12), "1", "", .))
>>
>> Once understood, the flavour is "Yes, of course", but it was spelled out within
>>
>> http://www.stata-journal.com/article.html?article=dm0056
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 11 February 2014 02:46, Lisa Cook <hlthsrvcsphd@gmail.com> wrote:
>>> Hi,
>>>
>>> I need help working with a cumbersome string variable. I'm using Stata/MP 13.0.
>>>
>>> I've inherited a dataset that includes several variables indicating
>>> the number of months each person had specific kinds of health
>>> insurance (Medicaid, Medicare, private, etc.).
>>>
>>> The variables are 24 characters long in string format.
>>> Each character is either a 0 or 1, and represents whether the person
>>> had coverage in that month. So, if one of these variables equals
>>> "000000000000000000000000", the person had no coverage in any month of
>>> that type, while if it equals "111111111111111111111111", they were
>>> covered in every month by that kind of insurance. If the variable
>>> equals, say, "101111111111111111111111", the person had 23 months of
>>> coverage, but no coverage in the 2nd month.
>>>
>>> I would like to use these variables to generate, for each kind of
>>> insurance, the total in year 1, the total in year 2, and the total
>>> number of months of coverage in both years.
>>>
>>> I've used regexm before, but I can't figure out how to apply that code
>>> to my situation. I'd be very grateful if anyone could suggest some
>>> options.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
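For readers who want to check the length-difference trick outside Stata, here is a minimal sketch of the same arithmetic in Python. It is an illustration only, not Stata code and not what was posted to the list; the function names months_covered, months_covered_complement and coverage_summary are hypothetical, and it assumes each history is exactly 24 characters of 0s and 1s. Note that Python slices are 0-based, so Stata's substr(myvar, 1, 12) corresponds to history[0:12] and substr(myvar, 13, 12) to history[12:24].

```python
def months_covered(history, start, width):
    """Count '1's in history[start:start + width] by removing every '1'
    and measuring how much shorter the substring gets.
    Mirrors Stata's 12 - length(subinstr(substr(...), "1", "", .))."""
    chunk = history[start:start + width]
    return len(chunk) - len(chunk.replace("1", ""))


def months_covered_complement(history, start, width):
    """Same count via the complement: remove every '0'; the length of
    what is left is the number of '1's (valid only for 0/1 strings).
    Mirrors Stata's length(subinstr(substr(...), "0", "", .))."""
    chunk = history[start:start + width]
    return len(chunk.replace("0", ""))


def coverage_summary(history):
    """Return (year 1 months, year 2 months, total months) for a
    hypothetical 24-character coverage history."""
    if len(history) != 24 or set(history) - {"0", "1"}:
        raise ValueError("expected a 24-character string of 0s and 1s")
    year1 = months_covered(history, 0, 12)    # Stata: substr(myvar, 1, 12)
    year2 = months_covered(history, 12, 12)   # Stata: substr(myvar, 13, 12)
    total = len(history.replace("0", ""))     # both years at once
    return year1, year2, total


if __name__ == "__main__":
    for h in ["000000000000000000000000",
              "111111111111111111111111",
              "101111111111111111111111"]:
        print(h, coverage_summary(h))
```

Using len(chunk) rather than a hard-coded 12 keeps the count correct if a substring is shorter than expected; in Stata, where substr(myvar, 1, 12) of a 24-character variable always has length 12, Nick's constant 12 is equivalent. For the third example history, the sketch gives 11 months in year 1, 12 in year 2, and 23 in total, matching the description of "101111111111111111111111" above.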