
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: routine for matching of a str-variable

From   Nick Cox <>
Subject   Re: st: routine for matching of a str-variable
Date   Wed, 11 May 2011 18:19:33 +0100

The solution depends on the problem.

The third law of string operations says "Never use regex machinery if
you don't need it."

Thus -levelsof- will give you the distinct values of your string
variable, and looping over those levels will give you as many
indicator variables as you need.
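Here is a minimal sketch of the -levelsof- loop. It assumes the codes are held in a string variable called atc_code, which is a hypothetical name (a hyphen, as in atc-code, is not legal in a Stata variable name). The four observations are toy data for illustration only, and strtoname() is used defensively in case some value is not a legal variable name.

```stata
* Toy data (illustrative only): one ATC code per observation
clear
input str7 atc_code
"N06DX01"
"G03CA07"
"N06DX01"
"G03CA09"
end

* Store the distinct codes in a local macro
levelsof atc_code, local(codes)

* Create one 0/1 indicator per distinct code, named after the code
foreach c of local codes {
    // strtoname() turns a value into a legal variable name if needed
    local v = strtoname("`c'")
    generate byte `v' = (atc_code == "`c'")
}

tabulate N06DX01
```

Labels such as "Memantine" could then be attached with -label variable- as before, but the 200+ copies of the -generate- line are no longer needed.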

Even better, -tab, generate()- will do it for you in one line!
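For comparison, here is a sketch of the one-line -tabulate, generate()- route on the same kind of toy data, again assuming a hypothetical string variable atc_code. The indicators are named atc_1, atc_2, ... in the sorted order of the distinct values. Each one gets a variable label such as atc_code==N06DX01 that records which code it stands for.

```stata
* Toy data (illustrative only)
clear
input str7 atc_code
"N06DX01"
"G03CA07"
"N06DX01"
"G03CA09"
end

* One indicator per distinct value: atc_1, atc_2, atc_3
tabulate atc_code, generate(atc_)

* The variable labels show which code each indicator represents
describe atc_*
```

The trade-off is naming. The -levelsof- loop can name each indicator after its code, whereas -tab, generate()- uses numbered names and keeps the code in the variable label.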

Regex machinery would presumably be needed only if you suspected
spelling mistakes, and even then you would need to think about how much
latitude to allow.
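The following sketch illustrates the point about regex. It compares three approaches: an exact match, the pattern used in the question, and one simple form of latitude (ignoring case and leading/trailing spaces) that still needs no regex. The three values are made up: a correct code, a malformed one, and one in lower case. The variable name atc_code is again hypothetical. Because [N]+ means "one or more N", that pattern also accepts the malformed value.

```stata
* Toy data (illustrative only): correct, malformed, lower case
clear
input str10 atc_code
"N06DX01"
"NN006DX011"
"n06dx01"
end

* Exact match: no regex needed
generate byte exact = (atc_code == "N06DX01")

* Pattern from the question: each [.]+ term allows repeats, and there is no $ anchor
generate byte oldpat = regexm(atc_code, "^[N]+[0]+[6]+[D]+[X]+[0]+[1]+")

* Some latitude without regex: ignore case and leading/trailing spaces
generate byte loose = (upper(trim(atc_code)) == "N06DX01")

list
```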


On Wed, May 11, 2011 at 5:52 PM, Thomas Zimmermann <> wrote:

> I want to check the prevalence of 200+ pharmaceutical agents in a dataset of
> 14000+ ATC codes in 3327 patients. The table with the pharmaceutical agents
> is organised this way:
> "pharmaceutical agent (str)" "atc-code (str)"
> i=1
> 2 Memantine N06DX01
> 3 Estron G03CA07
> 4 Promestrien G03CA09
> i=200
> I'm looking for a routine that first creates a new variable named after
> each ATC code. This variable should store 1 if the ATC code is matched and
> 0 if it is not.
> My workaround (if it deserves that name) until now has been to copy the
> lines below 200+ times and then re-enter each ATC code by hand :-(.
> gen byte N06DX01 = regexm(atc_code, "^[N]+[0]+[6]+[D]+[X]+[0]+[1]+")
> label var N06DX01 "Memantine"
> tabulate N06DX01


© Copyright 1996–2018 StataCorp LLC