


Re: st: routine for matching of a str-variable

From   Nick Cox
Subject   Re: st: routine for matching of a str-variable
Date   Wed, 11 May 2011 18:19:33 +0100

The solution depends on the problem.

The third law of string operations says "Never use regex machinery if
you don't need it."

Thus the distinct values of your string variable will be given by
-levelsof- and looping over those levels will give you as many
indicators as you need.
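
A minimal sketch of that loop, not a definitive implementation. It assumes the
codes sit in a string variable called atc (a made-up name, since "atc-code" with
a hyphen is not a legal Stata variable name) and that every code is itself a
legal variable name, as N06DX01 and the like are. The four observations are toy
data for illustration only.

```stata
* Toy data: atc is a hypothetical variable name, not from the original post
clear
input str7 atc
"N06DX01"
"G03CA07"
"G03CA09"
"N06DX01"
end

* Collect the distinct codes, then create one 0/1 indicator per code
levelsof atc, local(codes)
foreach c of local codes {
    * the indicator is named after the code itself
    generate byte `c' = (atc == "`c'")
}

tabulate N06DX01
```

Naming each indicator after its code keeps the result close to what was asked
for; variable labels with the agent names would have to come from the separate
lookup table.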

Even better, -tab, generate()- will do it for you in one line!
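
As a sketch of the one-line route, again assuming a string variable named atc
(hypothetical) and toy data. The stub atc_ is an arbitrary choice; the new
variables are numbered in the sorted order of the distinct codes rather than
named after them.

```stata
* Toy data: atc is a hypothetical variable name, not from the original post
clear
input str7 atc
"N06DX01"
"G03CA07"
"G03CA09"
"N06DX01"
end

* One indicator per distinct value: atc_1, atc_2, atc_3
tabulate atc, generate(atc_)

* Inspect the variable labels to see which code each indicator stands for
describe atc_*
```

The trade-off against the loop is naming: here you get numbered variables and
must read the variable labels to recover the code behind each one.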

Regex machinery would presumably only be needed if you suspected
spelling mistakes, and then you would still need to think about how much
latitude to allow.


On Wed, May 11, 2011 at 5:52 PM, Thomas Zimmermann wrote:

> I want to check the prevalence of 200+ pharmaceutical agents in a dataset of
> 14000+ ATC-codes in 3327 patients. The table with the pharmaceutical agents
> is organised this way:
> "pharmaceutical agent (str)" "atc-code (str)"
> i=1
> 2 Memantine N06DX01
> 3 Estron G03CA07
> 4 Promestrien G03CA09
> i=200
> I'm looking for a routine that first creates a new variable "atc-code". This
> variable should store 1 if the atc-code is matched and 0 if it's not.
> My workaround (if it deserves that name) until now is to copy 200+ times, then
> re"submit" the different atc-codes by hand :-(.
> gen byte N06DX01 = regexm(atc-code, "^[N]+[0]+[6]+[D]+[X]+[0]+[1]+")
> label var N06DX01 "Memantine"
> tabulate N06DX01
