


st: extracting substrings from string, with irregular patterns

From   Fernando Luco
Subject   st: extracting substrings from string, with irregular patterns
Date   Thu, 16 Aug 2012 13:27:53 -0500


I have a dataset in which a single string variable contains the name
of a gas station, its address, and the city in which the station is
located. I would like to separate these into three variables: name,
address, and city. I have tried Stata's regular-expression functions
(regexm() and regexs()), but I haven't been successful. The data look
as follows:

COPEC AV. 11 DE SEPTIEMBRE 000,Tocopilla
PETROBRAS Av. Antonio Rendic 6850,Antofagasta
TERPEL Basilio Urrutia esq. Janequeo 312,Lautaro
Sin Bandera carrera 348,Lautaro
Sin Bandera Isabel Riquielme 403,Villarrica

In this example the names are COPEC, PETROBRAS, TERPEL, and Sin Bandera,
so some names are all uppercase and others are mixed case. The
addresses are AV. 11 DE SEPTIEMBRE 000, Av. Antonio Rendic 6850,
Basilio Urrutia esq. Janequeo 312, carrera 348, and Isabel
Riquielme 403. Finally, the city is whatever follows the comma:
Tocopilla, Antofagasta, Lautaro, and Villarrica.
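
A rough sketch of the splitting logic described above, written in
Python only to illustrate the steps (the same idea should map onto
Stata's strpos() and substr() functions). It rests on two assumptions
that are not guaranteed by the data: every line has a comma before the
city, and the station names come from a short known list. KNOWN_NAMES
and split_station are names made up for this illustration.

```python
# Hypothetical list of station names; in practice it would have to be
# assembled by inspecting the data.
KNOWN_NAMES = ["COPEC", "PETROBRAS", "TERPEL", "Sin Bandera"]


def split_station(raw, known_names=KNOWN_NAMES):
    """Split 'NAME ADDRESS,City' into a (name, address, city) tuple."""
    # The city is whatever follows the last comma in the line.
    rest, _, city = raw.rpartition(",")
    rest, city = rest.strip(), city.strip()
    # Try longer names first so a longer name wins over a shorter prefix.
    for name in sorted(known_names, key=len, reverse=True):
        if rest.startswith(name + " "):
            return name, rest[len(name):].strip(), city
    # No known name matched: leave the name empty for manual review.
    return "", rest, city


rows = [
    "COPEC AV. 11 DE SEPTIEMBRE 000,Tocopilla",
    "PETROBRAS Av. Antonio Rendic 6850,Antofagasta",
    "TERPEL Basilio Urrutia esq. Janequeo 312,Lautaro",
    "Sin Bandera carrera 348,Lautaro",
    "Sin Bandera Isabel Riquielme 403,Villarrica",
]
for row in rows:
    print(split_station(row))
```

Matching against a list of known names sidesteps the mixed-case
problem, since "Sin Bandera" cannot be told apart from the start of an
address by capitalization alone. Lines whose name is not in the list
come back with an empty name so they can be checked by hand.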

What I would like, even if it requires several steps, is to have
the name, address, and city each in a separate variable. I tried
splitting the string into substrings at the spaces, but that didn't
work. I also tried extracting the uppercase names first, but that
didn't work either.

Finally, I have 1,600 stations so I would like to avoid doing this one
by one. Any suggestions?

