



Re: st: Extract parts of a string variable


From   "Roger B. Newson" <r.newson@imperial.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Extract parts of a string variable
Date   Mon, 05 Nov 2012 11:19:58 +0000

You don't say whether more than one of the pollutant names you are searching for can be present in the same value. The answer to this question affects the correct way to proceed.

If 2 or more pollutant strings may be present at once, then you should define one indicator variable for each pollutant. Assuming that your string variable has the name -messy-, your code might read:

* Create one 0/1 indicator per pollutant (strpos() is case-sensitive)
foreach PO in co no2 nox o3 pm10 pm25 {
    generate byte pres_`PO' = strpos(messy, "`PO'") > 0
    label variable pres_`PO' "Presence of `PO'"
    tabulate pres_`PO', missing
}

This will create 6 indicator variables pres_co, pres_no2, pres_nox, pres_o3, pres_pm10 and pres_pm25. You can then use these variables in further processing.
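
As a minimal sketch of such further processing, assuming the six pres_ indicators have been created as described and that you want a single variable holding the pollutant name whenever exactly one pollutant string is found, something like the following might work. The names n_poll and pollutant are hypothetical, chosen only for illustration.

```stata
* Count how many of the six pollutant strings were found in each observation
egen byte n_poll = rowtotal(pres_co pres_no2 pres_nox pres_o3 pres_pm10 pres_pm25)
tabulate n_poll, missing

* Record the pollutant name only where exactly one string was found;
* observations with zero or several matches keep an empty string
generate str4 pollutant = ""
foreach PO in co no2 nox o3 pm10 pm25 {
    replace pollutant = "`PO'" if pres_`PO' == 1 & n_poll == 1
}
tabulate pollutant, missing
```

Tabulating n_poll first shows whether any value matches more than one pollutant string, which is the question that decides whether a single name variable is adequate or the separate indicators are needed.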

I hope this helps.

Best wishes

Roger


Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 05/11/2012 11:05, Kuma Raj wrote:
I have a messy string variable that contains names of various air
pollutants. The contents and naming are based on the name of the
pollutant, lags, station name, and different exposure metrics. There
is no uniformity or fixed position of the contents in the variable
name. I would like to parse the variable and extract the names of the
pollutants if they match specific strings. How can I do that?

A sample of the variable is shown below. I would like to extract
the following strings: co, no2, nox, o3, pm10, and pm25

L1comeanH10
L1comeanS10
no2T10
L1no2T10
L1noxT10
L1o3maxA10
comeanS10_01
L3o3maxM10
L1o3meanM10
L1o3maxT10
L2pm10T10
L1pm25T10

Thanks in advance
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


