
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: data management - string function
Date: Wed, 24 Dec 2008 14:53:08 -0500

-Steve

**************************CODE BEGINS**************************
clear
drop _all
input str40 name
"Mr John Smith"
"Mr. John Jones"
" Mr Donald Trump"
"Mrs. Felicia Mroz"
"Mrumph Caliph"
" Dr. Tom Lester "
"drummond katz"
"John Amro"
"Mr.Tim Donner"
"Mister D.D. Smith"
"Doctor Nicholas J. Cox"
"Ms. Virginia Wolfe"
end
gen namex = trim(lower(name))
#delim ;

|(^mister)|(^doctor) |(^ms(\.| ))",""));
#delim cr
list name name_only
***************************CODE ENDS***************************

On Dec 24, 2008, at 11:48 AM, Howard Lempel wrote:

Hi all,

Following from Sergiy's advice, I'd like to suggest that bw use regular expressions to only delete occurrences of Mr, Dr, etc. that occur at the beginning of a name. This should save Dr. Mroz (or someone with last name Mr) from being deleted. Someone with a first name of MR will still be in trouble (you may want to experiment with finding a way to only delete titles from people where var1 is at least three words, saving someone with first name MR and no title in the data).

I don't have time to write out the full code, but see the regular expression FAQ here: http://www.stata.com/support/faqs/data/regex.html

Also look up -help regexm-

BW, caret (^) tells Stata you are searching for characters at the beginning of a string only, so you probably want something to the effect of:

    Gen var2 = regexr(var1,^("MR" | "MR." | "Mr" | . . .),)

Note: That code is untested, unfinished, and written by someone w/o expertise on regular expressions (e.g. I'd need to look up exactly how the "OR" operator and parentheses work).

Hope this helps.

Howie

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Wednesday, December 24, 2008 11:30 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: data management - string function

Hi,

I just hope that this program will not manage banking accounts, otherwise someone like Dr Mroz (http://www.unc.edu/~mroz/index_files/vita_mroz_2007_August%5B2%5D.pdf) will lose all his savings. The program should be very careful about replacing combinations of letters. When there is no guarantee that "Mr." is always spelled with a dot (as in the original data sample in the first email in this thread), spaces should be incorporated, but even then there is no way you can be sure that Mr is not a last name. E.g. the common Asian last name "Ng" (e.g. http://www.drdavidng.com/contact_us.html) would fail many naive validators (very short, no vowels). Perhaps in some languages "Mr" is also a first name, last name or middle name.

Also the choice of titles should probably be wider, to allow e.g. for Dr., Prof., Col., or any combination of these (which can occur in multiple combinations like "The life and activities of Col. Prof. Dr. Jezdimir STUDIC" here: http://www.ncbi.nlm.nih.gov/pubmed/14447887?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=1&log$=relatedarticles&logdbfrom=pubmed)

Some of the titles are listed here: http://ecs.victoria.ac.nz/Groups/AI/TitleGeneratorTitles but more extensive lists can be found on the internet. Careless replacing of "Master", "Marquis" or "Baron" might leave some of the people in your list without a last name. The only way to be sure about the title is to ask for it separately while collecting the data.

Best regards,
Sergiy Radyakin

On Wed, Dec 24, 2008 at 3:37 AM, Ashim Kapoor <ashimkapoor@gmail.com> wrote:

About your 2nd query.

Step 1:

    gen gender = word(var1,1)

Then do

    replace gender="F" if gender=="Mrs"
    replace gender="F" if gender=="Ms"
    replace gender="M" if gender=="Mr"

Trouble: what if you have Mr. (notice the dot) in place of Mr? So we do

    replace gender="F" if gender=="Mrs."
    replace gender="F" if gender=="Ms."
    replace gender="M" if gender=="Mr."

I think this should do it. Merry Xmas to you.

Ashim.

On Wed, Dec 24, 2008 at 2:03 PM, Ashim Kapoor <ashimkapoor@gmail.com> wrote:

Hello!

I think you want to do this:

    gen j=var1
    gen j2=subinstr(j,"Mrs","",1)
    gen j3=subinstr(j2,"Mr","",1)
    gen j4=subinstr(j3,"Ms","",1)

Note the order of j2 and j3; it is needed because Mr is a substring of Mrs. It would be ruined if you did it the other way.

I hope you liked it.

Thank you,
Ashim.

On Wed, Dec 24, 2008 at 1:22 PM, b. water <barleywater@hotmail.com> wrote:

dear all,

stata 8.2, windows xp,

i have a data management problem: have a variable (strings) of names like these:

var1
Mrs A Jones
Mrs Anne Jones
Ms Abra Ham
Mr Ko Jack
Jack Kroll
No Probs
Ms. Abra Ham
Mr. Ko Jack
.   <- denotes missing
.
.
Miss. Wonder Full
Mrs Bond Trader

i want to generate new variable which removed the person's title, so it appear like these:

var2
A Jones
Anne Jones
Abra Ham
Ko Jack
Jack Kroll
No Probs
Abra Ham
Ko Jack
.   <- denotes missing
.
.
Wonder Full
Bond Trader

i tried (thinking that i would slowly truncate Mr, Mrs, Ms title by title):

    gen var2=var1
    replace var2=subinstr("Mr","Mr","",.)   <- just as well i generate var2 as this command wiped out all the names!

i want to also generate another variable that will assign gender based on the title of the name in var 1 i.e. if Mr or Mr. then M(ale) and if Mrs, Mrs., Ms, Ms., Miss, Miss. then F(emale). i thought generate/replace or replace/if using string functions would help but i think this require loop of a sort to achieve.

F
F
F
M
.
.
F
M
.
.
.
F
F

thank for advice/help.

season's greetings,
bw

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
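The title-stripping idea discussed in the thread (anchor the match at the start of the name, and require a dot or a space after the title) can be sketched as follows. This is a minimal illustration and not any poster's actual code: the title list (Mrs, Miss, Mr, Ms, Dr) is an assumption, the sample rows are borrowed from the examples in the thread, and it presumes a Stata release in which regexr() is available. It also shows why the original attempt wiped every name: subinstr("Mr","Mr","",.) works on the literal string "Mr" rather than on var2, so it evaluates to an empty string for every observation.

```stata
* Sketch only: the title list and the sample rows are assumptions for illustration
clear
input str40 var1
"Mrs A Jones"
"Mr. Ko Jack"
"Jack Kroll"
"Miss. Wonder Full"
"Mr.Tim Donner"
"Mrumph Caliph"
end

* trim() first, so that ^ anchors at the first non-blank character.
* The group (\.| ) demands a dot or a space right after the title, which is
* what keeps "Mrumph" intact. The match is case-sensitive as written.
gen var2 = trim(regexr(trim(var1), "^(Mrs|Miss|Mr|Ms|Dr)(\.| )", ""))

list var1 var2
```

Because the pattern is anchored with ^, a surname such as Mroz further along in the string is never touched. Howard's three-word guard could be added as a qualifier such as `if wordcount(var1) >= 3`, at the cost of leaving two-word entries with a genuine title unchanged.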

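For the second query (gender from the title), no loop is needed. The sketch below is one possible variation on Ashim's word() approach, under stated assumptions: the variable names title and gender are illustrative, and the title lists follow the mapping bw describes (Mr is male; Mrs, Ms and Miss are female). Dots are stripped and case is folded before comparing, so "Ms" and "Ms." need only one rule, and names without a recognised title end up with an empty gender instead of keeping their first word.

```stata
* Sketch only: variable names and sample rows are assumptions for illustration
clear
input str40 var1
"Mrs A Jones"
"Ms. Abra Ham"
"Mr Ko Jack"
"Jack Kroll"
""
"Miss. Wonder Full"
end

* First word, lower-cased, with dots removed: "Ms." and "Ms" both become "ms"
gen title = subinstr(lower(word(trim(var1), 1)), ".", "", .)

* Start from missing ("") and fill in only where the title is recognised
gen str1 gender = ""
replace gender = "F" if inlist(title, "mrs", "ms", "miss")
replace gender = "M" if title == "mr"

list var1 title gender
```

One limitation of this sketch: a title glued to the name without a space, as in "Mr.Tim Donner", is read as a single first word and is therefore not recognised.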

**References:**

- **st: data management - string function** *From:* "b. water" <barleywater@hotmail.com>
- **Re: st: data management - string function** *From:* "Ashim Kapoor" <ashimkapoor@gmail.com>
- **Re: st: data management - string function** *From:* "Ashim Kapoor" <ashimkapoor@gmail.com>
- **Re: st: data management - string function** *From:* "Sergiy Radyakin" <serjradyakin@gmail.com>
- **RE: st: data management - string function** *From:* Howard Lempel <HLempel@brookings.edu>


© Copyright 1996–2015 StataCorp LP