The Stata listserver

st: RE: Case conversion in stringvars (and 2 cmds with same name)


From   "Nick Winter" <nwinter@policystudies.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Case conversion in stringvars (and 2 cmds with same name)
Date   Tue, 17 Sep 2002 15:32:33 -0400

> Dear all,
>  
> I have corporate announcements data where records of a string 
> var are inconsistently capitalised, which makes analysis 
> difficult. To illustrate, the following records should really 
> appear as one and the same:
> 
> Annual Report and Accounts 
> Annual Reports and Accounts 
> Annual report and accounts 
> Annual reports and Accounts
> Annual reports and accounts 
> 
> I can globally replace 'reports' by 'report' in a text editor 
> but would like to automate case conversion in Stata if I 
> could - say, convert everything to lower case (or just the 
> first word capitalised, or every word in the header). 
> -renvars- does this but for variable names, not observations. 
> -mixcase-, on the other hand, seems just the job, but it was 
> written by Bill Gould nearly ten years ago (it was part of 
> dm13.1) so I doubt it would run smoothly three or four 
> versions of Stata later. Is anyone aware of an update to this 
> routine or an alternative way of doing the same?

First, a note:  Stata's version control generally ensures that older
programs continue to function correctly.  If you look at the contents of
mixcase.ado, you will see that the first line is "version 3.0", which
tells Stata to run the program under version 3.0 rules; so -mixcase-
should work just fine.  In my quick tests it ran without problems.

But there is an easier way -- the lower() function, which converts a
string to all lower case.  So you could just type:

	replace <varname>=lower(<varname>)

If you have a lot of variables:

	for var <varlist> : replace X = lower(X)

or

	foreach v of varlist <varlist> {
		replace `v' = lower(`v')
	}
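
One caveat with the loops above: lower() expects a string argument, so the loop stops with a type mismatch if the varlist includes a numeric variable.  A minimal sketch of one way around that, assuming a Stata version that has -ds- with the has(type string) option; the toy data and the variable names title, source and year are made up for illustration:

```stata
* Hypothetical toy data: two string variables and one numeric variable
clear
input str30 title str10 source year
"Annual Report" "RNS" 2001
"INTERIM Results" "Rns" 2002
end

* Collect only the string variables, so lower() never sees a number
ds, has(type string)

* r(varlist) now holds the names of the string variables
foreach v of varlist `r(varlist)' {
    replace `v' = lower(`v')
}

list
```

The numeric variable year is left untouched because it never enters the loop.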

That deals with the capitalization.  For "reports" versus "report" and
the like, I would use the subinstr() function (I think that stands for
SUBstitute IN STRing):

	replace <varname> = subinstr(<varname>,"reports","report",.)

These can be similarly wrapped up in the -for- or -foreach- loops.
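
To check the two steps together, here is a small sketch using the five example titles from the question.  The variable name title is an assumption, and the last -replace- is an optional extra for the "first word capitalised" variant, built from upper() and substr() rather than anything in -mixcase-:

```stata
* The five inconsistent titles from the original question
clear
input str40 title
"Annual Report and Accounts"
"Annual Reports and Accounts"
"Annual report and accounts"
"Annual reports and Accounts"
"Annual reports and accounts"
end

* Step 1: remove the case differences
replace title = lower(title)

* Step 2: map the plural onto the singular; "." means all occurrences
replace title = subinstr(title, "reports", "report", .)

* Optional: capitalise only the first letter of each title
replace title = upper(substr(title, 1, 1)) + substr(title, 2, .)

* All five observations should now fall into a single category
tabulate title
```

Lowercasing first means a single subinstr() call catches both "Reports" and "reports".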

--Nick Winter


