Statalist The Stata Listserver



Re: st: Combine uppercase and lowercase text


From   "Sebastian F. Büchte" <sfbuechte@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Combine uppercase and lowercase text
Date   Thu, 22 Feb 2007 18:10:55 +0100

On 2/22/07, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Sebastian's approach is mine too, but it can be
> done a little more directly.

Nick,

Thank you for the hint. Looking at my proposed solution again, I realize
that I introduced the temporary variable textgrp for no reason, since I
am not using it at all...
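
For reference, a minimal sketch of the same idea without the unused
temporary variable. It also puts the count in parentheses in the last
bysort, so that each lowercase group is sorted by frequency rather than
split by it; that is an assumption about the intended behaviour, not
something confirmed in this thread. Adding text inside the parentheses is
a further assumption that makes ties deterministic: the tied spelling
that sorts last wins.

```stata
* Sketch only: most common spelling within each case-insensitive group
clear
input str15 text
"some text"
"Some Text"
"SOME TEXT"
"some other text"
"some other text"
"Some other text"
"Some other text"
"SoMe TeXt"
"SoMe TeXt"
"Some Other Text"
end

tempvar lotext comspelling

* lowercase key that defines which observations belong together
gen `lotext' = lower(text)

* how often each exact spelling occurs within its lowercase group
bysort `lotext' text: gen `comspelling' = _N

* sort each lowercase group by frequency, then spelling; take the last one
bysort `lotext' (`comspelling' text): gen newtext = text[_N]

list text newtext
```

Only bysort and lower() are used, so this should also run in Stata 8.2.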

Regards
sebastian

> clear
> gen str15 text = ""
> input
>  "some text"
>  "Some Text"
>  "SOME TEXT"
>  "some other text"
>  "some other text"
>  "Some other text"
>  "Some other text"
>  "SoMe TeXt"
>  "SoMe TeXt"
>  "Some Other Text"
> end
> tempvar lotext
> tempvar textgrp
> tempvar comspelling
>
> gen `lotext'=lower(text)
> bys `lotext': gen `textgrp'=1 if _n==1
> replace `textgrp'=sum(`textgrp')
>
> bys `lotext' text: gen `comspelling'=_N
> bys `lotext' (`comspelling'): gen newtext=text[_N]
>
> I bet there are more elegant ways out in the wild, and I am
> looking forward to learning about them.
>
> Regards
> Sebastian
>
>
> On 2/22/07, Friedrich Huebler <huebler@rocketmail.com> wrote:
> > My data has string variables with text in uppercase or lowercase
> > letters. I would like to replace observations that are identical once
> > capitalization is ignored (e.g., "TEXT" and "text") by the most
> > common spelling. In some cases there are ties. So far I have only
> > managed to replace all such observations by their lowercase variant,
> > as in the example below. I am stumped and would appreciate any advice
> > on how I should proceed. I use Stata 8.2.
> >
> > Friedrich Huebler
> >
> > clear
> > gen str15 text = ""
> > input
> >  "some text"
> >  "Some Text"
> >  "SOME TEXT"
> >  "some other text"
> >  "some other text"
> >  "Some other text"
> >  "Some other text"
> >  "SoMe TeXt"
> >  "SoMe TeXt"
> >  "Some Other Text"
> > end
> > count
> > local n = r(N)
> > forvalues i = 1/`n' {
> >  local t = lower(text[`i'])
> >  replace text = "`t'" if lower(text) == "`t'"
> > }
> >

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



