Statalist



Re: st: data management issue (names listed differently)


From   "Eva Poen" <eva.poen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: data management issue (names listed differently)
Date   Wed, 2 Jul 2008 16:42:52 +0100

Rufus,

Are there too many schools/spellings to do it manually (e.g. -replace
school = "USC" if inlist(school, "Southern Cal", "SouthCal")- )?
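
For illustration, here is a minimal sketch of the manual route. The
school names are made up for the example, and if I remember correctly
-inlist()- only takes a limited number of string arguments, so a long
list of variants may need several -replace- lines.

```stata
* Toy data: hypothetical spellings, for illustration only
clear
input str20 school
"USC"
"Southern Cal"
"SouthCal"
"Miami(FL)"
end

* Collapse the known variants into one canonical spelling
replace school = "USC" if inlist(school, "Southern Cal", "SouthCal")
```

Rows that do not match any listed variant are left untouched.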

In any case, I would recommend that you clean up your school variable
to make your task as easy as possible. That includes stripping
leading/trailing blanks using -trim()- and converting everything to
lower case using -lower()-. -itrim()- will reduce multiple consecutive
internal blanks to one. All of this will help reduce the number of
replacements you have to do.
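
A minimal sketch of that clean-up, again on made-up names. The three
functions can be nested in a single -replace-; as far as I know, newer
Stata releases also offer them under the names -strtrim()-,
-stritrim()- and -strlower()-.

```stata
* Toy data with stray blanks and mixed case (hypothetical values)
clear
input str30 school
"  Southern   Cal "
"USC "
"Miami  Florida"
end

* trim():  drop leading and trailing blanks
* itrim(): collapse runs of internal blanks to a single blank
* lower(): convert everything to lower case
replace school = lower(itrim(trim(school)))
```

After this step, spellings that differed only in spacing or
capitalisation are identical strings.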

As a general strategy, you could compile a list (or data set) of all
the spellings you have, after cleaning up. If you go for a data set,
it could have two variables, CorrectSpelling and WrongSpelling. It
should then be possible to use -merge- to add the correct spelling to
data sets where the wrong spelling is present. For this to work you
need to make sure that there are no ambiguous wrong spellings, i.e.
abbreviations that may relate to more than one school.
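
A rough sketch of the lookup-table approach, not a definitive recipe.
It assumes Stata 11 or later for the -merge m:1- syntax; the variable
names school and score and all the values are invented for the
example. Note that the lookup table also lists the cleaned-up version
of the correct spelling itself, so that every name finds a match.

```stata
* Hypothetical lookup table: one row per cleaned-up spelling
clear
input str20 WrongSpelling str20 CorrectSpelling
"southern cal"   "USC"
"southcal"       "USC"
"usc"            "USC"
"miami(fl)"      "Miami (FL)"
"miami florida"  "Miami (FL)"
end

* Stops with an error if a spelling is listed more than once
isid WrongSpelling
tempfile lookup
save `lookup'

* Hypothetical data set with inconsistent names
clear
input str20 school score
"Southern Cal "  24
"USC"            31
"Miami  Florida" 17
end

* Build the merge key with the same cleaning used for the lookup table
generate WrongSpelling = lower(itrim(trim(school)))
merge m:1 WrongSpelling using `lookup', keep(master match)

* Spellings missing from the lookup table would show up here
list school if _merge == 1
```

Anything listed by the last command is a spelling that still needs to
be added to the lookup table.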

Hope this helps,
Eva




2008/7/2 Rufus Peabody <rufus.peabody@gmail.com>:
> Hey all,
>
> I'm working with a dataset that has a few variables containing the names
> of different college football teams.  The problem is, they are not spelled
> consistently (e.g. Miami(FL) and Miami Florida; USC and Southern Cal).  In
> many cases the spelling differs only in that there is an extra space after
> the school name for some.  What I'd like to do (and I'm pretty sure is
> possible) is create a master file with all the school names and possible
> spellings, which I can then somehow merge with my original dataset (and any
> future datasets with these teams) to create a consistent spelling.  How do I
> go about doing this? Specifically, if I have, say three variables containing
> spelling 1, spelling 2, and spelling 3 of a school, and I want to use
> spelling 1 in another dataset, how can I merge with a variable that has some
> schools with spelling 1 and others with spelling 2 or 3?
>
> Thanks a lot,
> Rufus
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



© Copyright 1996–2014 StataCorp LP