Statalist



Re: st: data management issue (names listed differently)


From   Rufus Peabody <rufus.peabody@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: data management issue (names listed differently)
Date   Wed, 2 Jul 2008 09:09:44 -0700

Eva,

Much thanks for the advice. I am still wondering how I can merge with a variable that has a mixture of CorrectSpelling and WrongSpelling. Cleaning it up manually is extremely time-consuming since there are thousands of observations.

Thanks,
Rufus

On Jul 2, 2008, at 8:42 AM, Eva Poen wrote:


Rufus,

are there too many schools/spellings to do it manually (i.e. -replace
school = "USC" if inlist(school, "Southern Cal","SouthCal")- )?

In any case, I would recommend that you clean up your school variable
to make your task as easy as possible. That includes stripping
leading/trailing blanks using -trim()-, and converting everything to
lower case (-lower()-). -itrim()- will reduce multiple, consecutive
internal blanks to one for you. All of this will help in reducing the
number of replacements you have to do.
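
A minimal sketch of that cleanup, assuming the team names sit in a
string variable called school (a hypothetical name; substitute your
own). Recent Stata releases also offer these functions under the names
-strlower()-, -strtrim()- and -stritrim()-.

```stata
* Sketch only: school is an assumed variable name.
* Lower-case everything so "USC" and "usc" compare equal.
replace school = lower(school)

* Collapse runs of internal blanks to one, then strip
* leading and trailing blanks.
replace school = trim(itrim(school))

* Inspect the distinct spellings that remain, most frequent first.
tabulate school, sort
```

Whatever remains in the -tabulate- output after this step is the list
of spellings that still needs a manual decision.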

As a general strategy, you could compile a list (or data set) of all
the spellings you have, after cleaning up. If you go for a data set,
it could have two variables, CorrectSpelling and WrongSpelling. It
should then be possible to use -merge- to add the correct spelling to
data sets where the wrong spelling is present. For this to work you
need to make sure that there are no ambiguous wrong spellings, i.e.
abbreviations that may relate to more than one school.
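
A rough sketch of the -merge- step, under several assumptions: the
lookup data set is saved as spellings.dta, the main data as
maindata.dta with a cleaned string variable school (all three names
are hypothetical), and WrongSpelling has been cleaned in the same way
as school. It uses the -merge m:1- syntax of Stata 11 and later; older
releases use a different -merge- syntax. Observations that find no
match in the lookup are treated as already correct, which is one way
to handle a variable that mixes correct and wrong spellings.

```stata
* Sketch only: spellings.dta, maindata.dta and school are assumed names.

* Confirm that no wrong spelling maps to more than one school,
* then give the key the same name as in the main data.
use spellings, clear
isid WrongSpelling
rename WrongSpelling school
save spellings_key, replace

* Many observations in the main data may share one lookup row.
use maindata, clear
merge m:1 school using spellings_key, keep(master match)

* Matched rows take the correct spelling; unmatched rows are
* assumed to be spelled correctly and keep their value.
generate school_clean = cond(_merge == 3, CorrectSpelling, school)
drop _merge CorrectSpelling
```

-isid- stops with an error if a wrong spelling appears more than once,
which is the ambiguity check described above.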

Hope this helps,
Eva




2008/7/2 Rufus Peabody <rufus.peabody@gmail.com>:
Hey all,

I'm working with a dataset that contains a few variables containing the names
of different college football teams. The problem is, they are not spelled
consistently (e.g. Miami(FL) and Miami Florida; USC and Southern Cal). In
many cases the spelling differs only in that there is an extra space after
the school name for some. What I'd like to do (and I'm pretty sure it is
possible) is create a master file with all the school names and possible
spellings, which I can then somehow merge with my original dataset (and any
future datasets with these teams) to create a consistent spelling. How do I
go about doing this? Specifically, if I have, say three variables containing
spelling 1, spelling 2, and spelling 3 of a school, and I want to use
spelling 1 in another dataset, how can I merge with a variable that has some
schools with spelling 1 and others with spelling 2 or 3?
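
One possible way to set this up, sketched under assumptions: a master
file master.dta (hypothetical name) with one row per school and string
variables spelling1, spelling2 and spelling3, where spelling1 is the
preferred form. Reshaping it to long form gives one row per spelling,
including the preferred one, so a variable that mixes all three can be
matched against a single key.

```stata
* Sketch only: master.dta and the variable names are assumed.
use master, clear
generate preferred = spelling1
generate long id = _n

* One row per (school, spelling variant) instead of one row per school.
reshape long spelling, i(id) j(variant)
drop if missing(spelling)

* A variant repeated within a school would otherwise appear twice.
duplicates drop spelling preferred, force

* Each spelling must point to exactly one preferred name.
isid spelling
keep spelling preferred
save crosswalk, replace
```

The resulting crosswalk.dta can then be merged many-to-one onto any
data set whose team variable has been renamed to spelling.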

Thanks a lot,
Rufus
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/



