st: cleaning data efficiently


From   "Vitorino, Maria Ana" <vitorino@wharton.upenn.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: cleaning data efficiently
Date   Fri, 28 Oct 2011 15:05:02 +0000

Dear statalist users,
Suppose I have the following data:

regioncode  regionname
X           AAA
Y           BBB
Z           CCC
X           .
X           AAA
Y           BBB
Z           .
Z           AAA
Z           CCC
Z           CCC

Assume also that the regioncode variable is correct, but that the regionname variable contains some errors and missing values.
1) Is there an efficient way to fix the entries in the regionname variable? (For this we need to assume that, for each regioncode, the regionname that occurs most frequently is the correct one.)
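
To pin down the rule I have in mind for (1), here is a rough sketch in Python rather than Stata. It only illustrates the logic and is not meant as the solution I am looking for. The function name fill_by_majority is made up, I am assuming that "." marks a missing regionname, and ties between equally frequent names are not handled in any special way.

```python
from collections import Counter

def fill_by_majority(rows, missing="."):
    """Replace every regionname by the most frequent non-missing
    regionname observed for the same regioncode.

    rows is a list of (regioncode, regionname) pairs.
    """
    counts = {}
    for code, name in rows:
        if name != missing:
            counts.setdefault(code, Counter())[name] += 1
    # Modal name per code; codes with only missing names get no entry.
    modal = {code: c.most_common(1)[0][0] for code, c in counts.items()}
    # Codes without any usable name keep their original value.
    return [(code, modal.get(code, name)) for code, name in rows]

rows = [("X", "AAA"), ("Y", "BBB"), ("Z", "CCC"), ("X", "."), ("X", "AAA"),
        ("Y", "BBB"), ("Z", "."), ("Z", "AAA"), ("Z", "CCC"), ("Z", "CCC")]
fixed = fill_by_majority(rows)
```

With the example data, every X row should end up as AAA, every Y row as BBB, and every Z row as CCC.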

I usually deal with this type of issue using several lines of code, so I'm wondering if there is a more efficient way that makes use of Stata commands I'm not familiar with.

Also, after correcting the mistakes, I want to:
2) check whether the correspondence between the two variables is unique;
3) create a table listing regionname, regioncode and the frequency of observations (a one-way listing of the pairs, not a two-way table).
What is the most efficient way to do this?
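
Again only to make (2) and (3) concrete, a small Python sketch of the two results I am after. The helper names code_has_unique_name and frequency_table are hypothetical, "unique" is taken here to mean that each regioncode maps to exactly one regionname, and the input is the example data as it would look after cleaning.

```python
from collections import Counter

def code_has_unique_name(rows):
    """True if every regioncode is paired with exactly one regionname."""
    names = {}
    for code, name in rows:
        names.setdefault(code, set()).add(name)
    return all(len(s) == 1 for s in names.values())

def frequency_table(rows):
    """One line per distinct (regioncode, regionname) pair with its count."""
    return sorted((code, name, n) for (code, name), n in Counter(rows).items())

# Example data after the corrections described in (1).
cleaned = [("X", "AAA")] * 3 + [("Y", "BBB")] * 2 + [("Z", "CCC")] * 5

unique = code_has_unique_name(cleaned)
table = frequency_table(cleaned)
for code, name, n in table:
    print(code, name, n)
```

The table I would like to obtain has one row per pair (X AAA, Y BBB, Z CCC) together with the number of observations for that pair.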

Thanks!
Ana

