Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <n.j.cox@durham.ac.uk>

To: statalist@hsphsun2.harvard.edu

Subject: st: RE: cleaning data efficiently

Date: Fri, 28 Oct 2011 16:24:43 +0100

1. -egen- has a -mode()- function:

egen mode = mode(regionname), by(regioncode)

2. To check whether the correspondence is unique, you need something like

egen tag = tag(mode regioncode)
egen ndistinctvalues = total(tag), by(mode)

Any value of ndistinctvalues greater than 1 flags a region name that occurs with more than one region code. For a review, see also

SJ-8-4 dm0042: Speaking Stata: Distinct observations (help distinct if installed). N. J. Cox and G. M. Longton. Q4/08. SJ 8(4): 557-568. Shows how to answer questions about distinct observations from first principles; provides a convenience command.

3. Install and use -groups- from SSC:

ssc inst groups
help groups
groups regionname regioncode

(There are other ways, but I like this one.)

Nick
n.j.cox@durham.ac.uk

Vitorino, Maria Ana

> Suppose I have the following data:
>
> | regioncode | regionname |
> |---|---|
> | X | AAA |
> | Y | BBB |
> | Z | CCC |
> | X | . |
> | X | AAA |
> | Y | BBB |
> | Z | . |
> | Z | AAA |
> | Z | CCC |
> | Z | CCC |
>
> Assume also that the regioncode variable is correct but that there are some errors and missing values in the regionname variable.
>
> 1) Is there an efficient way to fix the entries in the regionname variable? (For this we need to assume that the correspondence between regioncode and regionname that occurs most frequently is the correct one.) I usually deal with this type of issue using several lines of code, so I'm wondering if there is a more efficient way that makes use of some Stata commands I'm not familiar with.
>
> Also, after correcting the mistakes, I want to
>
> 2) check whether the correspondence between the two variables is unique
>
> 3) create a table with regionname, regioncode, and the frequency of observations (but not a two-way table).
>
> What is the most efficient way?

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
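Putting the three steps together on the example data from the question gives something like the do-file below. It is a sketch under a few assumptions: regionname is a string variable, the "." entries in the listing are empty (string-missing) values, and the helper variable names (mode, tag, ndistinct, n) are illustrative only. Step 3 substitutes the official -contract- command, run inside -preserve-/-restore-, for the user-written -groups-, for anyone who has not installed it from SSC. Note that -egen, mode()- ignores missing values by default but returns missing for a region code whose modes are tied; its minmode or maxmode options pick one of the tied values instead.

```stata
* Example data from the question (assumption: "" stands in for the
* missing names that the listing shows as ".")
clear
input str1 regioncode str3 regionname
"X" "AAA"
"Y" "BBB"
"Z" "CCC"
"X" ""
"X" "AAA"
"Y" "BBB"
"Z" ""
"Z" "AAA"
"Z" "CCC"
"Z" "CCC"
end

* 1) Most frequent non-missing name within each region code, copied
*    back over regionname; this fixes both wrong and missing names
egen mode = mode(regionname), by(regioncode)
replace regionname = mode
drop mode

* 2) Number of distinct region codes per region name. After step 1
*    each code has exactly one name, so ndistinct == 1 everywhere
*    means the mapping is one-to-one; -assert- stops with an error
*    if some name goes with more than one code
egen tag = tag(regionname regioncode)
egen ndistinct = total(tag), by(regionname)
assert ndistinct == 1

* 3) One row per distinct (regionname, regioncode) pair with its
*    frequency, built by -contract- on a temporary copy of the data
preserve
contract regionname regioncode, freq(n)
list, noobs
restore
```

The -preserve-/-restore- pair matters because -contract- replaces the dataset in memory with the collapsed table; without it the observation-level data would be lost.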

**Follow-Ups**: **Re: st: RE: cleaning data efficiently** (from Maria Ana Vitorino <vitorino@wharton.upenn.edu>)

**References**: **st: cleaning data efficiently** (from "Vitorino, Maria Ana" <vitorino@wharton.upenn.edu>)
