Statalist The Stata Listserver



RE: st: data formatting question


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: data formatting question
Date   Mon, 13 Feb 2006 22:10:39 -0000

Never say "One more question". What are you 
going to say next time? 

The -duplicates- command is designed to 
deal with duplicates. Duplicates are 
exactly equal to each other. I know this because 
we wrote it. 

What you have is quite different, and -duplicates- is
irrelevant to that. 

You want 

drop if missing(ethnicpop) 
bysort country year ethnicity (ethnicpop) : drop if _n < _N 
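
The first line matters because Stata sorts numeric missing values above every nonmissing number, so without it the last observation in a group could be one with missing ethnicpop. A more cautious variant, sketched below, flags the observation to keep and lists the others before anything is dropped. It assumes the same variable names as the code above; -tokeep- is a hypothetical name introduced only for illustration.

```stata
* Sketch only: same logic as above, but inspect before dropping.
drop if missing(ethnicpop)

* Within each country-year-ethnicity group, sorted by ethnicpop,
* the last observation (_n == _N) has the highest value.
bysort country year ethnicity (ethnicpop) : gen byte tokeep = _n == _N

* Look at what would be discarded.
list country year ethnicity ethnicpop if !tokeep

* Once satisfied, keep only the flagged observations.
keep if tokeep
drop tokeep
```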

Note again that this will not be robust to spelling errors. 
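
One way to look for spelling variants before dropping anything is to tabulate the grouping variable and read through the distinct values. The sketch below assumes that ethnicity is a string variable; the -replace- line is purely hypothetical and applies only if inspection shows that "Caucasion" and "Caucasian" both occur and mean the same group.

```stata
* Sketch only: list each distinct spelling with its frequency.
tabulate ethnicity, missing

* Hypothetical correction, to be run before the -bysort- line,
* and only if the tabulation shows both spellings.
replace ethnicity = "Caucasian" if ethnicity == "Caucasion"
```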

Nick 
[email protected] 

Michael Horowitz
 
> One more question.  Suppose you have a few duplicate observations that are
> errors.  To take the example I used in my previous email, the data is set
> up as such:
> 
> > > > Country Number  year    ethnicity       ethnicpop
> > > > 10              1930    Caucasion       1,000,000
> > > > 10              1930    Hispanic        50,000
> > > > 10              1931    Caucasion       1,000,100
> > > > 10              1931    Hispanic        51,000
> > > > 11              1931    Asia            10,000
> 
> Now suppose there are multiple observations for Caucasians for a given
> year but with slightly different ethnic population totals.  I wish to
> systematically keep the one with the higher number and drop the other one.
> However, since the observations are not technical "duplicates" given that
> the ethnic population scores are different, I am having trouble using the
> "duplicate" command.  Does anyone have any ideas?
> 
