
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Duplicate observations


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Duplicate observations
Date   Mon, 10 Mar 2014 18:35:04 +0000

Duplicates are a red red herring herring here.

(http://en.wikipedia.org/wiki/Red_herring may help internationally.)

To keep only the last (most recent) observation in each group:

bysort reporter partner (date) : keep if _n == _N
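
Why this works: bysort sorts in ascending order, so within each group the most recent date ends up last, which is where _n == _N. The original question groups by year as well, so a variant that keeps the latest observation within each reporter-partner-year group might look like the sketch below. The toy data and the string variable sdate are made up for illustration, and the sketch assumes date is a numeric Stata daily date rather than a string.

```stata
* Toy data modelled on the examples in the question (illustrative only)
clear
input str10 reporter str10 partner int year str9 sdate
"Albania" "Germany" 1967 "08apr1967"
"Albania" "Germany" 1967 "17dec1967"
"Albania" "Austria" 1980 "06dec1980"
"Albania" "Austria" 1980 "15nov1980"
end

* Convert the string to a numeric daily date so that sorting is chronological
gen date = daily(sdate, "DMY")
format date %td

* Sort ascending by date within reporter-partner-year, then keep the last row
bysort reporter partner year (date) : keep if _n == _N

list reporter partner year date, noobs
```

With these four rows, the observations dated 06dec1980 and 17dec1967 remain. If date were stored as a string, the sort would be alphabetical rather than chronological, so the conversion step matters.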




Nick
[email protected]


On 10 March 2014 18:30, emanuele mazzini <[email protected]> wrote:
> Hello to everybody,
>
> I have a problem with duplicate observations that I find puzzling.
> I have data on country pairs by year and I am interested in two
> specific variables: a date and a variable that I will call x_1.
>
> Specifically, my data look like this :
>
> reporter  partner   year       date         x_1
>
> Albania  Austria   1980   6dec1980     n_1
> Albania  Austria   1980  15nov1980    n_1
> .         .        .
> .         .        .
> .         .        .
>
> As you may have noticed, the observations differ only by date,
> and I need to drop the duplicates so as to keep the most recent one
> (hence, in this case the second one).
>
> I ran the following commands:
>
> duplicates tag reporter partner year, generate(dup)
>
> by reporter partner year (x_1 -date), sort: gen duplicates=_n
>
> so as to be able to identify duplicates and then - among those with
> dup >0 - drop those for which duplicates > 1.
> This method was suggested in this thread (I take this opportunity to
> thank again), but it seems not to work for some observations.
> Take, for instance the following example:
>
> reporter  partner  year  date       x_1  dup  duplicates
> Albania   Germany  1967  08apr1967  n_1  1    1
> Albania   Germany  1967  17dec1967  n_1  1    2
>
> As you may notice, Stata flags the observation dated 17dec1967 as
> the one with duplicates > 1 (which would then be dropped),
> whereas I would have expected the opposite.
>
> Can anyone explain why and, if possible, tell me how to deal with this issue?
>
> Thank you very much in advance,
>
> Emanuele
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC