Statalist



Re: st: RE: RE: RE: RE: Question erase duplicates values


From   "Sergiy Radyakin" <[email protected]>
To   [email protected]
Subject   Re: st: RE: RE: RE: RE: Question erase duplicates values
Date   Tue, 12 Aug 2008 15:06:24 -0400

Daniel,

depending on what you are doing, you might want to enforce uniqueness
with the -uniqmaster- or -uniqusing- option at an earlier stage, or
change the order in which you merge the datasets (e.g. not A+B+C, but
A+C+B). Also check whether the _merge variable tells you anything useful.
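A minimal sketch of enforcing uniqueness at merge time and then inspecting _merge, assuming two small hypothetical datasets (the tempfile names `master` and `lookup` and the variable `price` are illustrative, not from the thread). It uses the Stata 11+ `merge m:1` syntax, which stops with an error if ID is not unique in the using data. In the Stata 10 syntax current when this thread was written, the rough equivalent was `merge ID using filename, uniqusing` with both files sorted by ID.

```stata
* Hypothetical data: master has repeated IDs, lookup has one row per ID
clear
input ID units1
1 5
1 4
2 7
2 8
3 1
3 4
end
tempfile master
save `master'

clear
input ID price
1 10
2 20
3 30
end
tempfile lookup
save `lookup'

use `master', clear
* m:1 asserts that ID uniquely identifies observations in the using file
merge m:1 ID using `lookup'
* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
```

Here all six master observations match, so every value of _merge is 3; values of 1 or 2 would point to IDs present in only one file.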

Regards, Sergiy

On 8/12/08, Daniel Sepulveda-Adams <[email protected]> wrote:
> Nick
>
> Yes, I'm sure that I want to use -merge-.
> I put together three of these datasets (using -append- and -merge- where
> necessary), and that also generated duplicate values, so I was not able
> to merge the last one. I'm thinking that if I do it this way, I will be
> able to finish merging all four datasets. But maybe I'm wrong.
>
> Daniel A. Sepulveda Adams
> Research Scientist - PRIME Institute
> College of Pharmacy - University of Minnesota
> 308 Harvard ST SE, Weaver Densford Hall, 7-159
> Minneapolis, MN, 55455, USA
> Phone: 612-624-8489
> Cell Phone: 651-295-7771
> Fax: 612-625-9931
> Email: [email protected]
>
> -----Original Message-----
> From: Nick Cox [mailto:[email protected]]
> Sent: Tuesday, August 12, 2008 1:16 PM
> To: Daniel Sepulveda-Adams
> Subject: RE: RE: RE: RE: Question erase duplicates values
>
> Sergiy's code, just given separately, will do what you ask for.
>
> That's not the difficulty at all.
>
> My point remains: Why do you expect a -merge- to work on the results?
>
> Consider just ID 1. Once you hide the fact that some of the observations
> were for ID 1, a -merge- won't be able to do magic and rediscover that
> fact.
>
> Are you sure that you don't want an -append-, not a -merge-?
>
> Nick
> [email protected]
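Nick's alternative, -append-, stacks observations rather than matching them, so repeated IDs survive as separate rows. A minimal sketch, assuming the data arrive as two hypothetical pieces (the split and the tempfile name `period1` are illustrative assumptions):

```stata
* Hypothetical first piece of the data
clear
input ID ndc units1
1 1 5
2 2 7
3 3 1
end
tempfile period1
save `period1'

* Hypothetical second piece, with the same IDs
clear
input ID ndc units1
1 1 4
2 2 8
3 3 4
end
* -append- adds the saved observations below the ones in memory
append using `period1'
sort ID, stable
list, sepby(ID)
```

Each ID now appears twice, and no information about which rows belong to which ID has been lost.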
>
> Daniel Sepulveda-Adams
>
> I'm doing that because it is the only way I know to create an ID to
> combine with my other datasets, which have the same ID.
>
> Regarding your last paragraph: yes, you are correct. I used
>
> duplicates drop ID, force
>
> The reason for all of this is that I have four datasets with the same
> ID, but only one of them has duplicate values, so the only way I know
> to merge them is to create an ID without the duplicate values.
> Do you have any suggestions?
>
> Nick Cox
>
> Thanks for this, but I don't understand it at all.
>
> Why do you want to throw away information about your ID? If you map
> second and higher occurrences of each ID to missing, you just create
> duplicates of missing, and it is difficult to see how a -merge- could
> then work properly.
>
> Ironically enough, the syntax
>
> duplicates drop ID
>
> that I alluded to is illegal. Perhaps what you tried was -duplicates
> drop ID, force- and that would have the effect you describe.
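A small sketch of the two behaviours Nick describes, using Daniel's example data: without -force-, -duplicates drop- with a varlist refuses to run (capture records a nonzero return code), while with -force- it drops whole observations, keeping one row per ID.

```stata
* Daniel's example data
clear
input ID ndc units1 units2 units3
1 1 5 6 7
1 1 4 8 9
2 2 7 8 6
2 2 8 2 1
3 3 1 4 6
3 3 4 6 8
end

* Without -force-, this is refused and the data are left unchanged
capture duplicates drop ID
display "return code: " _rc

* With -force-, entire observations go, not just the repeated ID values
duplicates drop ID, force
list
```

Three observations remain, and their units values for the dropped rows are gone, which is why this is not what Daniel wanted.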
>
> Daniel Sepulveda-Adams
>
> Sorry that I was not very precise; I understand your explanation. Let me
> see if I can be more precise. For example:
>
> ID      ndc     units1  units2  units3
> ----------------------------------------
> 1       1       5       6       7
> 1       1       4       8       9
> 2       2       7       8       6
> 2       2       8       2       1
> 3       3       1       4       6
> 3       3       4       6       8
>
> What I need is
> ID      ndc     units1  units2  units3
> ----------------------------------------
> 1       1       5       6       7
> .       1       4       8       9
> 2       2       7       8       6
> .       2       8       2       1
> 3       3       1       4       6
> .       3       4       6       8
>
> The command that I used was -duplicates drop ID-, but that dropped all
> the observations that were duplicates, not just the duplicate values in
> the variable ID.
>
> Let me know if that helps to understand my problem.
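One way to produce the output Daniel asks for is to copy ID and blank out second and later occurrences in the copy. This is a sketch of that idea, not necessarily the code Sergiy sent separately; the variable name `newID` is a hypothetical choice. It also illustrates Nick's objection: the blanked values are themselves three duplicates of missing, so `newID` is no better a merge key than ID.

```stata
* Daniel's example data
clear
input ID ndc units1 units2 units3
1 1 5 6 7
1 1 4 8 9
2 2 7 8 6
2 2 8 2 1
3 3 1 4 6
3 3 4 6 8
end

* Copy ID, then blank second and later occurrences in the copy only.
* -stable- keeps the existing order of observations within each ID.
clonevar newID = ID
sort ID, stable
by ID: replace newID = . if _n > 1
list ID newID ndc, sepby(ID)

* The missing values are duplicates of each other
duplicates report newID
```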
>
> Nick Cox
>
> There is no code here and no example data to make clear what you tried.
>
> So, how can anyone answer this except by guessing?
>
> The fact that values of an identifier are repeated does not mean that
> the dataset should be cleaned up by removing duplicates of the
> identifier. That principle would wreak havoc on panel data. Cloning the
> identifier makes no difference to that principle. What is true of the
> original is true of the clone, necessarily.
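To see Nick's distinction in Daniel's example, compare repeated values of the identifier with observations that are duplicated in every variable. A minimal sketch; the generated variable names `exactdup` and `iddup` are hypothetical:

```stata
* Daniel's example data: IDs repeat, but no two rows are identical
clear
input ID ndc units1 units2 units3
1 1 5 6 7
1 1 4 8 9
2 2 7 8 6
2 2 8 2 1
3 3 1 4 6
3 3 4 6 8
end

* Repeats of ID alone: every ID has one surplus copy
duplicates report ID
* Repeats of whole observations (all variables): none
duplicates report

* Tag both kinds; tag values count surplus copies per observation
duplicates tag, generate(exactdup)
duplicates tag ID, generate(iddup)
list ID ndc exactdup iddup, sepby(ID)
```

Every observation has `iddup` equal to 1 but `exactdup` equal to 0: the IDs repeat, yet there is nothing here that deserves to be dropped as a duplicate.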
>
> Perhaps you did something like
>
> . duplicates drop clonedid
>
> And -duplicates- refused. I am very pleased to hear that. I designed
> that behaviour into -duplicates- to protect people from losing
> information.
>
> Perhaps you did something else altogether, in which case please say
> precisely what.
>
> Daniel Sepulveda-Adams
>
> I'm trying to create a unique ID to merge two datasets, but the ID
> variable has many duplicate values. So I cloned the variable and tried
> to erase the duplicate values in the NEW variable only, but I was not
> able to do that. Does anyone have an idea how to do that? Thank you for
> your time.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>