



RE: st: Replacing duplicate values


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Replacing duplicate values
Date   Thu, 1 Apr 2010 17:59:12 +0100

Not really. You can manage fine without -duplicates- here, as my code sketch implied. 

Nick 
n.j.cox@durham.ac.uk 

Abdel Rahmen El Lahga

The question was not clear enough, and I simply invited Pavlos to try
-duplicates-, which is certainly the unavoidable command for such a
task.

2010/4/1 Martin Weiss <martin.weiss1@gmx.de>:

> Abdel, how would you do that?
>
>
> *************
> clear*
>
> input byte(id) str4(ipc_1 ipc_2 ipc_3 ipc_4)
> 1     A44B    G09F    H04N
> 2     A47B    G06F    H05K    E05D
> 3     A47B    G06F
> 4     A47B    H04N    H05K
> 5     A47B
> 6     A47B    F16M    F16M    H05K
> 7     A47B    A47B    F16M    A47B
> end
>
> duplicates report ipc_?
> *************
>
> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Abdel Rahmen El
> Lahga
>
> Type "help duplicates drop" in Stata and you will find what you are
> looking for.

> 2010/4/1 Pavlos C. Symeou <p.symeou@lmu.de>:

>> I would like to ask for your assistance with the following:
>>
>> I have a dataset which concerns patents. Every patent is assigned a number
>> of International Patent Classifications (IPCs). However, there are mistakes
>> in the database and certain IPCs appear more than once for a single patent,
>> which is meaningless. Examples are patents with id 6 and id 7 (ipc_1, ipc_2
>> etc list the number of IPCs a single patent is assigned). For the patent
>> with id 6 we can see that ipc_2 and ipc_3 are the same.  Id 7 illustrates a
>> more general issue. Duplicate values may not appear sequentially and may
>> appear more than twice.
>>
>> id    ipc_1    ipc_2    ipc_3    ipc_4
>> 1     A44B    G09F    H04N
>> 2     A47B    G06F    H05K    E05D
>> 3     A47B    G06F
>> 4     A47B    H04N    H05K
>> 5     A47B
>> 6     A47B    F16M    F16M    H05K
>> 7     A47B    A47B    F16M    A47B
>>
>> Can you suggest a way to delete the duplicate values, which can be more than
>> two, and move the remaining to the left? For example patents with id 6 and
>> id 7 would look like this:
>>
>> id    ipc_1    ipc_2    ipc_3    ipc_4
>> 6     A47B    F16M    H05K
>> 7     A47B    F16M
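
For illustration, one way to do what Pavlos asks without -duplicates- is to
go through long format. This is only a sketch under stated assumptions, not
the code sketch Nick refers to (which is not reproduced in this message). It
assumes that id uniquely identifies patents and that an empty IPC is stored
as the empty string. The variable name slot is made up for the example.

```stata
* Example data from the thread, with empty IPCs entered explicitly as ""
clear
input byte id str4(ipc_1 ipc_2 ipc_3 ipc_4)
1 "A44B" "G09F" "H04N" ""
2 "A47B" "G06F" "H05K" "E05D"
3 "A47B" "G06F" ""     ""
4 "A47B" "H04N" "H05K" ""
5 "A47B" ""     ""     ""
6 "A47B" "F16M" "F16M" "H05K"
7 "A47B" "A47B" "F16M" "A47B"
end

* One row per (patent, slot); slot holds the original column number
reshape long ipc_, i(id) j(slot)

* Drop the empty slots
drop if ipc_ == ""

* Within each patent, keep only the first occurrence of each IPC
bysort id ipc_ (slot): keep if _n == 1

* Renumber the surviving IPCs 1, 2, ... in their original order,
* which is what moves the remaining values to the left
bysort id (slot): replace slot = _n

* Back to one row per patent
reshape wide ipc_, i(id) j(slot)

list, noobs sep(0)
```

Two caveats on this sketch: a patent with no IPCs at all would be dropped
along with its empty slots, and only as many ipc_ variables come back as the
largest number of distinct IPCs held by any one patent (four here, because of
id 2).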


