

From: "Martin Weiss" <martin.weiss1@gmx.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: AW: RE: Replacing duplicate values
Date: Thu, 1 Apr 2010 17:20:56 +0200

<>

That penultimate line is probably redundant / leads to complaints from -reshape-:

*************
clear*
input byte(id) str4(ipc_1 ipc_2 ipc_3 ipc_4)
1 A44B G09F H04N
2 A47B G06F H05K E05D
3 A47B G06F
4 A47B H04N H05K
5 A47B
6 A47B F16M F16M H05K
7 A47B A47B F16M A47B
end

reshape long ipc_, i(id)
bysort id ipc_: gen superfluousandredundant = _n > 1
drop if superfluousandredundant
drop superfluousandredundant
reshape wide ipc, i(id) j(_j)
l, sepby(id) noo
*************

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, 1 April 2010 17:00
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Replacing duplicate values

It's a Stata two-step: reshape, drop duplicates, reshape back. Something like

* warning: untested code
reshape long ipc_, i(id)
bysort id ipc_: gen superfluousandredundant = _n > 1
drop if superfluousandredundant
bysort id (ipc) : gen j = _n
reshape wide ipc, i(id) j(j)

Actually, the last -reshape- might not be a good idea. The long structure might be more useful.

Nick
n.j.cox@durham.ac.uk

Pavlos C. Symeou

I have a dataset of patents. Every patent is assigned a number of International Patent Classifications (IPCs). However, there are mistakes in the database, and certain IPCs appear more than once for a single patent, which is meaningless. Examples are the patents with id 6 and id 7 (ipc_1, ipc_2, etc. list the IPCs assigned to a single patent). For the patent with id 6, we can see that ipc_2 and ipc_3 are the same. Id 7 illustrates a more general issue: duplicate values may not appear sequentially and may appear more than twice.

id  ipc_1  ipc_2  ipc_3  ipc_4
1   A44B   G09F   H04N
2   A47B   G06F   H05K   E05D
3   A47B   G06F
4   A47B   H04N   H05K
5   A47B
6   A47B   F16M   F16M   H05K
7   A47B   A47B   F16M   A47B

Can you suggest a way to delete the duplicate values, of which there can be more than two, and move the remaining values to the left?

For example, the patents with id 6 and id 7 would look like this:

id  ipc_1  ipc_2  ipc_3  ipc_4
6   A47B   F16M   H05K
7   A47B   F16M

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
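Editorial sketch, not part of the original messages: the reshape, drop duplicates, reshape back idea from this thread could be combined with an explicit renumbering step, so that the surviving values end up in the leftmost columns as the original question asks. This is an untested sketch under stated assumptions. The variable names pos and newpos are made up for illustration, empty strings are typed explicitly in the -input- block, and -reshape clear- is there only to discard the reshape settings remembered from the first call.

```stata
* Sketch only: untested; pos and newpos are illustrative names
clear
input byte id str4(ipc_1 ipc_2 ipc_3 ipc_4)
1 "A44B" "G09F" "H04N" ""
2 "A47B" "G06F" "H05K" "E05D"
3 "A47B" "G06F" ""     ""
4 "A47B" "H04N" "H05K" ""
5 "A47B" ""     ""     ""
6 "A47B" "F16M" "F16M" "H05K"
7 "A47B" "A47B" "F16M" "A47B"
end

* Long layout: one row per (id, column position)
reshape long ipc_, i(id) j(pos)

* Drop empty slots, then keep only the first occurrence of each IPC within id
drop if ipc_ == ""
bysort id ipc_ (pos): keep if _n == 1

* Renumber the survivors 1, 2, ... in their original left-to-right order
bysort id (pos): gen newpos = _n
drop pos

* Forget the earlier reshape settings and go back to wide on the new index
reshape clear
reshape wide ipc_, i(id) j(newpos)
list, sepby(id) noobs
```

As Nick notes in the thread, the final -reshape wide- is optional; stopping after the renumbering step leaves the data in the long layout, which may be the more useful one.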

**Follow-Ups**:
- **st: Replacing duplicate values** *From:* "Pavlos C. Symeou" <p.symeou@lmu.de>

**References**:
- **st: Replacing duplicate values** *From:* "Pavlos C. Symeou" <p.symeou@lmu.de>
- **st: RE: Replacing duplicate values** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
