# Comparing partitions

@article{Hubert1985ComparingP, title={Comparing partitions}, author={Lawrence J. Hubert and Phipps Arabie}, journal={Journal of Classification}, year={1985}, volume={2}, pages={193-218} }

The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of…
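As a rough illustration of the two quantities the abstract mentions, the sketch below computes the Rand index and the chance-corrected form commonly attributed to Hubert and Arabie (the adjusted Rand index) from two label lists. The function names `pair_counts`, `rand_index`, and `adjusted_rand_index` are illustrative and do not come from the paper, and returning 1.0 when both partitions are trivial is an assumed convention rather than something the abstract specifies.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Return the pair-count sums both indices are built from."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")
    joint = Counter(zip(labels_a, labels_b))  # contingency table cells
    rows = Counter(labels_a)                  # cluster sizes, partition A
    cols = Counter(labels_b)                  # cluster sizes, partition B
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    return sum_joint, sum_rows, sum_cols, comb(n, 2)


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two partitions agree."""
    sum_joint, sum_rows, sum_cols, total = pair_counts(labels_a, labels_b)
    # Agreements: pairs joined in both, plus pairs separated in both.
    agreements = total + 2 * sum_joint - sum_rows - sum_cols
    return agreements / total


def adjusted_rand_index(labels_a, labels_b):
    """Rand-type index corrected for chance (expected value 0, maximum 1)."""
    sum_joint, sum_rows, sum_cols, total = pair_counts(labels_a, labels_b)
    expected = sum_rows * sum_cols / total
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        # Assumed convention: both partitions are all-singletons or both are
        # a single cluster, so they are identical.
        return 1.0
    return (sum_joint - expected) / (maximum - expected)


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    print(rand_index(a, b), adjusted_rand_index(a, b))
```

Only the grouping matters, not the label values, so relabelling the clusters of either partition leaves both indices unchanged.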

#### 2,069 Citations

Comparing clusterings: an information based distance

- Mathematics
- 2007

This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of…

Relational Generalizations of Cluster Validity Indices

- Mathematics, Computer Science
- IEEE Transactions on Fuzzy Systems
- 2010

This work generalizes three well-known validity indices: the modified Hubert's Gamma, Xie-Beni, and the generalized Dunn's indices, to relational data and develops a framework to convert many other validity indices to a relational form.

Adjusting for Chance Clustering Comparison Measures

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2016

This paper solves the key technical challenge of analytically computing the expected value and variance of generalized IT measures and proposes guidelines for using ARI and AMI as external validation indices.

Comparing Two Partitions of Non-Equal Sets of Units

- Mathematics
- 2018

Rand (1971) proposed what has since become a well-known index for comparing two partitions obtained on the same set of units. The index takes a value on the interval between 0 and 1, where a higher…

Understanding partition comparison indices based on counting object pairs

- Mathematics, Computer Science
- ArXiv
- 2019

The overall indices based on the pair-counting approach are sensitive to cluster size imbalance: they tend to reflect the degree of agreement on the large clusters and provide little to no information on smaller clusters.

Comparing hard and overlapping clusterings

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2015

A corrected-for-chance measure (13AGRI) capable of comparing exclusive hard, fuzzy/probabilistic, non-exclusive hard, and possibilistic clusterings is developed and it is proved that 13AGRI and the adjusted Rand index (ARI) are equivalent in the exclusive hard domain.

A modification of the k-means method for quasi-unsupervised learning

- Computer Science
- Knowl. Based Syst.
- 2013

This paper builds upon a modification of the celebrated k-means method resorting to a similar alternating optimization procedure, endowed with additive partition weights controlling the size of the partitions formed, adjusted by means of the Levenberg-Marquardt algorithm, and proposes several further variations on this modification.

An Extension of the Infinite Relational Model Incorporating Interaction between Objects

- Computer Science
- PAKDD
- 2013

This paper proposes an extension of the IRM by introducing a subset mechanism that selects a part of the data according to the interaction among objects and presents posterior probabilities for running collapsed Gibbs sampling to learn the model from the given data.

Extending the Rand, adjusted Rand and Jaccard indices to fuzzy partitions

- Computer Science
- Journal of Intelligent Information Systems
- 2008

This paper looks at some commonly used clustering measures, including the Rand index (RI), adjusted RI (ARI), and the Jaccard index (JI), that are already defined for crisp clustering and extends them to fuzzy clustering measures, giving FRI, FARI, and FJI.

Unsupervised extra trees: a stochastic approach to compute similarities in heterogeneous data

- Computer Science
- International Journal of Data Science and Analytics
- 2020

The empirical study shows that the approach based on UET outperforms existing methods in some cases and reduces the amount of preprocessing typically needed when dealing with real-world datasets.