The Impact of Random Models on Standardized Clustering Similarity

Klede K, Altstidl TR, Zanca D, Eskofier B (2024)


Publication Language: English

Publication Type: Journal article, Original article

Publication year: 2024

Journal: IEEE Access

URI: https://ieeexplore.ieee.org/document/10769363

DOI: 10.1109/ACCESS.2024.3507133

Open Access Link: https://ieeexplore.ieee.org/document/10769363

Abstract

Clustering similarity measures are essential for evaluating clustering results and ensuring diversity in multiple clusterings of the same dataset. Common indices like the Mutual Information (MI) and Rand Index (RI) are biased towards smaller clusters and are often adjusted using a random permutation model. Recent advancements have standardized these measures to further correct biases, but the impact of different random models on these standardized measures has not yet been studied. In this work, we introduce equations for standardizing the MI/RI under non-permutation models, specifically focusing on a uniform model over all clusterings and a model that fixes the number of clusters. Our results show that while standardization improves performance for the fixed number of clusters model, its benefits are limited in the more general uniform model. We validate our findings with gene expression data, highlighting the importance of choosing the right similarity metric for clustering comparison.

How to cite

APA:

Klede, K., Altstidl, T.R., Zanca, D., & Eskofier, B. (2024). The Impact of Random Models on Standardized Clustering Similarity. IEEE Access. https://doi.org/10.1109/ACCESS.2024.3507133

MLA:

Klede, Kai, et al. "The Impact of Random Models on Standardized Clustering Similarity." IEEE Access (2024).
