Sirocchi C, Urschler M, Pfeifer B (2025)
Publication Type: Journal article
Publication year: 2025
Volume: 18
Article Number: 15
Issue: 1
DOI: 10.1186/s13040-025-00430-3
Explainable and interpretable machine learning has emerged as essential in leveraging artificial intelligence within high-stakes domains such as healthcare to ensure transparency and trustworthiness. Feature importance analysis plays a crucial role in improving model interpretability by pinpointing the most relevant input features, particularly in disease subtyping applications, aimed at stratifying patients based on a small set of signature genes and biomarkers. While clustering methods, including unsupervised random forests, have demonstrated good performance, approaches for evaluating feature contributions in an unsupervised regime are notably scarce. To address this gap, we introduce a novel methodology to enhance the interpretability of unsupervised random forests by elucidating feature contributions through the construction of feature graphs, both over the entire dataset and individual clusters, that leverage parent-child node splits within the trees. Feature selection strategies to derive effective feature combinations from these graphs are presented and extensively evaluated on synthetic and benchmark datasets against state-of-the-art methods, standing out for performance, computational efficiency, reliability, versatility and ability to provide cluster-specific insights. In a disease subtyping application, clustering kidney cancer gene expression data over a feature subset selected with our approach reveals three patient groups with different survival outcomes. Cluster-specific analysis identifies distinctive feature contributions and interactions, essential for devising targeted interventions, conducting personalised risk assessments, and enhancing our understanding of the underlying molecular complexities.
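The abstract says the feature graphs "leverage parent-child node splits within the trees". The Python sketch below shows one plausible reading of that idea, not the authors' implementation. It assumes that each directed edge counts how often a split on one feature is immediately followed by a split on another, that weighted degree serves as the centrality score, and that a top-k rule does the selection. The helper names (`feature_graph`, `degree_centrality`, `top_k_features`) and the dict-of-arrays tree layout are hypothetical. The layout mirrors the `feature`, `children_left` and `children_right` arrays exposed by scikit-learn's `tree_` attribute, where `-1` marks a leaf child.

```python
from collections import defaultdict

# Hypothetical tree layout: parallel lists mirroring scikit-learn's tree_ arrays.
#   feature[i]        -> feature index split on at node i (unused for leaves)
#   children_left[i]  -> left child of node i, or -1 if node i is a leaf
#   children_right[i] -> right child of node i, or -1 if node i is a leaf
LEAF = -1


def feature_graph(trees):
    """Count parent->child split transitions across all trees (directed, weighted)."""
    weights = defaultdict(int)
    for tree in trees:
        feat, left, right = tree["feature"], tree["children_left"], tree["children_right"]
        for node in range(len(feat)):
            if left[node] == LEAF:
                continue  # leaves do not split, so no edge starts here
            for child in (left[node], right[node]):
                if left[child] != LEAF:  # only internal children carry a split feature
                    weights[(feat[node], feat[child])] += 1
    return dict(weights)


def degree_centrality(weights):
    """Assumed centrality: sum of incoming and outgoing edge weights per feature."""
    score = defaultdict(int)
    for (parent, child), w in weights.items():
        score[parent] += w
        score[child] += w
    return dict(score)


def top_k_features(weights, k):
    """Assumed selection rule: keep the k features with the highest degree (ties by index)."""
    score = degree_centrality(weights)
    return sorted(score, key=lambda f: (-score[f], f))[:k]


# Toy forest of two trees.
tree_a = {"feature": [0, 2, -2, -2, -2],
          "children_left": [1, 3, -1, -1, -1],
          "children_right": [2, 4, -1, -1, -1]}
tree_b = {"feature": [1, 2, 2, -2, -2, -2, -2],
          "children_left": [1, 3, 5, -1, -1, -1, -1],
          "children_right": [2, 4, 6, -1, -1, -1, -1]}

graph = feature_graph([tree_a, tree_b])
print(graph)                             # {(0, 2): 1, (1, 2): 2}
print(top_k_features(graph, 2))          # [2, 1]
```

With a fitted scikit-learn forest, each tree could be passed as `{"feature": est.tree_.feature, "children_left": est.tree_.children_left, "children_right": est.tree_.children_right}` for `est` in `forest.estimators_`. The abstract also describes cluster-specific graphs. One option, again an assumption, is to count only the split transitions that a given cluster's samples actually traverse.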
APA:
Sirocchi, C., Urschler, M., & Pfeifer, B. (2025). Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping. BioData Mining, 18(1), Article 15. https://doi.org/10.1186/s13040-025-00430-3
MLA:
Sirocchi, Christel, Martin Urschler, and Bastian Pfeifer. "Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping." BioData Mining 18 (2025).