Learning low-dimensional embeddings of audio shingles for cross-version retrieval of classical music

Zalkow F, Müller M (2020)


Publication Type: Journal article

Publication year: 2020

Journal: Applied Sciences

Book Volume: 10

Article Number: 19

Journal Issue: 1

DOI: 10.3390/app10010019

Abstract

Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, which is particularly suited for Western classical music, is based on a nearest neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques: one is based on classical principal component analysis, and the other is based on neural networks with triplet loss. Furthermore, we report on systematically conducted experiments with Western classical music recordings and discuss the trade-off between retrieval quality and embedding dimensionality. As one main result, we show that, using neural networks, one can reduce the audio shingles from 240 to fewer than 8 dimensions with only a moderate loss in retrieval accuracy. In addition, we present extended experiments with databases of different sizes and different query lengths to test the scalability and generalizability of the dimensionality reduction methods. We also provide a more detailed view into the retrieval problem by analyzing the distances that appear in the nearest neighbor search.
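
As a rough illustration of the PCA-based variant described in the abstract, the following minimal sketch reduces 240-dimensional vectors to a low-dimensional embedding and runs a brute-force nearest neighbor search on the result. It is not the authors' implementation: the data is synthetic random noise standing in for real audio shingles, the function names (`fit_pca`, `embed`, `nearest_neighbors`) are hypothetical, and the target dimensionality of 8 is chosen only for illustration.

```python
import numpy as np


def fit_pca(shingles, n_components):
    """Return the mean and the leading principal axes of the row vectors."""
    mean = shingles.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(shingles - mean, full_matrices=False)
    return mean, vt[:n_components]


def embed(shingles, mean, axes):
    """Project (centered) shingles onto the principal axes."""
    return (shingles - mean) @ axes.T


def nearest_neighbors(query, database, k):
    """Indices of the k database rows closest to the query (Euclidean)."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:k]


rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for a database of 240-dimensional shingles.
database = rng.random((1000, 240))

# Assumption: the query is a slightly perturbed copy of database item 42,
# mimicking a different version of the same musical passage.
query = database[42] + 0.01 * rng.standard_normal(240)

mean, axes = fit_pca(database, n_components=8)
db_low = embed(database, mean, axes)
q_low = embed(query[None, :], mean, axes)[0]

# Search in the 8-dimensional embedding space instead of the original one.
top = nearest_neighbors(q_low, db_low, k=5)
print(top)
```

The brute-force search is used here only to keep the sketch short; an index structure would be the natural choice for larger databases.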

How to cite

APA:

Zalkow, F., & Müller, M. (2020). Learning low-dimensional embeddings of audio shingles for cross-version retrieval of classical music. Applied Sciences, 10(1). https://doi.org/10.3390/app10010019

MLA:

Zalkow, Frank, and Meinard Müller. "Learning low-dimensional embeddings of audio shingles for cross-version retrieval of classical music." Applied Sciences 10.1 (2020).
