High-Resolution Violin Transcription Using Weak Labels

Tamer NC, Özer Y, Müller M, Serra X (2023)


Publication Language: English

Publication Type: Conference contribution

Publication year: 2023

Publisher: ISMIR

Page Range: 223-230

Conference Proceedings Title: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)

Event location: Milan, IT

DOI: 10.5281/ZENODO.10265263

Abstract

A descriptive transcription of a violin performance requires detecting not only the notes but also the fine-grained pitch variations, such as vibrato. Most existing deep learning methods for music transcription do not capture these variations and often need frame-level annotations, which are scarce for the violin. In this paper, we propose a novel method for high-resolution violin transcription that can leverage piece-level weak labels for training. Our conformer-based model works on the raw audio waveform and transcribes violin notes and their corresponding pitch deviations with 5.8 ms frame resolution and 10-cent frequency resolution. We demonstrate that our method (1) outperforms generic systems in the proxy tasks of violin transcription and pitch estimation, and (2) can automatically generate new training labels by aligning its feature representations with unseen scores. We share our model along with a dataset of 34 hours of score-aligned solo violin performances, notably including the 24 Paganini Caprices.
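
To make the two resolution figures in the abstract concrete, here is a minimal Python sketch of the underlying arithmetic. It is not the authors' code. The 440 Hz reference pitch, the helper names, and the hop size of 256 samples at 44.1 kHz are assumptions chosen for illustration; the abstract states only the resulting 5.8 ms and 10-cent values.

```python
import math

A4_HZ = 440.0  # assumed reference pitch; not stated in the abstract


def hz_to_cents(freq_hz, ref_hz=A4_HZ):
    """Signed distance from ref_hz in cents (100 cents = one semitone)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def quantize_cents(cents, step=10.0):
    """Snap a cent value to the nearest multiple of `step` cents."""
    return step * round(cents / step)


def frame_period_ms(hop_samples, sample_rate_hz):
    """Time between consecutive analysis frames, in milliseconds."""
    return 1000.0 * hop_samples / sample_rate_hz


# A pitch slightly sharp of A4, as might occur at the peak of a vibrato cycle.
deviation = hz_to_cents(446.0)       # roughly 23.5 cents above A4
binned = quantize_cents(deviation)   # nearest 10-cent bin

# Hypothetical hop size and sample rate that yield a frame period near 5.8 ms.
period = frame_period_ms(256, 44100)
```

On a 10-cent grid, a deviation of about 23 cents falls into the 20-cent bin, so the grid resolves pitch movement well below a semitone, which is the scale at which vibrato occurs.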

How to cite

APA:

Tamer, N.C., Özer, Y., Müller, M., & Serra, X. (2023). High-Resolution Violin Transcription Using Weak Labels. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR) (pp. 223-230). Milan, IT: ISMIR.

MLA:

Tamer, Nazif Can, et al. "High-Resolution Violin Transcription Using Weak Labels." Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Milan: ISMIR, 2023. 223-230.