Multi-Scale Spectral Loss Revisited

Schwär S, Müller M (2023)


Publication Type: Journal article

Publication year: 2023

Journal: IEEE Signal Processing Letters

Volume: 30

Pages Range: 1712-1716

Article Number: 3333205

DOI: 10.1109/LSP.2023.3333205

Abstract

The Multi-Scale Spectral (MSS) loss is commonly used for comparing audio signals, as it provides a good trade-off between temporal and spectral resolution. However, some configuration choices, including window type and size, magnitude compression, as well as the distance between spectrograms, are often set implicitly, even though they can significantly impact the loss properties and the convergence of trained models. Particularly in the context of differentiable digital signal processing (DDSP), where learned parameters may explicitly control the frequency of synthesis components, the MSS loss often fails to provide informative gradients. The main goal of this letter is to gain a better understanding of how different configurations of the MSS loss affect this problem. As an illustrative example, we analyze the task of sinusoid frequency estimation via gradient descent to compare different configurations and their effect on the loss properties. Furthermore, we show that favorable configurations can also facilitate unsupervised training of a more complex DDSP additive synthesis autoencoder. Our results indicate that a careful configuration may benefit many applications where the MSS loss is utilized.
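The sketch below makes the configuration choices named in the abstract concrete. It uses one possible MSS loss in plain Python and applies it to the sinusoid setting the abstract mentions. Everything in it is an illustrative assumption, not the configuration studied or recommended in the letter: the function names (`hann`, `stft_magnitude`, `mss_loss`, `sinusoid`), the window sizes 64/128/256 with a hop of a quarter window, the periodic Hann window, the mean L1 distance and the optional log compression. The naive DFT keeps the example self-contained but is far too slow for real training.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (one common window choice; an assumption here)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft_magnitude(x, n_fft, hop):
    """Magnitude STFT via a naive DFT, keeping bins 0..n_fft//2 (demo-sized signals only)."""
    win = hann(n_fft)
    # Precompute the DFT basis rows for the non-negative frequency bins.
    twiddle = [[cmath.exp(-2j * math.pi * b * k / n_fft) for k in range(n_fft)]
               for b in range(n_fft // 2 + 1)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + k] * win[k] for k in range(n_fft)]
        frames.append([abs(sum(s * w for s, w in zip(seg, row))) for row in twiddle])
    return frames


def mss_loss(x, y, scales=(64, 128, 256), compression="none", eps=1e-7):
    """Hypothetical MSS loss: mean L1 distance between (optionally log-compressed)
    magnitude spectrograms, summed over window sizes with hop = n_fft // 4."""
    total = 0.0
    for n_fft in scales:
        sx = stft_magnitude(x, n_fft, n_fft // 4)
        sy = stft_magnitude(y, n_fft, n_fft // 4)
        diffs = []
        for fx, fy in zip(sx, sy):
            for a, b in zip(fx, fy):
                if compression == "log":
                    a, b = math.log(a + eps), math.log(b + eps)
                diffs.append(abs(a - b))
        total += sum(diffs) / len(diffs)
    return total


def sinusoid(freq, sr=8000, n=512, amp=1.0):
    """A pure sine at `freq` Hz, sampled at `sr` Hz (hypothetical test signal)."""
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]


if __name__ == "__main__":
    target = sinusoid(1000.0)
    for f in (1125.0, 2000.0, 3000.0):
        print(f"{f:6.0f} Hz -> {mss_loss(target, sinusoid(f)):.4f}")
```

In this toy setup all frequencies fall exactly on DFT bins, so each Hann-windowed peak spans three bins. A 1125 Hz candidate overlaps the 1000 Hz target at the two smaller window sizes and gets a lower loss. The 2000 Hz and 3000 Hz candidates overlap at no scale and get essentially the same loss. The loss is therefore flat between them and says nothing about which direction the frequency should move. This is the kind of uninformative gradient the abstract describes for frequency parameters.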


How to cite

APA:

Schwär, S., & Müller, M. (2023). Multi-Scale Spectral Loss Revisited. IEEE Signal Processing Letters, 30, 1712-1716. https://dx.doi.org/10.1109/LSP.2023.3333205

MLA:

Schwär, Simon, and Meinard Müller. "Multi-Scale Spectral Loss Revisited." IEEE Signal Processing Letters 30 (2023): 1712-1716.
