Schwär S, Müller M (2023)
Publication Type: Journal article
Publication year: 2023
Volume: 30
Page Range: 1712-1716
Article Number: 3333205
The Multi-Scale Spectral (MSS) loss is commonly used for comparing audio signals, as it provides a good trade-off between temporal and spectral resolution. However, some configuration choices, including window type and size, magnitude compression, as well as the distance between spectrograms, are often set implicitly, even though they can significantly impact the loss properties and the convergence of trained models. Particularly in the context of differentiable digital signal processing (DDSP), where learned parameters may explicitly control the frequency of synthesis components, the MSS loss often fails to provide informative gradients. The main goal of this letter is to gain a better understanding of how different configurations of the MSS loss affect this problem. As an illustrative example, we analyze the task of sinusoid frequency estimation via gradient descent to compare different configurations and their effect on the loss properties. Furthermore, we show that favorable configurations can also facilitate unsupervised training of a more complex DDSP additive synthesis autoencoder. Our results indicate that a careful configuration may benefit many applications where the MSS loss is utilized.
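To make the configuration choices named in the abstract concrete, the following is a minimal NumPy sketch of one possible MSS loss. It is not the authors' implementation: the Hann window, the window sizes (64, 256, 1024), the hop of a quarter window, the L1 distance, the added log-compressed term, and the helper names `stft_magnitude`, `mss_loss`, and `sinusoid` are all illustrative assumptions.

```python
import numpy as np


def stft_magnitude(x, win_size, hop):
    """Magnitude spectrogram (frames x bins) using a Hann window (assumed window type)."""
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_size] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def mss_loss(x, y, win_sizes=(64, 256, 1024), eps=1e-7):
    """Sum of spectrogram distances over several window sizes.

    Assumed configuration: L1 distance on linear magnitudes plus L1 distance
    on log-compressed magnitudes, hop = win_size // 4.
    """
    total = 0.0
    for n in win_sizes:
        sx = stft_magnitude(x, n, n // 4)
        sy = stft_magnitude(y, n, n // 4)
        total += np.mean(np.abs(sx - sy))                              # linear term
        total += np.mean(np.abs(np.log(sx + eps) - np.log(sy + eps)))  # log term
    return float(total)


def sinusoid(freq_hz, sr=16000, n=4096):
    """Unit-amplitude test sinusoid (hypothetical sample rate and length)."""
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * freq_hz * t)


if __name__ == "__main__":
    target = sinusoid(440.0)
    # Evaluate the loss for a few candidate frequencies around the target.
    for f in (220.0, 430.0, 440.0, 450.0, 880.0):
        print(f, mss_loss(sinusoid(f), target))
```

Evaluating such a function over a grid of candidate frequencies is one way to inspect the loss landscape for the sinusoid frequency estimation task that the abstract describes. Swapping the window, the compression, or the distance in `mss_loss` changes that landscape.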
APA:
Schwär, S., & Müller, M. (2023). Multi-Scale Spectral Loss Revisited. IEEE Signal Processing Letters, 30, 1712-1716. https://doi.org/10.1109/LSP.2023.3333205
MLA:
Schwär, Simon, and Meinard Müller. "Multi-Scale Spectral Loss Revisited." IEEE Signal Processing Letters 30 (2023): 1712-1716.