A Unified Perspective on CTC and Soft-DTW Using Differentiable DTW

Zeitler J, Müller M (2026)


Publication Type: Journal article

Publication year: 2026

DOI: 10.1109/TASLPRO.2026.3657213

Abstract

Training deep neural networks on unaligned sequence data is fundamental to tasks such as automatic speech recognition, lyrics alignment, and music transcription. Strongly aligned annotations, which provide frame-level correspondences between input and target sequences, are often costly, impractical, or unreliable. In contrast, weakly aligned annotations, which specify only segment-level alignment, are more scalable and easier to obtain, but present challenges for training and supervision. A widely used technique for handling weakly aligned data is Connectionist Temporal Classification (CTC). While CTC enables end-to-end training without explicit alignments, it is difficult to interpret, structurally rigid, and relies on a special blank symbol to handle label repetitions. The main contribution of this work is to explore the relationship between CTC and the less commonly used but conceptually simpler Soft Dynamic Time Warping (SDTW), which offers a more intuitive and flexible approach to weak alignment. We introduce a generalization of SDTW that incorporates cell-wise step weights, variable step sizes, and flexible boundary conditions. We refer to this extended framework as Differentiable Dynamic Time Warping (dDTW), which naturally subsumes CTC and SDTW as special cases and provides a unified perspective on these alignment-based losses. We systematically compare SDTW, CTC, and related variants in two controlled and illustrative tasks from music information retrieval, analyzing prediction accuracy, training stability, alignment behavior, and the implications of the blank symbol, in both single- and multi-label problems.
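As a rough illustration of the SDTW baseline named in the abstract, the following is a minimal sketch of the standard soft-DTW recursion, in which the hard minimum of classical DTW is replaced by a smooth minimum. It is not the authors' dDTW implementation: the cell-wise step weights, variable step sizes, and flexible boundary conditions described above are omitted, and the names `softmin`, `soft_dtw`, `cost`, and `gamma` are assumptions chosen for this example.

```python
import numpy as np

def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = np.asarray(values, dtype=float) / -gamma
    m = np.max(v)
    if np.isinf(m):  # every input is +inf, so the smooth minimum is +inf too
        return np.inf
    return -gamma * (m + np.log(np.sum(np.exp(v - m))))

def soft_dtw(cost, gamma=1.0):
    """Soft-DTW value for a precomputed (n x m) local cost matrix.

    Assumes the classical step set {(1, 0), (0, 1), (1, 1)} with unit
    weights and fixed start/end cells (illustrative simplifications).
    """
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)  # accumulated cost with inf border
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma
            )
    return R[n, m]
```

As `gamma` approaches zero, the smooth minimum approaches the hard minimum and the result approaches the classical DTW cost; larger values of `gamma` blend the costs of many alignment paths, which is what makes the quantity differentiable with respect to `cost`.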

How to cite

APA:

Zeitler, J., & Müller, M. (2026). A Unified Perspective on CTC and Soft-DTW Using Differentiable DTW. https://doi.org/10.1109/TASLPRO.2026.3657213

MLA:

Zeitler, Johannes, and Meinard Müller. "A Unified Perspective on CTC and Soft-DTW Using Differentiable DTW." (2026).