Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos

Döllinger M, Schraut T, Henrich LA, Chhetri D, Echternach M, Johnson AM, Kunduk M, Maryn Y, Patel RR, Samlan R, Semmler M, Schützenberger A (2022)

Publication Type: Journal article

Publication year: 2022

Journal

Applied Sciences (MDPI)

Book Volume: 12

Article Number: 9791

Journal Issue: 19

DOI: 10.3390/app12199791

Abstract

Endoscopic high-speed video (HSV) systems for visualization and assessment of vocal fold dynamics in the larynx are diverse and technically advancing. To account for the resulting “concept shifts” in neural network (NN)-based image processing, already trained and deployed NNs must be re-trained to achieve sufficiently accurate image processing for new recording modalities. We propose and discuss several re-training approaches for convolutional neural networks (CNNs) used for HSV image segmentation. Our baseline CNN was trained on the BAGLS data set (58,750 images). The new BAGLS-RT data set consists of an additional 21,050 images from previously unused HSV systems, light sources, and different spatial resolutions. Results showed that increasing data diversity by means of preprocessing already improves segmentation accuracy (mIoU +6.35%). Subsequent re-training further increases segmentation performance (mIoU +2.81%). For re-training, fine-tuning with dynamic knowledge distillation showed the most promising results. Data variety for training and additional re-training are helpful tools to boost HSV image segmentation quality. However, when performing re-training, the phenomenon of catastrophic forgetting should be kept in mind, i.e., adaptation to new data while forgetting already learned knowledge.

Authors with CRIS profile

Michael Döllinger, Professur für Computational Medicine

Tobias Schraut, Lehrstuhl für Hals-Nasen-Ohrenheilkunde

Marion Semmler

Anne Schützenberger, Professur für Phoniatrie und Pädaudiologie

Related research project(s)

Objective analysis of functional based hoarseness by clinical high-speed endoscopy (DO 1247/8-1/2) Sept. 28, 2016 - Feb. 15, 2024

Involved external institutions

Universiteit Gent (UGent) / Ghent University, Belgium (BE)

University of Arizona, United States (US)

Klinikum der Universität München (LMU Klinikum), Germany (DE)

University of California Los Angeles (UCLA), United States (US)

New York University (NYU), United States (US)

Indiana University, United States (US)

Louisiana State University, United States (US)

How to cite

APA:

Döllinger, M., Schraut, T., Henrich, L. A., Chhetri, D., Echternach, M., Johnson, A. M., ... Schützenberger, A. (2022). Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos. Applied Sciences, 12(19), Article 9791. https://doi.org/10.3390/app12199791

MLA:

Döllinger, Michael, et al. "Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos." Applied Sciences 12.19 (2022).
