Optimising speech recognition using LLMs: an application in the surgical domain

Matasyoh NM, Zeineldin R, Mathis-Ullrich F (2024)

Publication Type: Journal article

Publication year: 2024

Journal

Current Directions in Biomedical Engineering De Gruyter

Book Volume: 10

Pages Range: 45-48

Journal Issue: 1

DOI: 10.1515/cdbme-2024-0112

Abstract

Automatic speech recognition (ASR), powered by deep learning techniques, is crucial for enhancing human-computer interaction. However, its full potential remains unrealized in diverse real-world environments, where challenges such as dialects, accents, and domain-specific jargon persist, particularly in fields like surgery. Here, we investigate the potential of large language models (LLMs) as error correction modules for ASR. We leverage Whisper-medium or ASR-LibriSpeech for speech recognition, and GPT-3.5 or GPT-4 for error correction. We employ various prompting methods, from zero-shot to few-shot with leading questions and sample medical terms, to correct wrong transcriptions. Results, measured by word error rate (WER), reveal Whisper's superior transcription accuracy over ASR-LibriSpeech, with a WER of 11.93% compared to 32.09%. GPT-3.5, with the few-shot with medical terms prompting method, further enhances performance, achieving a 64.29% and 37.83% WER reduction for Whisper and ASR-LibriSpeech, respectively. Additionally, Whisper exhibits faster execution speed. Substituting GPT-3.5 with GPT-4 further enhances transcription accuracy. Despite a few challenges, our approach demonstrates the potential of leveraging domain-specific knowledge through LLM prompting for accurate transcription, particularly in sophisticated domains like surgery.
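The abstract reports results as word error rate (WER) and as a WER reduction after LLM correction. As a minimal sketch of how these two quantities are commonly computed, the Python below uses the standard word-level edit distance definition of WER and assumes that "WER reduction" means a relative reduction; the abstract does not state this, and the authors' actual evaluation code and text normalisation are not given here. The function names and the example sentences are hypothetical and serve only as illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed as the word-level Levenshtein distance divided by the
    number of words in the reference. No text normalisation is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(wer_before: float, wer_after: float) -> float:
    """Relative reduction (assumed definition): (before - after) / before."""
    if wer_before <= 0:
        raise ValueError("wer_before must be positive")
    return (wer_before - wer_after) / wer_before


# Hypothetical example: two of four reference words are misrecognised.
reference = "clamp the cystic duct"
raw_transcript = "clam the cystic duck"
print(word_error_rate(reference, raw_transcript))
```

In this sketch an LLM correction step would sit between the ASR output and the second WER measurement: the raw transcript is scored, passed to the LLM with the chosen prompt, and the corrected text is scored again against the same reference.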

Authors with CRIS profile

Nevin Musula Matasyoh, Professur für Robotische Planung und Kognition in der Chirurgie

Ramy Zeineldin, Professur für Robotische Planung und Kognition in der Chirurgie

Franziska Mathis-Ullrich, Professur für Robotische Planung und Kognition in der Chirurgie

How to cite

APA:

Matasyoh, N.M., Zeineldin, R., & Mathis-Ullrich, F. (2024). Optimising speech recognition using LLMs: an application in the surgical domain. Current Directions in Biomedical Engineering, 10(1), 45-48. https://doi.org/10.1515/cdbme-2024-0112

MLA:

Matasyoh, Nevin Musula, Ramy Zeineldin, and Franziska Mathis-Ullrich. "Optimising speech recognition using LLMs: an application in the surgical domain." Current Directions in Biomedical Engineering 10.1 (2024): 45-48.
