Hernandez A, Yang SH (2021)
Publication Language: English
Publication Type: Conference contribution, Original article
Publication year: 2021
Publisher: Springer
Series: Lecture Notes in Computer Science
Book Volume: 12997
Conference Proceedings Title: Proceedings of the International Conference on Speech and Computer
Event location: Online
ISBN: 978-3-030-87802-3
URI: https://link.springer.com/chapter/10.1007/978-3-030-87802-3_24
DOI: 10.1007/978-3-030-87802-3_24
This paper introduces a lecture video corpus, Autoblog 2020. With the growth of online learning in universities, there is demand for a systematic toolchain for lecture video processing. However, existing lecture video corpora do not satisfy the requirements of such tasks, and lecture transcription and analysis remain relatively unexplored areas in speech and natural language research. The Autoblog 2020 corpus is developed towards the end goal of free video-to-blog-post conversion software that makes video presentations more accessible. The software will include automatic editing of disfluencies, automatic speech recognition (ASR), and spoken term extraction so that researchers can process and share their content more efficiently. In this paper, we present a description of the corpus, linguistic analyses, and preliminary experimental results for ASR, keyword extraction, and segmentation. The results will be used in future work to develop the video-to-blog-post conversion software.
APA:
Hernandez, A., & Yang, S.H. (2021). Multimodal Corpus Analysis of Autoblog 2020: Lecture Videos in Machine Learning. In Proceedings of the International Conference on Speech and Computer. Online: Springer.
MLA:
Hernandez, Abner, and Seung Hee Yang. "Multimodal Corpus Analysis of Autoblog 2020: Lecture Videos in Machine Learning." Proceedings of the International Conference on Speech and Computer, Online, Springer, 2021.