Deep Learning for Orca Call Type Identification – A Fully Unsupervised Approach

Bergler C, Schmitt M, Cheng RX, Maier A, Barth V, Nöth E (2019)


Publication Language: English

Publication Type: Conference contribution, Original article

Publication year: 2019

Publisher: International Speech Communication Association

Page Range: 3357-3361

Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Event location: Graz, AT

DOI: 10.21437/Interspeech.2019-1857

Abstract

Call type classification is an important instrument in bioacoustic research investigating group-specific vocal repertoires, behavioral patterns, and cultures of different animal groups. There is a need for robust machine-based techniques to replace human classification, owing to their advantages in handling large datasets, delivering consistent results, removing perceptual-based classification, and minimizing human error. The current work is the first to adopt a two-stage, fully unsupervised approach on previously machine-segmented orca data to identify orca sound types, using deep learning together with one of the largest bioacoustic datasets, the Orchive. The proposed methods include: (1) unsupervised feature learning using an undercomplete ResNet18-autoencoder trained on machine-annotated data, and (2) spectral clustering of the compressed orca feature representations. An existing human-labeled orca dataset comprising 514 signals distributed over 12 classes was clustered. This two-stage, fully unsupervised approach is an initial study to (1) examine machine-generated clusters against human-identified orca call type classes, (2) compare supervised call type classification with unsupervised call type clustering, and (3) verify the general feasibility of a completely unsupervised approach based on machine-labeled orca data. Such an approach would mark major progress in the research field of animal linguistics by enabling a much deeper understanding and opening up new insights and opportunities.

How to cite

APA:

Bergler, C., Schmitt, M., Cheng, R.X., Maier, A., Barth, V., & Nöth, E. (2019). Deep Learning for Orca Call Type Identification – A Fully Unsupervised Approach. In Gernot Kubin, Thomas Hain, Björn Schuller, Dina El Zarka, Petra Hödl (Eds.), Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 3357-3361). Graz, AT: International Speech Communication Association.

MLA:

Bergler, Christian, et al. "Deep Learning for Orca Call Type Identification – A Fully Unsupervised Approach." Proceedings of the 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, Graz. Ed. Gernot Kubin, Thomas Hain, Björn Schuller, Dina El Zarka, Petra Hödl. International Speech Communication Association, 2019. 3357-3361.
