Arnold S, Fietta M, Yesilbas D (2024)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2024
Publisher: Association for Computational Linguistics (ACL)
Pages Range: 15-22
Conference Proceedings Title: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
ISBN: 9798891761704
DOI: 10.18653/v1/2024.blackboxnlp-1.2
Language Models (LMs) have recently incorporated mixture-of-experts (MoE) layers, consisting of a router and a collection of experts, to scale up their parameter count under a fixed computational budget. Building on previous findings that token-expert assignments are predominantly influenced by token identities and positions, we trace the routing decisions for similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers depends mainly on (semantic) associations, while contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
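To make the architecture in the abstract concrete, the following is a minimal sketch of a sparsely-gated MoE layer with top-k routing, in the general style the abstract describes (a learned router scoring tokens against experts). This is an illustrative toy, not the authors' implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the expert dimension
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens, w_router, experts, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:   (n_tokens, d_model) hidden states
    w_router: (d_model, n_experts) router weights
    experts:  list of callables, each mapping (d_model,) -> (d_model,)
    Returns the mixed outputs and the per-token expert assignments
    (the quantity the paper traces to measure context sensitivity).
    """
    logits = tokens @ w_router                  # (n_tokens, n_experts)
    probs = softmax(logits, axis=-1)
    out = np.zeros_like(tokens)
    assignments = []
    for i, (tok, p) in enumerate(zip(tokens, probs)):
        chosen = np.argsort(p)[-top_k:]         # indices of the top-k experts
        assignments.append(chosen)
        gates = p[chosen] / p[chosen].sum()     # renormalized gate weights
        for g, e in zip(gates, chosen):
            out[i] += g * experts[e](tok)
    return out, assignments

# toy usage: 4 tokens, 8-dim hidden states, 4 linear "experts"
rng = np.random.default_rng(0)
toks = rng.normal(size=(4, 8))
w_r = rng.normal(size=(8, 4))
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(8, 8)))
           for _ in range(4)]
y, assign = moe_layer(toks, w_r, experts, top_k=2)
```

Because the router input is the token's contextualized hidden state, the same token can be assigned to different experts in different contexts, which is exactly the sensitivity the paper probes with similarity-annotated text pairs.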
APA:
Arnold, S., Fietta, M., & Yesilbas, D. (2024). Routing in Sparsely-gated Language Models responds to Context. In Y. Belinkov, N. Kim, J. Jumelet, H. Mohebbi, A. Mueller, & H. Chen (Eds.), Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP (pp. 15-22). Association for Computational Linguistics.
MLA:
Arnold, Stefan, Marian Fietta, and Dilara Yesilbas. "Routing in Sparsely-gated Language Models responds to Context." Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, edited by Yonatan Belinkov et al., Association for Computational Linguistics, 2024, pp. 15-22.