A Two-Step, Multidimensional Account of Deception in Language Models

Dung L (2025)

Publication Type: Journal article

Publication year: 2025

Journal

Erkenntnis (Springer Science+Business Media B.V.)

Abstract

Which AI systems are capable of deception, and how does deception differ between systems? In this paper, I develop a two-step, multidimensional account of LLM deception. On this account, having the capacity for deception minimally requires being able to produce false beliefs in others to achieve one’s own goals. For all systems that satisfy this minimal condition, a system’s deception profile can be characterized as a point in a multidimensional space. The five dimensions of this space are skillfulness, learning, deceptive inclination, explicitness, and situational awareness. I argue for this account on the basis of its fit with current language usage and, primarily, its descriptive and explanatory usefulness. Specifically, the account captures the key dimensions of variation in LLM deception. The account is informative in that it allows fine-grained comparative characterizations of deception. Moreover, its dimensions are all accessible to empirical study, provide important information for assessing the risks of LLM deception, and shed light on the cognitive processes involved in LLM deception. Finally, this account paves the way for a future extension that delivers a unified account of deception in biological and non-biological systems. Thus, the multidimensional account promises to significantly advance both the scientific study and the ethical assessment of LLM deception, and of deception generally.

Involved external institutions

Ruhr-Universität Bochum (RUB)

Germany (DE)

How to cite

APA:

Dung, L. (2025). A Two-Step, Multidimensional Account of Deception in Language Models. Erkenntnis. https://doi.org/10.1007/s10670-025-01017-4

MLA:

Dung, Leonard. "A Two-Step, Multidimensional Account of Deception in Language Models." Erkenntnis (2025).
