How to Design and Employ Specialized Large Language Models for Accounting and Tax Research: The Example of TaxBERT

Hechtner F, Seebeck A, Schmidt L, Weiß M (2025)


Publication Language: English

Publication Type: Other publication type

Publication year: 2025

URI: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5146523

Abstract

Large Language Models (LLMs) are emerging as the new gold standard for accounting researchers who use Natural Language Processing (NLP) techniques to analyze the characteristics of financial and non-financial disclosures. This study provides an 'A-to-Z' description of how to design and employ specialized Bidirectional Encoder Representations from Transformers (BERT) models that are environmentally sustainable and practically feasible for accounting and tax researchers. We begin by highlighting key NLP-related challenges, pitfalls, and shortcomings. Next, we provide a user's guide to LLMs and BERT models. Using the case-based approach of TaxBERT, a domain-specific LLM pretrained for analyzing qualitative corporate tax disclosures, we provide a detailed discussion of the efficient design, evaluation, and application of specialized BERT models tailored to specific domains. We show that TaxBERT yields significant performance improvements over generic LLMs, FinBERT, and traditional bag-of-words approaches. By showcasing the development as well as the potential of domain-specific models, we aim to inspire researchers to develop and apply specialized LLMs in their research.
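For readers unfamiliar with the baseline the abstract mentions, a traditional bag-of-words representation (the approach TaxBERT is evaluated against) can be sketched in a few lines of pure Python. The vocabulary and example disclosure below are illustrative assumptions, not taken from the paper:

```python
from collections import Counter
import re

def bag_of_words(text, vocabulary):
    """Count occurrences of vocabulary terms in a disclosure text.

    Unlike a BERT model, this representation ignores word order and
    context, which is the key limitation domain-specific LLMs address.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {term: counts[term] for term in vocabulary}

# Hypothetical tax-related term list, for illustration only.
vocab = ["tax", "deferred", "liability", "rate"]
disclosure = "The deferred tax liability reflects the statutory tax rate."
print(bag_of_words(disclosure, vocab))
# → {'tax': 2, 'deferred': 1, 'liability': 1, 'rate': 1}
```

Because such counts discard context (e.g. "tax benefit" vs. "tax risk"), contextual models like BERT can capture distinctions a word-count baseline cannot, which is the comparison the study formalizes.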


How to cite

APA:

Hechtner, F., Seebeck, A., Schmidt, L., & Weiß, M. (2025). How to Design and Employ Specialized Large Language Models for Accounting and Tax Research: The Example of TaxBERT.

MLA:

Hechtner, Frank, et al. How to Design and Employ Specialized Large Language Models for Accounting and Tax Research: The Example of TaxBERT. 2025.
