Improving next process activity prediction with scarce event log data using data augmentation with large language models

Käppel M, Weinzierl S, Ackermann L, Matzner M, Jablonski S (2026)

Publication Type: Journal article

Publication year: 2026

Journal: Information Systems (Elsevier)

Book Volume: 140

Article Number: 102717

DOI: 10.1016/j.is.2026.102717

Abstract

Predictive business process monitoring (PBPM) aims to predict targets like the next activity in running process instances to realize benefits, such as performance improvements for process participants. Recent PBPM techniques use machine learning algorithms to train accurate prediction models based on historical event log data. However, in practice, an adequate amount of data is often not available for these techniques, and the generation of additional data using existing random-based data augmentation techniques leads to data that deviates from the underlying business process. This work presents a study that investigates large language models for generating data that represents novel, as yet unseen process behavior consistent with the underlying process, for use in next activity prediction. As a foundation for the study, a large language model (LLM)-based data augmentation technique that relies on prompting is introduced. The experimental setup includes two synthetic and two real-life event logs, five LLMs, and 11 machine learning-based approaches. The results show that (i) LLM-based data augmentation can generate more meaningful traces than previous approaches and (ii) LLM-based data augmentation improves the predictive performance of machine learning models used for next activity prediction. The findings of the study guide researchers and practitioners on how to configure, validate, and benchmark LLM-based data augmentation techniques for PBPM.

Authors with CRIS profile

Martin Käppel, Lehrstuhl für Digital Industrial Service Systems

Sven Weinzierl, Lehrstuhl für Digital Industrial Service Systems

Martin Matzner, Lehrstuhl für Digital Industrial Service Systems

Involved external institutions

Hochschule für Angewandte Wissenschaften Hof, Germany (DE)

Universität Bayreuth, Germany (DE)

How to cite

APA:

Käppel, M., Weinzierl, S., Ackermann, L., Matzner, M., & Jablonski, S. (2026). Improving next process activity prediction with scarce event log data using data augmentation with large language models. Information Systems, 140, Article 102717. https://doi.org/10.1016/j.is.2026.102717

MLA:

Käppel, Martin, et al. "Improving next process activity prediction with scarce event log data using data augmentation with large language models." Information Systems 140 (2026).
