Käppel M, Weinzierl S, Ackermann L, Matzner M, Jablonski S (2026)
Publication Type: Journal article
Publication year: 2026
Book Volume: 140
Article Number: 102717
Predictive business process monitoring (PBPM) aims to predict targets such as the next activity in running process instances in order to realize benefits such as performance improvements for process participants. Recent PBPM techniques use machine learning algorithms to train accurate prediction models on historical event log data. In practice, however, these techniques often lack an adequate amount of data, and generating additional data with existing random-based data augmentation techniques yields data that deviates from the underlying business process. This work presents a study that investigates large language models (LLMs) for generating data that represents novel, as yet unseen process behavior consistent with the underlying process, for use in next activity prediction. As a foundation for the study, an LLM-based data augmentation technique that relies on prompting is introduced. The experimental setup includes two synthetic and two real-life event logs, five LLMs, and 11 machine learning-based approaches. The results show that (i) LLM-based data augmentation can generate more meaningful traces than previous approaches and (ii) LLM-based data augmentation improves the predictive performance of machine learning models used for next activity prediction. The findings guide researchers and practitioners on how to configure, validate, and benchmark LLM-based data augmentation techniques for PBPM.
APA:
Käppel, M., Weinzierl, S., Ackermann, L., Matzner, M., & Jablonski, S. (2026). Improving next process activity prediction with scarce event log data using data augmentation with large language models. Information Systems, 140, Article 102717. https://doi.org/10.1016/j.is.2026.102717
MLA:
Käppel, Martin, et al. "Improving next process activity prediction with scarce event log data using data augmentation with large language models." Information Systems 140 (2026).