C-MCTS: Safe Planning with Monte Carlo Tree Search

Parthasarathy D, Kontes G, Plinge A, Mutschler C (2026)


Publication Status: Submitted

Publication Type: Unpublished / Preprint

Future Publication Type: Journal article

Publication year: 2026

DOI: 10.48550/arXiv.2305.16209

Abstract

The Constrained Markov Decision Process (CMDP) formulation makes it possible to solve safety-critical decision-making tasks that are subject to constraints. While CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as Monte Carlo Tree Search (MCTS) for solving them. Previous approaches are conservative with respect to costs, as they avoid constraint violations by using Monte Carlo cost estimates that suffer from high variance. We propose Constrained MCTS (C-MCTS), which estimates cost using a safety critic trained with Temporal Difference learning in an offline phase prior to agent deployment. During deployment, the critic limits exploration of unsafe regions by pruning unsafe trajectories within MCTS. This makes C-MCTS more efficient w.r.t. planning steps. Compared to previous work, it achieves higher rewards by operating closer to the constraint boundary (while satisfying cost constraints) and is less susceptible to cost violations under model mismatch between the planner and the deployment environment.
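The pruning mechanism described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the cost-budget check, and the toy critic below are all hypothetical stand-ins for the learned safety critic and the CMDP cost constraint:

```python
# Illustrative sketch of cost-based pruning in MCTS expansion, as described
# in the abstract. The critic here is a toy stand-in for the TD-trained
# safety critic; all names are hypothetical, not from the paper.

def prune_unsafe_actions(actions, safety_critic, state, cost_budget):
    """Keep only actions whose critic-estimated cumulative cost
    stays within the constraint budget; the rest are pruned from
    the search tree before expansion."""
    return [a for a in actions if safety_critic(state, a) <= cost_budget]

# Toy critic: pretend the estimated cost of an action is its index.
def toy_critic(state, action):
    return float(action)

safe = prune_unsafe_actions([0, 1, 2, 3], toy_critic, state=None, cost_budget=1.5)
print(safe)  # [0, 1]
```

During tree search, such a filter would be applied at each expansion step, so trajectories whose estimated cost exceeds the budget are never simulated, which is what makes the search more sample-efficient than rejecting them after high-variance Monte Carlo rollouts.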


How to cite

APA:

Parthasarathy, D., Kontes, G., Plinge, A., & Mutschler, C. (2026). C-MCTS: Safe Planning with Monte Carlo Tree Search. (Unpublished, Submitted).

MLA:

Parthasarathy, Dinesh, et al. C-MCTS: Safe Planning with Monte Carlo Tree Search. Unpublished, Submitted. 2026.
