Parthasarathy D, Kontes G, Plinge A, Mutschler C (2026) C-MCTS: Safe Planning with Monte Carlo Tree Search
Publication Status: Submitted
Publication Type: Unpublished / Preprint
Future Publication Type: Journal article
Publication year: 2026
DOI: 10.48550/arXiv.2305.16209
The Constrained Markov Decision Process (CMDP) formulation allows us to solve
safety-critical decision-making tasks that are subject to constraints. While CMDPs
have been extensively studied in the Reinforcement Learning literature, little
attention has been given to sampling-based planning algorithms such as Monte
Carlo Tree Search (MCTS) for solving them. Previous approaches are conservative
with respect to costs as they avoid constraint violations by using Monte Carlo
cost estimates that suffer from high variance. We propose Constrained MCTS
(C-MCTS), which estimates cost using a safety critic that is trained with Temporal
Difference learning in an offline phase prior to agent deployment. During deployment,
the critic limits exploration of unsafe regions by pruning unsafe trajectories
within MCTS. This makes C-MCTS more efficient w.r.t. planning steps. Compared
to previous work, it achieves higher rewards by operating closer to the constraint
boundary (while satisfying cost constraints) and is less susceptible to cost violations
under model mismatch between the planner and the deployment environment.
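To make the pruning mechanism described in the abstract concrete, here is a minimal, hypothetical Python sketch (not the authors' implementation). It assumes a safety critic, represented here by a simple lookup table standing in for a model trained offline with TD learning, that vetoes candidate actions during MCTS tree expansion whenever their estimated cost exceeds the remaining budget. All names (SafetyCritic, expand_safe, cost_budget, the toy dynamics) are illustrative assumptions.

    # Illustrative sketch only (not the authors' code): a safety critic
    # pruning unsafe actions during MCTS tree expansion.

    class SafetyCritic:
        """Stand-in for a cost critic trained offline with TD learning.
        Here it is a fixed lookup table from (state, action) to an
        estimated expected cost."""

        def __init__(self, cost_table):
            self.cost_table = cost_table

        def expected_cost(self, state, action):
            return self.cost_table.get((state, action), 0.0)

    class Node:
        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = {}  # action -> Node
            self.visits = 0
            self.total_reward = 0.0

    def expand_safe(node, actions, transition, critic, cost_budget):
        """Expand only actions whose critic-estimated cost stays within
        the remaining budget; unsafe branches are pruned from the tree."""
        for action in actions:
            if critic.expected_cost(node.state, action) <= cost_budget:
                node.children[action] = Node(transition(node.state, action),
                                             parent=node)
        return node.children

    # Toy usage: on a 1-D line, stepping right from position 2 is unsafe.
    critic = SafetyCritic({(2, +1): 5.0})
    root = Node(state=2)
    children = expand_safe(root, actions=[-1, +1],
                           transition=lambda s, a: s + a,
                           critic=critic, cost_budget=1.0)
    print(sorted(children))  # -> [-1]  (the +1 action is pruned as unsafe)

Because unsafe branches never enter the tree, subsequent selection and rollout steps spend their planning budget only on trajectories the critic deems feasible, which is the source of the efficiency gain the abstract claims.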
APA:
Parthasarathy, D., Kontes, G., Plinge, A., & Mutschler, C. (2026). C-MCTS: Safe Planning with Monte Carlo Tree Search. Unpublished preprint, submitted. https://doi.org/10.48550/arXiv.2305.16209
MLA:
Parthasarathy, Dinesh, et al. C-MCTS: Safe Planning with Monte Carlo Tree Search. 2026. Unpublished preprint, submitted.