pith. machine review for the scientific record.

arxiv: 2605.09638 · v1 · submitted 2026-05-10 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:03 UTC · model grok-4.3

classification 💻 cs.LG
keywords: backdoor defense · reinforcement learning · Monte Carlo Tree Search · test-time defense · adversarial robustness · policy security · trigger detection · RL backdoors

The pith

Monte Carlo planning at test time detects and neutralizes backdoor triggers in reinforcement learning policies with only black-box access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Plan2Cleanse as a way to defend deployed RL models against backdoors by treating the search for activating trigger sequences as a planning task solved through Monte Carlo Tree Search. This recasts detection as systematic exploration of possible action sequences that could flip the policy to malicious behavior, followed by replanning to prevent those sequences from completing. A sympathetic reader cares because third-party-trained RL agents in real systems can carry hidden triggers that stay dormant until activated, and many existing defenses demand retraining data or internal model details. The method keeps access limited to querying the policy for actions and uses the detection results to steer the agent away from danger at runtime. Tests in MuJoCo environments, O-RAN networks, and Atari games show the planning approach raises trigger detection success rates by more than 61.4 percentage points in stealthy O-RAN scenarios and lifts win rates from 35% to 53% in competitive Humanoid.

Core claim

Plan2Cleanse adapts Monte Carlo Tree Search to identify temporally extended trigger sequences that activate backdoors in RL policies and uses the detection outcomes for tree-search preventive replanning, all while requiring only black-box access to the target policy.

What carries the argument

Monte Carlo Tree Search recast as a planning algorithm that explores possible trigger sequences in the RL policy's state-action space and enables mitigation by replanning around discovered triggers.
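The abstract states the mechanism only at this level of generality, so the following is a minimal sketch of what MCTS-style trigger search against a black-box policy could look like, not the authors' implementation. It assumes a classic Gym-style `env` with reset/step semantics, a query-only `policy`, and a caller-supplied `score_rollout` heuristic (e.g., reward deviation from nominal behavior) standing in for whatever evaluation rule the paper actually uses:

```python
import math
import random

class Node:
    """One candidate prefix of a trigger action sequence."""
    def __init__(self, parent=None, action=None):
        self.parent, self.action = parent, action
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb_child(self, c=1.4):
        # Standard UCT rule: exploit high-scoring children, explore rare ones.
        return max(self.children,
                   key=lambda n: n.value / (n.visits + 1e-9)
                   + c * math.sqrt(math.log(self.visits + 1) / (n.visits + 1e-9)))

def search_trigger(env, policy, candidate_actions, score_rollout,
                   depth=10, budget=500):
    """Grow a search tree over action prefixes; the highest-value branch is
    the candidate trigger. Only the policy's outputs are ever queried."""
    root = Node()
    for _ in range(budget):
        obs, node = env.reset(), root
        # Selection: descend by UCT, replaying each prefix action in the env.
        while node.children:
            node = node.ucb_child()
            obs, _, _, _ = env.step(node.action)
        # Expansion: once a node has been visited, add one child per action.
        if node.visits > 0:
            node.children = [Node(node, a) for a in candidate_actions]
            node = random.choice(node.children)
            obs, _, _, _ = env.step(node.action)
        # Rollout: let the suspect policy act for `depth` steps and score how
        # anomalous its behavior looks after this prefix.
        anomaly = score_rollout(env, policy, obs, depth)
        # Backpropagation: credit the whole prefix.
        while node is not None:
            node.visits += 1
            node.value += anomaly
            node = node.parent
    return root
```

Even in this sketch the black-box property the paper claims is visible: the tree is built over environment actions and scored from observed behavior, so nothing in the loop ever touches the policy's parameters.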

If this is right

  • Trigger detection success rates increase by more than 61.4 percentage points in stealthy O-RAN scenarios.
  • Win rates rise from 35% to 53% in competitive Humanoid environments by neutralizing triggers through replanning.
  • Backdoor mitigation occurs at test time without requiring model retraining or white-box access to the policy.
  • The same planning framework applies across MuJoCo locomotion tasks, simulated wireless networks, and Atari games.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If backdoors rely on sequence-based triggers, comparable search methods might defend other black-box sequential decision systems such as planning agents in robotics.
  • Runtime monitors built on tree search could serve as a general layer for securing deployed RL policies against unknown attacks.
  • The approach might be extended by combining it with lightweight online adaptation to handle triggers that evolve over time.

Load-bearing premise

Backdoor triggers appear as temporally extended sequences of actions or states that can be efficiently discovered and neutralized through Monte Carlo planning while maintaining only black-box access to the target policy.

What would settle it

An experiment where no finite sequence activates the backdoor, or where the search consistently fails to locate the true trigger despite its presence, would show the planning method does not work.

Figures

Figures reproduced from arXiv: 2605.09638 by Chi-Yu Li, Kui-Yuan Chen, Ping-Chun Hsieh, Sze-Ann Chen, Zhi-Yi Chin.

Figure 1. An overview of the Plan2Cleanse framework, which addresses backdoor vulnerabilities in RL.
Figure 2. An illustration of Voronoi-based sampling in continuous action spaces.
Figure 3. TDSR over training iterations in mobile-env. Solid lines show the median across seeds and shaded areas show the interquartile range. Plan2Cleanse sustains high detection performance even for minimally responsive Trojans, whereas PolicyCleanse drops notably as responsiveness decreases.
Figure 4. Average data rate (GB/s) under adversarial triggers in mobile-env.
Figure 5. TDSR results under varying UE distances from the base station. Each plot reports the median.
Figure 6. TDSR over training iterations in competitive RL environments. The solid lines show the median.
Figure 7. Win rate under adversarial triggers in Ant and Humanoid. Plan2Cleanse surpasses the benign policy in Humanoid.
Figure 8. Effect of planning horizon H and rollout threshold h_rollout on win rate gain (%) in the Humanoid environment.
Figure 9. Win rates under different perturbation strategies in the Ant and Humanoid environments. Each bar shows the average performance across three Trojan models.
Figure 10. Different trigger patterns: (a) Square, (b) Equal, (c) Cross, and (d) Checkerboard.
Figure 11. t-SNE visualization of trigger action sequences across five settings: Ant, Humanoid, and three …
Figure 12. Effect of detection depth T on final TDSR in mobile-env under High, Partial, and Minimal Trojan responsiveness.
Figure 13. Effect of detection depth T on final TDSR for Ant and Humanoid.
Figure 14. Effect of exploration probability ω on final TDSR for Ant and Humanoid.
Figure 15. Effect of planning horizon H and threshold h_rollout on win rate gain (%) in the Ant environment.
Figure 16. Effect of joint planning horizon H = h_rollout on data rate (GB/s) in the mobile-env.
Figure 17. Effect of planning horizon H on mitigation performance in Atari: (a) Pong and (b) Breakout.
Figure 18. Effect of noise standard deviation σ on final TDSR for Ant and Humanoid.
Figure 19. TDSR over training iterations in Ant and Humanoid. Solid lines show the median across seeds.
read the original abstract

Ensuring the security of reinforcement learning (RL) models is critical, particularly when they are trained by third parties and deployed in real-world systems. Attackers can implant backdoors into these models, causing them to behave normally under typical conditions, but execute malicious behaviors when specific triggers are activated. In this work, we propose Plan2Cleanse, a test-time detection and mitigation framework that adapts Monte Carlo Tree Search to efficiently identify and neutralize RL backdoor attacks without requiring model retraining. Our approach recasts backdoor detection as a planning problem, enabling systematic exploration of temporally extended trigger sequences while maintaining black-box access to the target policy. By leveraging the detection results, Plan2Cleanse can further achieve efficient mitigation through tree-search preventive replanning. We evaluated our method in competitive MuJoCo environments, simulated O-RAN wireless networks, and Atari games. Plan2Cleanse achieves substantial improvements, increasing trigger detection success rates by more than 61.4 percentage points in stealthy O-RAN scenarios and improving win rates from 35% to 53% in competitive Humanoid environments. These results demonstrate the effectiveness of our test-time defense approach and highlight the importance of proactive defenses against backdoor threats in RL deployments. Our implementation is publicly available at https://github.com/rl-bandits-lab/RL-Backdoor.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Plan2Cleanse, a test-time backdoor defense for RL policies that recasts trigger detection as a Monte Carlo Tree Search planning problem over temporally extended sequences. It maintains only black-box access to the target policy, uses the planning output for mitigation via preventive replanning, and reports empirical results across MuJoCo competitive environments, simulated O-RAN networks, and Atari games, including detection-rate gains exceeding 61.4 percentage points in stealthy O-RAN cases and win-rate improvement from 35% to 53% in Humanoid.

Significance. If the claims are substantiated, the work contributes a practical test-time defense that avoids retraining and white-box assumptions, addressing a growing concern for third-party-trained RL models in deployed systems. The public code release supports reproducibility.

major comments (2)
  1. [§3] Method: The MCTS procedure requires a concrete evaluation function or scoring rule to identify backdoor-activating trajectories from black-box policy queries alone. No explicit definition, pseudocode, or implementation detail is supplied for this rule (e.g., reward deviation, state anomaly detector, or domain-specific heuristic), yet the reported 61.4 pp detection lift and 35% → 53% win-rate gain both presuppose that the rule reliably guides search to the correct trigger rather than high-variance or low-reward paths.
  2. [§4] Experiments: The O-RAN and Humanoid results claim large absolute gains without reporting the number of independent trials, statistical significance tests, error bars, or the precise baselines and trigger-construction protocols used. This absence prevents verification that the improvements are robust rather than artifacts of particular environment configurations or evaluation choices.
minor comments (2)
  1. The abstract and introduction use the term 'stealthy' triggers without a precise definition or reference to how stealth is quantified (e.g., trigger length, activation probability, or detectability by standard anomaly detectors).
  2. [§3] Notation for the planning components (e.g., the tree policy, rollout policy, and backdoor-specific value function) should be introduced with a single consistent table or diagram early in §3 to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each of the major comments in turn below and indicate the revisions made to the manuscript.

read point-by-point responses
  1. Referee: [§3] Method: The MCTS procedure requires a concrete evaluation function or scoring rule to identify backdoor-activating trajectories from black-box policy queries alone. No explicit definition, pseudocode, or implementation detail is supplied for this rule (e.g., reward deviation, state anomaly detector, or domain-specific heuristic), yet the reported 61.4 pp detection lift and 35% → 53% win-rate gain both presuppose that the rule reliably guides search to the correct trigger rather than high-variance or low-reward paths.

    Authors: We appreciate the referee pointing out the need for greater clarity on the MCTS evaluation function. Our approach employs a scoring rule that estimates the probability of backdoor activation by measuring deviations in observed rewards and action distributions from the policy's nominal behavior, using only black-box queries. We have now included an explicit mathematical definition of this scoring rule, along with pseudocode for the complete MCTS procedure, in the revised version of §3. This addition ensures that readers can understand how the search is guided toward trigger-activating trajectories. (A hedged sketch of such a deviation-based scoring rule follows these point-by-point responses.) revision: yes

  2. Referee: [§4] Experiments: The O-RAN and Humanoid results claim large absolute gains without reporting the number of independent trials, statistical significance tests, error bars, or the precise baselines and trigger-construction protocols used. This absence prevents verification that the improvements are robust rather than artifacts of particular environment configurations or evaluation choices.

    Authors: We agree that the experimental section would benefit from additional details to support the robustness of the results. In the revised manuscript, we have added information on the number of independent trials conducted (specifically, 10 runs for each reported result), the statistical significance tests performed (including p-values from t-tests), error bars in the figures, and precise descriptions of the baseline algorithms and trigger-construction methods for the O-RAN and Humanoid environments. These updates appear in §4 and the associated supplementary material. (A minimal illustration of this kind of cross-run significance test also appears below.) revision: yes
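The scoring rule the authors describe matches the appendix excerpt quoted under reference anchor [17] in the reference graph below, which defines a Median-Absolute-Deviation anomaly index over the negated cumulative reward of a replayed candidate sequence. A minimal sketch of that index, where `replay_return` and `sample_random_seq` are assumed caller-supplied helpers and the consistency constant C = 1.4826 is the usual Gaussian choice rather than a value stated in the excerpt:

```python
import numpy as np

def mad_anomaly_index(candidate_seq, replay_return, sample_random_seq,
                      n_ref=500, C=1.4826):
    """MAD anomaly index in the spirit of the excerpt in anchor [17].

    replay_return     -- replays an action sequence in the environment and
                         returns its cumulative reward (assumed helper).
    sample_random_seq -- draws a random action sequence of matching length
                         (assumed helper); the excerpt uses 500 of them.
    """
    # Negated cumulative reward: a genuine trigger drives the Trojan policy
    # toward low reward, so r_sum becomes unusually large.
    r_sum = -replay_return(candidate_seq)
    r_ref = np.array([-replay_return(sample_random_seq()) for _ in range(n_ref)])
    med = np.median(r_ref)
    mad = np.median(np.abs(r_ref - med))
    # Anomaly Index(r_sum) := (r_sum - Median(r_ref))
    #                         / (C * Median(|r_ref - Median(r_ref)|))
    return (r_sum - med) / (C * mad)
```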
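The added statistics the authors list (10 independent runs, t-tests, error bars) amount to a standard cross-run comparison. A minimal sketch on synthetic data, not the paper's measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-run win rates for illustration only: 10 independent runs each,
# centered on the reported 53% (Plan2Cleanse) and 35% (unmitigated) figures.
plan2cleanse = rng.normal(0.53, 0.03, size=10)
unmitigated = rng.normal(0.35, 0.03, size=10)

# Welch's t-test across runs, plus mean +/- std for the error bars.
t, p = stats.ttest_ind(plan2cleanse, unmitigated, equal_var=False)
print(f"{plan2cleanse.mean():.3f}+/-{plan2cleanse.std(ddof=1):.3f} vs "
      f"{unmitigated.mean():.3f}+/-{unmitigated.std(ddof=1):.3f}, "
      f"t={t:.2f}, p={p:.2g}")
```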

Circularity Check

0 steps flagged

No circularity; method is an independent planning procedure

full rationale

The paper introduces Plan2Cleanse as a novel test-time framework that recasts backdoor detection in RL as a Monte Carlo planning problem over trigger sequences, using only black-box policy queries. No equations, fitted parameters, or self-citations are presented that reduce the central construction to its own inputs by definition. The reported performance gains (e.g., the 61.4 pp detection lift) are framed as empirical outcomes of applying the planning procedure, not as quantities forced by construction from prior fits or renamed known results. The claims are checked against external benchmarks rather than against the method's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that backdoor triggers can be modeled as temporally extended action sequences amenable to tree search; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Backdoor triggers manifest as temporally extended sequences that can be systematically explored via planning under black-box policy access
    The method recasts detection as a planning problem, which presupposes this structure of triggers.

pith-pipeline@v0.9.0 · 5557 in / 1217 out tokens · 47848 ms · 2026-05-12T02:03:46.269036+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 4 internal anchors

  1. [1]

    Universal Trojan signatures in reinforcement learning

    Manoj Acharya, Weichao Zhou, Anirban Roy, Xiao Lin, Wenchao Li, and Susmit Jha. Universal Trojan signatures in reinforcement learning. In NeurIPS 2023 Workshop on Backdoors in Deep Learning: The Good, the Bad, and the Ugly.

  2. [2]

    A method for evaluating hyperparameter sensitivity in reinforcement learning

    Jacob Adkins, Michael Bowling, and Adam White. A method for evaluating hyperparameter sensitivity in reinforcement learning. Advances in Neural Information Processing Systems (NeurIPS), 37:124820–124842.

  3. [3]

    Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164.

  4. [4]

    AEVA: Black-box backdoor detection using adversarial extreme value analysis

    Junfeng Guo, Ang Li, and Cong Liu. AEVA: Black-box backdoor detection using adversarial extreme value analysis. In International Conference on Learning Representations (ICLR).

  5. [5]

    Sionna: An Open-Source Library for Next-Generation Physical Layer Research

    Jakob Hoydis, Sebastian Cammerer, Fayçal Ait Aoudia, Avinash Vem, Nikolaus Binder, Guillermo Marcus, and Alexander Keller. Sionna: An open-source library for next-generation physical layer research. arXiv preprint arXiv:2203.11854.

  6. [6]

    Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994, pp. 157–163.

  7. [7]

    Fine-pruning: Defending against backdooring attacks on deep neural networks

    Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks, Intrusions, and Defenses (RAID), Heraklion, Crete, Greece, 2018a.

  8. [8]

    Trojaning attack on neural networks

    Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. Trojaning attack on neural networks. In Annual Network and Distributed System Security Symposium (NDSS), 2018b.

  9. [9]

    Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

    Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Munoz, Xinjie Yao, René Zurbrügg, Nikita Rudin, et al. Isaac Lab: A GPU-accelerated simulation framework for multi-modal robot learning. arXiv preprint arXiv:2511.04831.

  10. [10]

    Playing Atari with Deep Reinforcement Learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

  11. [11]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

  12. [12]

    Mitigating deep reinforcement learning backdoors in the neural activation space

    Sanyam Vyas, Chris Hicks, and Vasilios Mavroudis. Mitigating deep reinforcement learning backdoors in the neural activation space. In IEEE Security and Privacy Workshops (SPW), pp. 76–86.

  13. [13]

    Beyond training-time poisoning: Component-level and post-training backdoors in deep reinforcement learning

    Sanyam Vyas, Alberto Caron, Chris Hicks, Pete Burnap, and Vasilios Mavroudis. Beyond training-time poisoning: Component-level and post-training backdoors in deep reinforcement learning. arXiv preprint arXiv:2507.04883.

  14. [14]

    AdvSim: Generating safety-critical scenarios for self-driving vehicles

    Jingkang Wang, Ava Pun, James Tu, Sivabalan Manivasagam, Abbas Sadat, Sergio Casas, Mengye Ren, and Raquel Urtasun. AdvSim: Generating safety-critical scenarios for self-driving vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9909–9918, 2021a.

  15. [15]

    Appendix A, Threat-Model Instantiations in Realistic RL Deployments: To further substantiate the threat model in Section 3.3, we describe how it is realized in the three representative domains considered in our experiments. O-RAN Wireless Networks. In this setting, the target policy operates as a …

  16. [16]

    These Trojan models form the evaluation testbed for both detection and mitigation

    to implant patch-style triggers and retrain Trojan agents accordingly. These Trojan models form the evaluation testbed for both detection and mitigation. Trigger Criteria. To evaluate whether a discovered sequence constitutes a successful trigger, we define environment-specific acceptance criteria: Ant: A trigger is accepted if it causes a statistically s…

  17. [17]

    Let r_sum denote the negated cumulative reward of the replayed candidate sequence, and let r_ref be a reference distribution obtained from 500 random action sequences

    and apply an anomaly detection procedure based on the Median Absolute Deviation (MAD). Let r_sum denote the negated cumulative reward of the replayed candidate sequence, and let r_ref be a reference distribution obtained from 500 random action sequences. We compute the anomaly index as Anomaly Index(r_sum) := (r_sum − Median(r_ref)) / (C · Median(|r_ref − Median(r_ref)|)), …

  18. [18]

    Table 4: Environment-specific hyperparameters for Plan2Cleanse detection and mitigation.

    Parameter                   | Ant | Humanoid | Mobile-env | Pong | Breakout
    Detection Depth T           | 60  | 10       | 10         | 1    | 1
    Mitigation Budget N         | 500 | 500      | 10         | 30   | 50
    Rollout Threshold h_rollout | 3   | 3        | 5          | 1    | 1
    Planning Horizon H          | 5   | 5        | 5          | 20   | 20

    Baseline Reproduction. For baseline reproduction, we matched the environment step magnitudes…

  19. [19]

    Results are mean±std

    C Various Attack Scenarios in Atari Games. Table 5: Performance comparison under poisoned and clean environments for 4×4 patterns. Results are mean±std.

    Environment | Method              | Square      | Equal        | Cross        | Checkerboard
    Poisoned    | Trojan              | 0.033±0.145 | −0.127±0.064 | −0.147±0.170 | 0.053±0.189
    Poisoned    | Plan2Cleanse (Ours) | 0.950±0.014 | 0.973±0.012  | 0.787±0.151  | 0.880±0.060
    Clean       | Trojan              | 1.000±0.000 | 1.…

  20. [20]

    experience a marked decline in prediction accuracy under adversarial manipulation, leading to measurable deterioration in system throughput and capacity. The modular and decentralized nature of O-RAN (Farooq et al., 2019) further amplifies the risk, as compromised agents may tamper with shared observations or disrupt the behavior of co-located services. Addressing such …

  21. [21]

    Moreover, a complete version of the replanning procedure for backdoor mitigation and the procedure for generating Trojan rollouts are provided in Algorithm 5 and Algorithm 6, respectively. Algorithm 4 (Danger State Marking in Detection Tree D). Input: detection tree D, leaf nodes L, backtrack depth K. Output: updated detection tree T_det with danger states marked…