Conformal Policy Control

2026 · cs.AI · arXiv:2603.02196

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

An agent must try new behaviors to explore and improve. In high-stakes environments, an agent that violates safety constraints may cause harm and must be taken offline, curtailing any future interaction. Imitating old behavior is safe, but excessive conservatism discourages exploration. How much behavior change is too much? We show how to use any safe reference policy as a probabilistic regulator for any optimized but untested policy. Conformal calibration on data from the safe policy determines how aggressively the new policy can act, while provably enforcing the user's declared risk tolerance. Unlike conservative optimization methods, we do not assume the user has identified the correct model class nor tuned any hyperparameters. Unlike previous conformal methods, our theory provides finite-sample guarantees even for non-monotonic bounded loss functions. Our experiments on applications ranging from natural language question answering to biomolecular engineering show that safe exploration is not only possible from the first moment of deployment, but can also improve performance.

representative citing papers

Certified Speculative Execution for Untrusted AI Agents

cs.CR · 2026-06-30 · unverdicted · novelty 7.0

CGPA enables certified speculative execution of untrusted AI proposals in constrained sequential decisions via verifier rejection, conformal boundary gating, and solver deferral, yielding zero violations and regret within noise of the oracle.

Selective Safety Steering via Value-Filtered Decoding

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Value-filtered decoding steers LLM outputs for safety at decoding time using a value criterion with an explicit bound on false interventions controlled by one threshold hyperparameter.

citing papers explorer

Showing 2 of 2 citing papers.

Certified Speculative Execution for Untrusted AI Agents
cs.CR · 2026-06-30 · unverdicted · none · ref 40 · internal anchor
Selective Safety Steering via Value-Filtered Decoding
cs.LG · 2026-05-14 · unverdicted · none · ref 27 · internal anchor
