Conformal Policy Control
2 Pith papers cite this work.
abstract
An agent must try new behaviors to explore and improve. In high-stakes environments, an agent that violates safety constraints may cause harm and must be taken offline, curtailing any future interaction. Imitating old behavior is safe, but excessive conservatism discourages exploration. How much behavior change is too much? We show how to use any safe reference policy as a probabilistic regulator for any optimized but untested policy. Conformal calibration on data from the safe policy determines how aggressively the new policy can act, while provably enforcing the user's declared risk tolerance. Unlike conservative optimization methods, we do not assume the user has identified the correct model class nor tuned any hyperparameters. Unlike previous conformal methods, our theory provides finite-sample guarantees even for non-monotonic bounded loss functions. Our experiments on applications ranging from natural language question answering to biomolecular engineering show that safe exploration is not only possible from the first moment of deployment, but can also improve performance.
years
2026: 2
verdicts
UNVERDICTED: 2
representative citing papers
Value-filtered decoding steers LLM outputs for safety at decoding time using a value criterion with an explicit bound on false interventions controlled by one threshold hyperparameter.
citing papers explorer
- Certified Speculative Execution for Untrusted AI Agents
CGPA enables certified speculative execution of untrusted AI proposals in constrained sequential decisions via verifier rejection, conformal boundary gating, and solver deferral, yielding zero violations and regret within noise of the oracle.
- Selective Safety Steering via Value-Filtered Decoding
Value-filtered decoding steers LLM outputs for safety at decoding time using a value criterion with an explicit bound on false interventions controlled by one threshold hyperparameter.