pith. sign in

arxiv: 2606.00414 · v1 · pith:ZYM5S3FAnew · submitted 2026-05-29 · 💻 cs.LG

Auditing Near-Optimal Policies Can Be Exponentially Hard: Conditional Query Lower Bounds via Occupancy Rashomon Capacity

Pith reviewed 2026-06-28 22:54 UTC · model grok-4.3

classification 💻 cs.LG
keywords reinforcement learningpolicy auditingRashomon capacityoccupancy measuresquery complexitynear-optimal policieslower bounds
0
0 comments X

The pith

When the audited class contains a high-capacity packing of near-optimal policies with sparse signatures, exact local-query auditing requires exponentially many queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an occupancy-measure analogue of Rashomon capacity as the metric entropy of the near-optimal occupancy region relative to an audited deployment class. It proves a conditional lower bound: if that class contains a 2/H-separated near-optimal packing with b-sparse local signatures that realizes deployment-class capacity, then exact local-query auditing needs Omega(M/b) queries. When b is constant the bound becomes exponential in the capacity. The authors attain the bound with a finite discounted hidden-branch MDP and extend the analysis to noisy sample oracles.

Core claim

If the audited deployment class contains a 2/H-separated near-optimal packing whose local signatures are b-sparse and that realizes the deployment-class capacity, then exact local-query auditing requires Omega(M/b) queries; when the packing realizes capacity and b is constant this becomes Omega(2 to the power of Hopt^F(eps)). A finite discounted hidden-branch MDP attains the bound, and a mixture lower bound of order M/beta holds for noisy hidden-trigger testing.

What carries the argument

Occupancy Rashomon capacity: the metric entropy of the near-optimal occupancy region computed relative to the audited deployment class, which counts the number of occupancy-distinct near-optimal policies.

If this is right

  • Exact local-query auditing requires Omega(M/b) queries under the stated packing condition.
  • Noisy hidden-trigger testing requires Omega(M/beta) samples where beta is the per-sample KL signal.
  • The audited capacity collapses under a canonical occupancy regularizer when a trusted reference occupancy is supplied.
  • A transcript-compatible oracle-cover verification upper bound exists alongside the lower bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Auditors may need to switch to noisy or reference-assisted methods in high-capacity environments rather than exact local queries.
  • Designing or selecting deployment classes with low occupancy Rashomon capacity could make auditing tractable in practice.
  • The hardness result may extend to continuous or visual control tasks where post-processed policies exhibit similar occupancy diversity.

Load-bearing premise

The audited deployment class contains a 2/H-separated near-optimal packing with b-sparse local signatures that realizes the deployment-class capacity.

What would settle it

An audited deployment class that contains many return-equivalent policies yet has low metric entropy in occupancy space, so that exact auditing succeeds with o(2 to the power of capacity) local queries.

Figures

Figures reproduced from arXiv: 2606.00414 by Anuj Sharma, Ibne Farabi Shihab, Sanjeda Akter.

Figure 1
Figure 1. Figure 1: Hidden-branch MDP (Theorem 8). All transitions are essentially deterministic except at s0, which routes uniformly to one of M branches. Rewards are identically one, so every policy in the deployment class FM = {π1, . . . , πM} is optimal. Policy πi plays the distinguished action only on branch i, producing 1-sparse exact local signatures: querying any branch state reveals which πi is deployed only when the… view at source ↗
read the original abstract

When many reinforcement-learning policies achieve near-optimal return, a post-hoc auditor may have to distinguish among many behaviorally distinct but return-equivalent policies. We formalize this phenomenon through an occupancy-measure analogue of Rashomon capacity: the metric entropy of the near-optimal occupancy region, computed relative to an audited deployment class. Because occupancy measures identify behavior only up to occupancy equivalence, we formulate auditing at the occupancy-class level and distinguish exact local-query oracles from noisy sample-query oracles. Our main exact-query result is conditional: if the audited class contains a $2/H$-separated near-optimal packing whose local signatures are $b$-sparse, then exact local-query auditing requires $\Omega(M/b)$ queries; when the packing realizes deployment-class capacity and $b=O(1)$, this becomes $\Omega(2^{\Hopt^\cF(\eps)})$. We give a finite discounted hidden-branch MDP attaining this bound and show the exact Bayes success law. For noisy hidden-trigger testing, we prove a mixture lower bound of order $M/\beta$, where $\beta$ is the per-sample KL signal, yielding $\Omega(2^{\Hopt^\cF(\eps)}/(\rho^2\Delta^2))$ for capacity-order packings with $\beta=O(\rho^2\Delta^2)$. We also provide a static target-recognition information lower bound, a transcript-compatible oracle-cover verification upper bound, and a canonical occupancy regularizer whose regularized audited capacity collapses when a trusted reference occupancy is available. Controlled benchmarks distinguish positive sparse-signature instances from high-capacity negative controls where exact auditing is easy, and map the noisy-trigger law to post-processed continuous-control and visual-RL auditing regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper defines occupancy Rashomon capacity as the metric entropy of the near-optimal occupancy region relative to an audited deployment class. It proves a conditional exact-query lower bound: if the audited class contains a 2/H-separated near-optimal packing with b-sparse local signatures realizing deployment-class capacity, then exact local-query auditing requires Ω(M/b) queries, which becomes Ω(2^{Hopt^F(ε)}) for b=O(1). The paper supplies a finite discounted hidden-branch MDP attaining the bound, an exact Bayes success law, a mixture lower bound Ω(M/β) for noisy hidden-trigger testing (yielding Ω(2^{Hopt^F(ε)}/(ρ²Δ²)) for capacity-order packings), a static target-recognition information lower bound, a transcript-compatible oracle-cover verification upper bound, a canonical occupancy regularizer, and controlled benchmarks separating sparse-signature positive instances from high-capacity negative controls.

Significance. If the conditional premise holds, the results establish that post-hoc auditing of near-optimal policies can be exponentially hard in the effective horizon or capacity when the deployment class admits a suitable packing. The explicit finite-MDP construction attaining the bound, the exact Bayes success law, the mixture lower bound for the noisy case, and the positive/negative benchmark controls are strengths that make the conditional claim falsifiable and technically grounded. The distinction between exact local-query and noisy sample-query oracles, together with the occupancy-class formulation, clarifies the setting in which the hardness applies.

minor comments (3)
  1. [§2] The notation Hopt^F(ε) and the precise definition of the deployment-class capacity (relative to the audited class) should be stated in a single displayed equation early in §2 or §3 so that the conditional premise can be checked without cross-referencing the abstract.
  2. [Benchmarks] In the benchmark section, the high-capacity negative-control instances should report the realized occupancy Rashomon capacity value alongside the query count to make the separation from the sparse-signature positive instances quantitative.
  3. [Upper bound] The transcript-compatible oracle-cover verification upper bound is stated without an explicit dependence on the packing parameters; adding a short remark on how the upper bound scales with M and b would clarify its relation to the Ω(M/b) lower bound.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed and accurate summary of the manuscript, the positive assessment of its significance, and the recommendation for minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper states conditional lower bounds on exact local-query auditing that hold only if the audited deployment class contains a 2/H-separated near-optimal packing with b-sparse local signatures realizing deployment-class capacity. It supplies an explicit finite discounted hidden-branch MDP construction attaining the bound, derives an exact Bayes success law, a mixture lower bound for the noisy case, and provides controlled benchmarks separating positive and negative instances. No load-bearing step reduces by definition, by renaming a fitted parameter as a prediction, or via a self-citation chain; all central claims rest on independent information-theoretic and query-complexity arguments plus the supplied construction. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on the newly introduced occupancy Rashomon capacity together with standard MDP and oracle definitions; the exponential bound is conditional on the existence of a specific packing.

axioms (2)
  • domain assumption Finite discounted hidden-branch MDP
    The example attaining the bound is defined on this class of MDPs.
  • standard math Occupancy measures identify behavior up to equivalence
    Used to formulate auditing at the occupancy-class level.
invented entities (1)
  • Occupancy Rashomon capacity no independent evidence
    purpose: Metric entropy of the near-optimal occupancy region relative to the audited deployment class
    Newly defined quantity that organizes the hardness result.

pith-pipeline@v0.9.1-grok · 5852 in / 1356 out tokens · 31245 ms · 2026-06-28T22:54:21.947178+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 1 canonical work pages

  1. [1]

    Poisoning deep reinforcement learning agents with in- distribution triggers, 2021

    Chace Ashcraft and Kiran Karra. Poisoning deep reinforcement learning agents with in- distribution triggers, 2021. URLhttps://arxiv.org/abs/2106.07798

  2. [2]

    Statistical modeling: The two cultures (with comments and a rejoinder by the author).Statistical science, 16(3):199–231, 2001

    Leo Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author).Statistical science, 16(3):199–231, 2001

  3. [3]

    Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, and Dylan Hadfield-Menell

    Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, J ´er´emy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Rapha ¨el Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, ...

  4. [4]

    Diversity is all you need: Learning skills without a reward function.arXiv preprint arXiv:1802.06070, 2018

    Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. Diversity is all you need: Learning skills without a reward function.arXiv preprint arXiv:1802.06070, 2018

  5. [5]

    Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models si- multaneously.Journal of Machine Learning Research, 20(177):1–81, 2019

  6. [6]

    D4RL: Datasets for deep data-driven reinforcement learning, 2020

    Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning, 2020. URLhttps://arxiv.org/abs/2004. 07219

  7. [7]

    Safe reinforcement learning via formal methods: Toward safe control through proof and learning

    Nathan Fulton and Andr ´e Platzer. Safe reinforcement learning via formal methods: Toward safe control through proof and learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018

  8. [8]

    Cautious reinforce- ment learning with logical constraints

    Mohammadhosein Hasanbeig, Alessandro Abate, and Daniel Kroening. Cautious reinforce- ment learning with logical constraints. InProceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 483–491, 2020

  9. [9]

    Rashomon capacity: A metric for predictive multiplicity in classification.Advances in Neural Information Processing Systems, 35:28988–29000, 2022

    Hsiang Hsu and Flavio Calmon. Rashomon capacity: A metric for predictive multiplicity in classification.Advances in Neural Information Processing Systems, 35:28988–29000, 2022

  10. [10]

    Trojdrl: evaluation of back- door attacks on deep reinforcement learning

    Panagiota Kiourti, Kacper Wardega, Susmit Jha, and Wenchao Li. Trojdrl: evaluation of back- door attacks on deep reinforcement learning. In2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2020. 9

  11. [11]

    One solution is not all you need: Few-shot extrapolation via structured maxent rl.Advances in Neural Information Processing Systems, 33:8198–8210, 2020

    Saurabh Kumar, Aviral Kumar, Sergey Levine, and Chelsea Finn. One solution is not all you need: Few-shot extrapolation via structured maxent rl.Advances in Neural Information Processing Systems, 33:8198–8210, 2020

  12. [12]

    Convergence of estimates under dimensionality restrictions.The Annals of Statistics, pages 38–53, 1973

    Lucien LeCam. Convergence of estimates under dimensionality restrictions.The Annals of Statistics, pages 38–53, 1973

  13. [13]

    Predictive multiplicity in classification

    Charles Marx, Flavio Calmon, and Berk Ustun. Predictive multiplicity in classification. In International conference on machine learning, pages 6765–6774. PMLR, 2020

  14. [14]

    An empirical evaluation of the rashomon effect in explainable machine learning

    Sebastian M ¨uller, Vanessa Toborek, Katharina Beckh, Matthias Jakobs, Christian Bauckhage, and Pascal Welke. An empirical evaluation of the rashomon effect in explainable machine learning. InJoint european conference on machine learning and knowledge discovery in databases, pages 462–478. Springer, 2023

  15. [15]

    The effects of reward misspecification: Mapping and mitigating misaligned models, 2022

    Alexander Pan, Kush Bhatia, and Jacob Steinhardt. The effects of reward misspecification: Mapping and mitigating misaligned models, 2022. URLhttps://arxiv.org/abs/2201. 03544

  16. [16]

    Stable-Baselines3: Reliable reinforcement learning implementations.Journal of Machine Learning Research, 22(268):1–8, 2021

    Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable reinforcement learning implementations.Journal of Machine Learning Research, 22(268):1–8, 2021. URLhttps://jmlr.org/papers/v22/ 20-1364.html

  17. [17]

    Policy teach- ing via environment poisoning: Training-time adversarial attacks against reinforcement learn- ing

    Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, and Adish Singla. Policy teach- ing via environment poisoning: Training-time adversarial attacks against reinforcement learn- ing. InInternational Conference on Machine Learning, pages 7974–7984. PMLR, 2020

  18. [18]

    Interpretable machine learning: Fundamental principles and 10 grand challenges.Statistic Surveys, 16:1–85, 2022

    Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. Interpretable machine learning: Fundamental principles and 10 grand challenges.Statistic Surveys, 16:1–85, 2022

  19. [19]

    On the existence of simpler machine learn- ing models

    Lesia Semenova, Cynthia Rudin, and Ronald Parr. On the existence of simpler machine learn- ing models. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 1827–1858, 2022

  20. [20]

    Defining and char- acterizing reward gaming.Advances in Neural Information Processing Systems, 35:9460– 9471, 2022

    Joar Skalse, Nikolaus Howe, Dmitrii Krasheninnikov, and David Krueger. Defining and char- acterizing reward gaming.Advances in Neural Information Processing Systems, 35:9460– 9471, 2022

  21. [21]

    Tsybakov.Introduction to Nonparametric Estimation

    Alexandre B. Tsybakov.Introduction to Nonparametric Estimation. Springer Series in Statis- tics. Springer, New York, NY , 2009. doi: 10.1007/b13794

  22. [22]

    Backdoorl: Backdoor attack against competitive reinforcement learning.arXiv preprint arXiv:2105.00579, 2021

    Lun Wang, Zaynah Javed, Xian Wu, Wenbo Guo, Xinyu Xing, and Dawn Song. Backdoorl: Backdoor attack against competitive reinforcement learning.arXiv preprint arXiv:2105.00579, 2021

  23. [23]

    Should tables be sorted?Journal of the ACM (JACM), 28(3):615–628, 1981

    Andrew Chi-Chih Yao. Should tables be sorted?Journal of the ACM (JACM), 28(3):615–628, 1981

  24. [24]

    Probabilistic computations: Toward a unified measure of complexity

    Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222–227. IEEE, 1977

  25. [25]

    MinAtar: An Atari-inspired testbed for thorough and repro- ducible reinforcement learning experiments, 2019

    Kenny Young and Tian Tian. MinAtar: An Atari-inspired testbed for thorough and repro- ducible reinforcement learning experiments, 2019. URLhttps://arxiv.org/abs/1903. 03176

  26. [26]

    Right 5 then Down5

    Bin Yu. Assouad, fano, and le cam. InFestschrift for Lucien Le Cam: research papers in probability and statistics, pages 423–435. Springer, 1997. 10 A Additional Proof Details A.1 Yao reduction for Theorem 6 Theorem 6 is stated for deterministic auditors under a uniform prior. If a randomized auditor suc- ceeded with fewer queries against every index, the...