Auditing Near-Optimal Policies Can Be Exponentially Hard: Conditional Query Lower Bounds via Occupancy Rashomon Capacity
Pith reviewed 2026-06-28 22:54 UTC · model grok-4.3
The pith
When the audited class contains a high-capacity packing of near-optimal policies with sparse signatures, exact local-query auditing requires exponentially many queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If the audited deployment class contains a 2/H-separated near-optimal packing whose local signatures are b-sparse and that realizes the deployment-class capacity, then exact local-query auditing requires Omega(M/b) queries; when the packing realizes capacity and b is constant this becomes Omega(2 to the power of Hopt^F(eps)). A finite discounted hidden-branch MDP attains the bound, and a mixture lower bound of order M/beta holds for noisy hidden-trigger testing.
What carries the argument
Occupancy Rashomon capacity: the metric entropy of the near-optimal occupancy region computed relative to the audited deployment class, which counts the number of occupancy-distinct near-optimal policies.
If this is right
- Exact local-query auditing requires Omega(M/b) queries under the stated packing condition.
- Noisy hidden-trigger testing requires Omega(M/beta) samples where beta is the per-sample KL signal.
- The audited capacity collapses under a canonical occupancy regularizer when a trusted reference occupancy is supplied.
- A transcript-compatible oracle-cover verification upper bound exists alongside the lower bounds.
Where Pith is reading between the lines
- Auditors may need to switch to noisy or reference-assisted methods in high-capacity environments rather than exact local queries.
- Designing or selecting deployment classes with low occupancy Rashomon capacity could make auditing tractable in practice.
- The hardness result may extend to continuous or visual control tasks where post-processed policies exhibit similar occupancy diversity.
Load-bearing premise
The audited deployment class contains a 2/H-separated near-optimal packing with b-sparse local signatures that realizes the deployment-class capacity.
What would settle it
An audited deployment class that contains many return-equivalent policies yet has low metric entropy in occupancy space, so that exact auditing succeeds with o(2 to the power of capacity) local queries.
Figures
read the original abstract
When many reinforcement-learning policies achieve near-optimal return, a post-hoc auditor may have to distinguish among many behaviorally distinct but return-equivalent policies. We formalize this phenomenon through an occupancy-measure analogue of Rashomon capacity: the metric entropy of the near-optimal occupancy region, computed relative to an audited deployment class. Because occupancy measures identify behavior only up to occupancy equivalence, we formulate auditing at the occupancy-class level and distinguish exact local-query oracles from noisy sample-query oracles. Our main exact-query result is conditional: if the audited class contains a $2/H$-separated near-optimal packing whose local signatures are $b$-sparse, then exact local-query auditing requires $\Omega(M/b)$ queries; when the packing realizes deployment-class capacity and $b=O(1)$, this becomes $\Omega(2^{\Hopt^\cF(\eps)})$. We give a finite discounted hidden-branch MDP attaining this bound and show the exact Bayes success law. For noisy hidden-trigger testing, we prove a mixture lower bound of order $M/\beta$, where $\beta$ is the per-sample KL signal, yielding $\Omega(2^{\Hopt^\cF(\eps)}/(\rho^2\Delta^2))$ for capacity-order packings with $\beta=O(\rho^2\Delta^2)$. We also provide a static target-recognition information lower bound, a transcript-compatible oracle-cover verification upper bound, and a canonical occupancy regularizer whose regularized audited capacity collapses when a trusted reference occupancy is available. Controlled benchmarks distinguish positive sparse-signature instances from high-capacity negative controls where exact auditing is easy, and map the noisy-trigger law to post-processed continuous-control and visual-RL auditing regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines occupancy Rashomon capacity as the metric entropy of the near-optimal occupancy region relative to an audited deployment class. It proves a conditional exact-query lower bound: if the audited class contains a 2/H-separated near-optimal packing with b-sparse local signatures realizing deployment-class capacity, then exact local-query auditing requires Ω(M/b) queries, which becomes Ω(2^{Hopt^F(ε)}) for b=O(1). The paper supplies a finite discounted hidden-branch MDP attaining the bound, an exact Bayes success law, a mixture lower bound Ω(M/β) for noisy hidden-trigger testing (yielding Ω(2^{Hopt^F(ε)}/(ρ²Δ²)) for capacity-order packings), a static target-recognition information lower bound, a transcript-compatible oracle-cover verification upper bound, a canonical occupancy regularizer, and controlled benchmarks separating sparse-signature positive instances from high-capacity negative controls.
Significance. If the conditional premise holds, the results establish that post-hoc auditing of near-optimal policies can be exponentially hard in the effective horizon or capacity when the deployment class admits a suitable packing. The explicit finite-MDP construction attaining the bound, the exact Bayes success law, the mixture lower bound for the noisy case, and the positive/negative benchmark controls are strengths that make the conditional claim falsifiable and technically grounded. The distinction between exact local-query and noisy sample-query oracles, together with the occupancy-class formulation, clarifies the setting in which the hardness applies.
minor comments (3)
- [§2] The notation Hopt^F(ε) and the precise definition of the deployment-class capacity (relative to the audited class) should be stated in a single displayed equation early in §2 or §3 so that the conditional premise can be checked without cross-referencing the abstract.
- [Benchmarks] In the benchmark section, the high-capacity negative-control instances should report the realized occupancy Rashomon capacity value alongside the query count to make the separation from the sparse-signature positive instances quantitative.
- [Upper bound] The transcript-compatible oracle-cover verification upper bound is stated without an explicit dependence on the packing parameters; adding a short remark on how the upper bound scales with M and b would clarify its relation to the Ω(M/b) lower bound.
Simulated Author's Rebuttal
We thank the referee for the detailed and accurate summary of the manuscript, the positive assessment of its significance, and the recommendation for minor revision. No major comments were provided in the report.
Circularity Check
No significant circularity identified
full rationale
The paper states conditional lower bounds on exact local-query auditing that hold only if the audited deployment class contains a 2/H-separated near-optimal packing with b-sparse local signatures realizing deployment-class capacity. It supplies an explicit finite discounted hidden-branch MDP construction attaining the bound, derives an exact Bayes success law, a mixture lower bound for the noisy case, and provides controlled benchmarks separating positive and negative instances. No load-bearing step reduces by definition, by renaming a fitted parameter as a prediction, or via a self-citation chain; all central claims rest on independent information-theoretic and query-complexity arguments plus the supplied construction. The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Finite discounted hidden-branch MDP
- standard math Occupancy measures identify behavior up to equivalence
invented entities (1)
-
Occupancy Rashomon capacity
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Poisoning deep reinforcement learning agents with in- distribution triggers, 2021
Chace Ashcraft and Kiran Karra. Poisoning deep reinforcement learning agents with in- distribution triggers, 2021. URLhttps://arxiv.org/abs/2106.07798
arXiv 2021
-
[2]
Statistical modeling: The two cultures (with comments and a rejoinder by the author).Statistical science, 16(3):199–231, 2001
Leo Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author).Statistical science, 16(3):199–231, 2001
2001
-
[3]
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, J ´er´emy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Rapha ¨el Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, ...
Pith/arXiv arXiv 2023
-
[4]
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. Diversity is all you need: Learning skills without a reward function.arXiv preprint arXiv:1802.06070, 2018
Pith/arXiv arXiv 2018
-
[5]
Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models si- multaneously.Journal of Machine Learning Research, 20(177):1–81, 2019
2019
-
[6]
D4RL: Datasets for deep data-driven reinforcement learning, 2020
Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning, 2020. URLhttps://arxiv.org/abs/2004. 07219
2020
-
[7]
Safe reinforcement learning via formal methods: Toward safe control through proof and learning
Nathan Fulton and Andr ´e Platzer. Safe reinforcement learning via formal methods: Toward safe control through proof and learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018
2018
-
[8]
Cautious reinforce- ment learning with logical constraints
Mohammadhosein Hasanbeig, Alessandro Abate, and Daniel Kroening. Cautious reinforce- ment learning with logical constraints. InProceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 483–491, 2020
2020
-
[9]
Rashomon capacity: A metric for predictive multiplicity in classification.Advances in Neural Information Processing Systems, 35:28988–29000, 2022
Hsiang Hsu and Flavio Calmon. Rashomon capacity: A metric for predictive multiplicity in classification.Advances in Neural Information Processing Systems, 35:28988–29000, 2022
2022
-
[10]
Trojdrl: evaluation of back- door attacks on deep reinforcement learning
Panagiota Kiourti, Kacper Wardega, Susmit Jha, and Wenchao Li. Trojdrl: evaluation of back- door attacks on deep reinforcement learning. In2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2020. 9
2020
-
[11]
One solution is not all you need: Few-shot extrapolation via structured maxent rl.Advances in Neural Information Processing Systems, 33:8198–8210, 2020
Saurabh Kumar, Aviral Kumar, Sergey Levine, and Chelsea Finn. One solution is not all you need: Few-shot extrapolation via structured maxent rl.Advances in Neural Information Processing Systems, 33:8198–8210, 2020
2020
-
[12]
Convergence of estimates under dimensionality restrictions.The Annals of Statistics, pages 38–53, 1973
Lucien LeCam. Convergence of estimates under dimensionality restrictions.The Annals of Statistics, pages 38–53, 1973
1973
-
[13]
Predictive multiplicity in classification
Charles Marx, Flavio Calmon, and Berk Ustun. Predictive multiplicity in classification. In International conference on machine learning, pages 6765–6774. PMLR, 2020
2020
-
[14]
An empirical evaluation of the rashomon effect in explainable machine learning
Sebastian M ¨uller, Vanessa Toborek, Katharina Beckh, Matthias Jakobs, Christian Bauckhage, and Pascal Welke. An empirical evaluation of the rashomon effect in explainable machine learning. InJoint european conference on machine learning and knowledge discovery in databases, pages 462–478. Springer, 2023
2023
-
[15]
The effects of reward misspecification: Mapping and mitigating misaligned models, 2022
Alexander Pan, Kush Bhatia, and Jacob Steinhardt. The effects of reward misspecification: Mapping and mitigating misaligned models, 2022. URLhttps://arxiv.org/abs/2201. 03544
2022
-
[16]
Stable-Baselines3: Reliable reinforcement learning implementations.Journal of Machine Learning Research, 22(268):1–8, 2021
Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable reinforcement learning implementations.Journal of Machine Learning Research, 22(268):1–8, 2021. URLhttps://jmlr.org/papers/v22/ 20-1364.html
2021
-
[17]
Policy teach- ing via environment poisoning: Training-time adversarial attacks against reinforcement learn- ing
Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, and Adish Singla. Policy teach- ing via environment poisoning: Training-time adversarial attacks against reinforcement learn- ing. InInternational Conference on Machine Learning, pages 7974–7984. PMLR, 2020
2020
-
[18]
Interpretable machine learning: Fundamental principles and 10 grand challenges.Statistic Surveys, 16:1–85, 2022
Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. Interpretable machine learning: Fundamental principles and 10 grand challenges.Statistic Surveys, 16:1–85, 2022
2022
-
[19]
On the existence of simpler machine learn- ing models
Lesia Semenova, Cynthia Rudin, and Ronald Parr. On the existence of simpler machine learn- ing models. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 1827–1858, 2022
2022
-
[20]
Defining and char- acterizing reward gaming.Advances in Neural Information Processing Systems, 35:9460– 9471, 2022
Joar Skalse, Nikolaus Howe, Dmitrii Krasheninnikov, and David Krueger. Defining and char- acterizing reward gaming.Advances in Neural Information Processing Systems, 35:9460– 9471, 2022
2022
-
[21]
Tsybakov.Introduction to Nonparametric Estimation
Alexandre B. Tsybakov.Introduction to Nonparametric Estimation. Springer Series in Statis- tics. Springer, New York, NY , 2009. doi: 10.1007/b13794
-
[22]
Lun Wang, Zaynah Javed, Xian Wu, Wenbo Guo, Xinyu Xing, and Dawn Song. Backdoorl: Backdoor attack against competitive reinforcement learning.arXiv preprint arXiv:2105.00579, 2021
arXiv 2021
-
[23]
Should tables be sorted?Journal of the ACM (JACM), 28(3):615–628, 1981
Andrew Chi-Chih Yao. Should tables be sorted?Journal of the ACM (JACM), 28(3):615–628, 1981
1981
-
[24]
Probabilistic computations: Toward a unified measure of complexity
Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222–227. IEEE, 1977
1977
-
[25]
MinAtar: An Atari-inspired testbed for thorough and repro- ducible reinforcement learning experiments, 2019
Kenny Young and Tian Tian. MinAtar: An Atari-inspired testbed for thorough and repro- ducible reinforcement learning experiments, 2019. URLhttps://arxiv.org/abs/1903. 03176
2019
-
[26]
Right 5 then Down5
Bin Yu. Assouad, fano, and le cam. InFestschrift for Lucien Le Cam: research papers in probability and statistics, pages 423–435. Springer, 1997. 10 A Additional Proof Details A.1 Yao reduction for Theorem 6 Theorem 6 is stated for deterministic auditors under a uniform prior. If a randomized auditor suc- ceeded with fewer queries against every index, the...
1997
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.