The Security Budget of Code-LLM Prompt Hardening: Provable Limits Under Pass-Only Acceptance
Pith reviewed 2026-06-28 09:41 UTC · model grok-4.3
The pith
Any deterministic prompt filter for code LLMs leaks at least 0.84 nats of task information under pass-only acceptance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any deterministic prompt filter h and registered family of finite executable-equivalence task variables Y_exec, the shared filtered-prompt channel I(h(p);h(tilde p)) is lower-bounded by a worst-Y Fano floor. On HumanEval and MBPP the universal pass-only floor evaluates to F^op >= 0.84 and 1.20 nats at eta=0.05 task-collapse tolerance, and the identity row realizes F^id >= 1.67 and 1.80 nats. An estimator-invariance corollary lifts the floor to any deterministic embedding pipeline; a dataset-agnostic corollary states the floor in visible-spec entropy and is empirically witnessed by 164/164 HumanEval+ and 224/224 MBPP+ V(p)-invariance.
What carries the argument
The worst-Y Fano floor applied to the mutual information of the filtered-prompt channel, which supplies the information-theoretic lower bound once pass-only acceptance is required.
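For orientation, the textbook form of Fano's inequality that a floor of this kind builds on can be written as below. This is the standard statement only, not the paper's exact worst-Y expression, which the abstract does not spell out. The symbols $X$, $\hat Y$, $P_e$ and $k$ are notation assumed here, and reading $P_e$ as the collapse tolerance eta is likewise an assumption.

```latex
% Textbook Fano inequality (natural logarithms, so all quantities are in nats).
% Y takes k >= 2 values, \hat{Y} = g(X) is any estimator of Y from X,
% and P_e = \Pr[\hat{Y} \neq Y] is its error probability.
\begin{align}
  H(Y \mid X) &\le h_b(P_e) + P_e \log(k - 1), \\
  h_b(P_e)    &= -P_e \log P_e - (1 - P_e)\log(1 - P_e), \\
  I(X; Y)     &= H(Y) - H(Y \mid X)
               \;\ge\; H(Y) - h_b(P_e) - P_e \log(k - 1).
\end{align}
```

A worst-Y floor in this style would take the minimum of the right-hand side over the registered family, which is why the family has to be fixed before the filter is chosen.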
If this is right
- The floor applies unchanged to any deterministic embedding pipeline by the estimator-invariance corollary.
- The floor can be restated in visible-spec entropy alone and is witnessed by V(p)-invariance in all 388 benchmark cases (164 HumanEval+ and 224 MBPP+).
- The Tri-Audit Protocol separates a prompt-side Shannon-nats registry attribute from a model-side KSG/MINE proxy on hidden states.
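- In a constrained search over deterministic and guarded learned filters on CodeLlama-7B, Qwen2.5-Coder-7B/1.5B and DeepSeek-Coder-6.7B, none of the twenty-eight pass-preserving rows achieves a shared Tri-pass: twenty-four fail proxy-axis leakage reduction on every backbone and four are reported as registered-family collapse.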
- Pass@1 alone cannot certify code-LLM prompt hardening because the information floor remains even when acceptance is satisfied.
Where Pith is reading between the lines
- Security evaluations of code-LLM prompt hardening must incorporate this information-theoretic floor rather than relying only on empirical pass rates.
- The same style of Fano argument could be applied to other functional-equivalence acceptance criteria outside code generation.
- Achieving leakage below the floor may require non-deterministic or multi-stage filters that the current analysis excludes.
- The complete invariance observed across 388 cases suggests the derived floor is close to tight for these particular benchmarks.
Load-bearing premise
The registered family of task variables consists of finite executable-equivalence classes whose collapse tolerance can be set at eta=0.05 without invalidating the Fano bound for arbitrary deterministic filters.
What would settle it
A single deterministic filter h that maintains pass-only acceptance at eta=0.05 yet produces measured mutual information I(h(p);h(tilde p)) below the stated Fano floor on the HumanEval or MBPP task families.
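A check of this kind could be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: `fano_floor_nats` evaluates the textbook Fano bound (treating the tolerance eta as the error probability is an assumption), and `plugin_mi_nats` is a simple plug-in estimate over discrete symbol pairs, which is biased on small samples. All function names are hypothetical.

```python
import math
from collections import Counter


def binary_entropy_nats(p):
    """Binary entropy h_b(p) in nats; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def fano_floor_nats(h_y, k, p_err):
    """Textbook Fano lower bound on I(X;Y) in nats, clipped at zero.

    h_y   : entropy of the task variable Y in nats
    k     : number of values Y can take (k >= 2)
    p_err : error probability (assumed here to play the role of eta)
    """
    if k < 2:
        raise ValueError("Fano's inequality needs at least two classes")
    slack = binary_entropy_nats(p_err) + p_err * math.log(k - 1)
    return max(0.0, h_y - slack)


def plugin_mi_nats(pairs):
    """Plug-in mutual information (nats) of a list of discrete (a, b) pairs."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair")
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def violates_floor(pairs, h_y, k, eta=0.05):
    """True if the measured plug-in MI falls below the Fano floor."""
    return plugin_mi_nats(pairs) < fano_floor_nats(h_y, k, eta)
```

Here `pairs` would hold the filtered outputs (h(p), h(tilde p)) reduced to hashable symbols. A `True` result from `violates_floor` on a pass-preserving filter would be the kind of counterexample described above, subject to the estimator's bias.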
Original abstract
We give a quantitative impossibility result for pass-only prompt hardening of code LLMs. For any deterministic prompt filter $h$ and a registered family of finite executable-equivalence task variables $\mathcal Y_{\mathrm{exec}}$, the shared filtered-prompt channel $\rmI(h(p);h(\tilde p))$ is lower-bounded by a worst-$Y$ Fano floor; on HumanEval and MBPP the universal pass-only floor evaluates to $\mathcal F^{\mathrm{op}}\ge 0.84$ and $1.20$ nats at $\eta=0.05$ task-collapse tolerance, and the identity row realizes $\mathcal F^{\mathrm{id}}\ge 1.67$ and $1.80$ nats. An estimator-invariance corollary lifts the floor to any deterministic embedding pipeline; a dataset-agnostic corollary states the floor in visible-spec entropy and is empirically witnessed by $164/164$ HumanEval+ and $224/224$ MBPP+ $V(p)$-invariance. We operationalize the floor as the \emph{Tri-Audit Protocol}, a two-axis reporting protocol that separates a prompt-side deductive registry attribute (Shannon nats on the visible-spec representation) from a model-side empirical proxy (KSG-1 primary, MINE secondary, on hidden states). A constrained best-of-family search over deterministic and guarded learned filters on CodeLlama-7B, Qwen2.5-Coder-7B/1.5B and DeepSeek-Coder-6.7B at $n=164$ yields the \emph{Cross-Model Tri-Audit Invariance}: of twenty-eight pass-preserving rows, twelve antecedent-preserving deterministic rows fail proxy-axis leakage reduction on every backbone with sign-invariant positive deviations, twelve antecedent-changed-of-record learned-canonicalizer rows fail proxy-axis leakage on every backbone, and four antecedent-violating rows are reported as registered-family collapse; no filter produces a shared Tri-pass on a nine-cell gate-sensitivity sweep. Pass@1 alone cannot certify code-LLM prompt hardening.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a quantitative impossibility result for pass-only prompt hardening in code LLMs. It claims that for any deterministic filter h and registered family of finite executable-equivalence task variables Y_exec, the mutual information I(h(p); h(tilde p)) is lower-bounded by a worst-Y Fano floor. On HumanEval and MBPP this yields universal floors of at least 0.84 and 1.20 nats (eta=0.05), with identity rows at 1.67 and 1.80 nats. Corollaries extend the bound to any deterministic embedding and to visible-spec entropy (witnessed by 164/164 and 224/224 invariance cases). The Tri-Audit Protocol is introduced to separate prompt-side Shannon entropy from model-side proxy leakage (KSG-1/MINE). Empirical search over deterministic and learned filters on CodeLlama-7B, Qwen2.5-Coder, and DeepSeek-Coder shows no filter achieves leakage reduction on all axes without antecedent violation or collapse, implying Pass@1 alone cannot certify hardening.
Significance. If the Fano derivation and invariance claims hold, the work supplies a concrete information-theoretic limit on prompt-side defenses for code LLMs and introduces a two-axis audit protocol that could become a reporting standard. Credit is due for the dataset-agnostic corollary, the multi-model empirical sweep (n=164), and the explicit separation of deductive registry attributes from empirical proxies. The result would be of interest to the security and LLM evaluation communities even if the numerical floors require refinement.
major comments (3)
- [Abstract] (opening quantitative impossibility result): the claim that the min_Y Fano floor remains a uniform lower bound on I(h(p);h(tilde p)) for arbitrary deterministic h after eta=0.05 collapse requires explicit proof that the collapse partition is fixed before h is chosen; if collapse can be chosen after h, the worst-Y expression becomes h-dependent and the universality statement does not follow.
- [Abstract] (empirical invariance claims): the statements '164/164 HumanEval+ and 224/224 MBPP+ V(p)-invariance' are presented without visible selection criteria or exclusion rules for the task variables; this directly affects whether the dataset-agnostic corollary is supported by the reported counts.
- [Abstract] (Tri-Audit Protocol description): the protocol is said to separate 'prompt-side deductive registry attribute (Shannon nats on the visible-spec representation)' from 'model-side empirical proxy'; the manuscript must show that the visible-spec entropy term is independent of the choice of estimator (KSG-1 vs MINE) used on the hidden-state side, otherwise the claimed separation is not guaranteed.
minor comments (1)
- [Abstract] The abstract contains LaTeX artifacts (\rmI, \tilde p) that should be rendered consistently in the final version.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each of the three major points below with clarifications on the quantification order, dataset criteria, and protocol separation. Revisions will be made to improve explicitness in the abstract and methods.
- Referee: [Abstract] (opening quantitative impossibility result): the claim that the min_Y Fano floor remains a uniform lower bound on I(h(p);h(tilde p)) for arbitrary deterministic h after eta=0.05 collapse requires explicit proof that the collapse partition is fixed before h is chosen; if collapse can be chosen after h, the worst-Y expression becomes h-dependent and the universality statement does not follow.
  Authors: The family Y_exec and its eta-collapse partition are defined from the executable-equivalence relation on the registered task variables, which is fixed independently of any choice of deterministic filter h. The min_Y is taken over this fixed family, so the resulting Fano floor is uniform and does not depend on h. We will revise the abstract to state the quantification order explicitly: the family and collapse precede the choice of h.
  Revision: yes
- Referee: [Abstract] (empirical invariance claims): the statements '164/164 HumanEval+ and 224/224 MBPP+ V(p)-invariance' are presented without visible selection criteria or exclusion rules for the task variables; this directly affects whether the dataset-agnostic corollary is supported by the reported counts.
  Authors: The counts 164 and 224 are the sizes of the full HumanEval+ and MBPP+ suites after the standard executable-task filter (tasks possessing at least one passing test case in the original benchmark release). No additional exclusion rules were applied. We will insert this selection criterion into the abstract and the dataset-agnostic corollary statement.
  Revision: yes
- Referee: [Abstract] (Tri-Audit Protocol description): the protocol is said to separate 'prompt-side deductive registry attribute (Shannon nats on the visible-spec representation)' from 'model-side empirical proxy'; the manuscript must show that the visible-spec entropy term is independent of the choice of estimator (KSG-1 vs MINE) used on the hidden-state side, otherwise the claimed separation is not guaranteed.
  Authors: The visible-spec entropy is the ordinary Shannon entropy of the discrete distribution over the visible-spec representation V(p); it is computed directly from the finite support and does not invoke any continuous estimator. KSG-1 and MINE are used only for the model-side proxy on hidden-state representations. We will add an explicit sentence in the Tri-Audit Protocol description confirming this independence.
  Revision: yes
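The prompt-side quantity described in the last response can be illustrated with a short sketch. It assumes each visible-spec representation V(p) has already been reduced to a hashable label (a hypothetical preprocessing step not described in the source) and computes the empirical Shannon entropy in nats directly from label counts, with no continuous estimator involved.

```python
import math
from collections import Counter


def shannon_entropy_nats(labels):
    """Empirical Shannon entropy (nats) of a finite sequence of discrete labels.

    `labels` stands in for visible-spec representations V(p) that have been
    mapped to hashable values; that mapping is an assumption of this sketch.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    # Sum over the finite support only; natural log gives nats.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Usage: four equally frequent spec labels give log(4) nats.
uniform_entropy = shannon_entropy_nats(["a", "b", "c", "d"])
```

Because the computation touches only counts on a finite support, swapping KSG-1 for MINE on the hidden-state side cannot change its value, which is the independence the response asserts.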
Circularity Check
No significant circularity; derivation applies standard Fano inequality to external family.
The central claim applies Fano's inequality (a standard information-theoretic result) to produce a worst-Y lower bound on I(h(p);h(tilde p)) for arbitrary deterministic h, given a pre-registered finite family Y_exec. Numerical values (0.84, 1.20 nats) are evaluations of that bound on benchmark data after the derivation, not inputs that define the bound by construction. The eta=0.05 tolerance is a fixed modeling choice for the family, not a fitted parameter renamed as a prediction. Corollaries on estimator invariance and visible-spec entropy follow from the main result without reducing to self-citation chains or ansatz smuggling. No equations or steps in the abstract reduce the claimed floor to the paper's own fitted quantities or prior self-citations. The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- eta: task-collapse tolerance (fixed at 0.05)
axioms (1)
- [standard math] Fano's inequality provides a valid lower bound on mutual information for the worst-case task variable in the registered family