The Security Budget of Code-LLM Prompt Hardening: Provable Limits Under Pass-Only Acceptance

Jianwei Tai

arxiv: 2606.03308 · v3 · pith:VKVD5KJMnew · submitted 2026-06-02 · 💻 cs.CR

Pith reviewed 2026-06-28 09:41 UTC · model grok-4.3

keywords prompt hardening · code LLMs · mutual information · Fano inequality · security bounds · pass-only acceptance · HumanEval · MBPP

The pith

Under pass-only acceptance, any deterministic prompt filter for code LLMs leaks at least 0.84 nats of task information on HumanEval and at least 1.20 nats on MBPP, at five-percent task-collapse tolerance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves a quantitative lower bound on the mutual information between filtered versions of original and perturbed prompts for any deterministic filter h. The bound arises because pass-only acceptance requires the filter to preserve executability, which necessarily preserves enough task identity for a Fano inequality to apply over families of executable-equivalence task variables. On HumanEval the universal floor evaluates to at least 0.84 nats and on MBPP to at least 1.20 nats at five-percent task-collapse tolerance. Corollaries extend the same floor to arbitrary deterministic embedding pipelines and to visible-spec entropy, with the latter holding in every one of the 388 examined test cases. Empirical searches over both hand-crafted and learned filters on multiple code models confirm that no configuration reduces proxy-axis leakage while satisfying the pass-only constraint.

Core claim

For any deterministic prompt filter h and registered family of finite executable-equivalence task variables Y_exec, the shared filtered-prompt channel I(h(p);h(tilde p)) is lower-bounded by a worst-Y Fano floor. On HumanEval and MBPP the universal pass-only floor evaluates to F^op >= 0.84 and 1.20 nats at eta=0.05 task-collapse tolerance, and the identity row realizes F^id >= 1.67 and 1.80 nats. An estimator-invariance corollary lifts the floor to any deterministic embedding pipeline; a dataset-agnostic corollary states the floor in visible-spec entropy and is empirically witnessed by 164/164 HumanEval+ and 224/224 MBPP+ V(p)-invariance.

What carries the argument

The worst-Y Fano floor applied to the mutual information of the filtered-prompt channel, which supplies the information-theoretic lower bound once pass-only acceptance is required.
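
The chain of inequalities behind a floor of this kind can be sketched with the textbook Fano inequality. This is a reconstruction under stated assumptions, not the paper's own statement: it assumes each task variable Y in the registered family is a deterministic function g of the filtered prompt h(p), that Y can be decoded from h(tilde p) with error probability P_e at most eta, and that eta is at most 1 - 1/|Y|. Here h_b is the binary entropy in nats; the symbols g, P_e and h_b are introduced only for this sketch.

```latex
% Sketch only: g, P_e and h_b are illustrative symbols, not taken from the paper.
\begin{align}
H\bigl(Y \mid h(\tilde p)\bigr)
  &\le h_b(P_e) + P_e \ln\bigl(|\mathcal Y| - 1\bigr)
  && \text{(Fano's inequality)} \\
I\bigl(h(p); h(\tilde p)\bigr)
  &\ge I\bigl(Y; h(\tilde p)\bigr)
  && \text{(data processing, } Y = g(h(p))\text{)} \\
  &= H(Y) - H\bigl(Y \mid h(\tilde p)\bigr) \\
  &\ge H(Y) - h_b(\eta) - \eta \ln\bigl(|\mathcal Y| - 1\bigr)
  && \text{(} P_e \le \eta \le 1 - 1/|\mathcal Y| \text{)} \\
I\bigl(h(p); h(\tilde p)\bigr)
  &\ge \min_{Y \in \mathcal Y_{\mathrm{exec}}}
     \Bigl[ H(Y) - h_b(\eta) - \eta \ln\bigl(|\mathcal Y| - 1\bigr) \Bigr]
  && \text{(worst-}Y\text{ floor)}
\end{align}
```

The last step uses that the Fano right-hand side is non-decreasing in P_e on the interval [0, 1 - 1/|Y|], so replacing P_e by the tolerance eta can only loosen the bound. Whether the paper's eta plays exactly this role is not visible from the abstract.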

If this is right

The floor applies unchanged to any deterministic embedding pipeline by the estimator-invariance corollary.
The floor can be restated in visible-spec entropy alone and is witnessed by invariance in all 388 benchmark cases.
The Tri-Audit Protocol separates a prompt-side Shannon-nats registry attribute from a model-side KSG/MINE proxy on hidden states.
In a constrained search over deterministic and guarded learned filters on CodeLlama-7B, Qwen2.5-Coder-7B/1.5B and DeepSeek-Coder-6.7B, none of the twenty-eight pass-preserving rows achieves proxy-axis leakage reduction: twenty-four fail on every backbone and four are reported as registered-family collapse.
Pass@1 alone cannot certify code-LLM prompt hardening because the information floor remains even when acceptance is satisfied.
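
The model-side proxy named in the Tri-Audit item above is a KSG estimate of mutual information. As a rough illustration of what such a proxy computes, the following is a minimal pure-Python sketch of the first Kraskov-Stögbauer-Grassberger estimator. It assumes that "KSG-1" refers to that first algorithm; the neighbour count `k=3`, the function names and the synthetic Gaussian data are all choices made for this sketch and are not taken from the paper, which applies its estimator to hidden states.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def max_norm(a, b):
    """Chebyshev (max-norm) distance between two equal-length tuples."""
    return max(abs(u - v) for u, v in zip(a, b))


def ksg_mi_nats(xs, ys, k=3):
    """KSG algorithm 1 estimate of I(X;Y) in nats (O(n^2), illustration only)."""
    n = len(xs)
    if n != len(ys) or n <= k:
        raise ValueError("need paired samples with more than k points")
    acc = 0.0
    for i in range(n):
        dx = [max_norm(xs[i], xs[j]) for j in range(n)]
        dy = [max_norm(ys[i], ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbour in the joint space (max norm).
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        radius = joint[k - 1]
        # Count marginal neighbours strictly inside that radius, excluding i.
        nx = sum(1 for j in range(n) if j != i and dx[j] < radius)
        ny = sum(1 for j in range(n) if j != i and dy[j] < radius)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


# Synthetic demonstration data (hypothetical, not hidden states).
rng = random.Random(0)
x = [(rng.gauss(0, 1),) for _ in range(150)]
y_dep = [(v[0] + 0.1 * rng.gauss(0, 1),) for v in x]  # strongly dependent on x
y_ind = [(rng.gauss(0, 1),) for _ in range(150)]      # independent of x

mi_dep = ksg_mi_nats(x, y_dep)
mi_ind = ksg_mi_nats(x, y_ind)
print(f"dependent pair:   {mi_dep:.2f} nats")
print(f"independent pair: {mi_ind:.2f} nats")
```

The estimate for the independent pair can come out slightly negative, which is one of the known finite-sample artefacts of this family of estimators and a reason to read proxy values as indicative.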

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Security evaluations of code-LLM prompt hardening must incorporate this information-theoretic floor rather than relying only on empirical pass rates.
The same style of Fano argument could be applied to other functional-equivalence acceptance criteria outside code generation.
Achieving leakage below the floor may require non-deterministic or multi-stage filters that the current analysis excludes.
The complete invariance observed across 388 cases suggests the derived floor is close to tight for these particular benchmarks.

Load-bearing premise

The registered family of task variables consists of finite executable-equivalence classes whose collapse tolerance can be set at eta=0.05 without invalidating the Fano bound for arbitrary deterministic filters.

What would settle it

A single deterministic filter h that maintains pass-only acceptance at eta=0.05 yet produces measured mutual information I(h(p);h(tilde p)) below the stated Fano floor on the HumanEval or MBPP task families.
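
The falsification test above can be phrased as a small numerical check. The sketch below evaluates the textbook Fano expression H(Y) - h_b(eta) - eta ln(|Y| - 1) in nats for each task variable in a family, takes the minimum, and compares a measured mutual information against it. The function names, the toy family and its distributions are hypothetical; the paper's registered family and its exact floor formula are not reproduced here.

```python
import math


def binary_entropy(p):
    """Binary entropy h_b(p) in nats, with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def shannon_entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(q * math.log(q) for q in probs if q > 0.0)


def fano_floor(probs, eta):
    """Textbook Fano lower bound H(Y) - h_b(eta) - eta*ln(|Y|-1), clipped at 0."""
    m = sum(1 for q in probs if q > 0.0)  # support size used as |Y|
    if m < 2:
        return 0.0
    if not 0.0 <= eta <= 1.0 - 1.0 / m:
        raise ValueError("eta must lie in [0, 1 - 1/|Y|]")
    floor = shannon_entropy(probs) - binary_entropy(eta) - eta * math.log(m - 1)
    return max(0.0, floor)


def worst_y_floor(family, eta):
    """Minimum Fano floor over a family of task-variable distributions."""
    return min(fano_floor(probs, eta) for probs in family.values())


def violates_floor(measured_mi, family, eta):
    """True if a measured MI (nats) falls below the worst-Y floor."""
    return measured_mi < worst_y_floor(family, eta)


# Hypothetical two-member family: uniform over 4 and over 8 task classes.
toy_family = {"coarse": [0.25] * 4, "fine": [0.125] * 8}
print(worst_y_floor(toy_family, eta=0.05))
print(violates_floor(0.5, toy_family, eta=0.05))
```

Because the family is passed in as a fixed argument, the minimum is taken before any filter is evaluated, which mirrors the quantification order the universality claim needs.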

Figures

Figures reproduced from arXiv: 2606.03308 by Jianwei Tai.

**Figure 1.** Deterministic prompt filters as a capacity–leakage Pareto audit on HumanEval. Each point is a filter from Table 4; …

**Figure 2.** Per-problem context-mixed alignment cosine stratified by unit-test pass@1 status for CodeLlama-HumanEval (CL…

Original abstract

We give a quantitative impossibility result for pass-only prompt hardening of code LLMs. For any deterministic prompt filter $h$ and a registered family of finite executable-equivalence task variables $\mathcal Y_{\mathrm{exec}}$, the shared filtered-prompt channel $\rmI(h(p);h(\tilde p))$ is lower-bounded by a worst-$Y$ Fano floor; on HumanEval and MBPP the universal pass-only floor evaluates to $\mathcal F^{\mathrm{op}}\ge 0.84$ and $1.20$ nats at $\eta=0.05$ task-collapse tolerance, and the identity row realizes $\mathcal F^{\mathrm{id}}\ge 1.67$ and $1.80$ nats. An estimator-invariance corollary lifts the floor to any deterministic embedding pipeline; a dataset-agnostic corollary states the floor in visible-spec entropy and is empirically witnessed by $164/164$ HumanEval+ and $224/224$ MBPP+ $V(p)$-invariance. We operationalize the floor as the \emph{Tri-Audit Protocol}, a two-axis reporting protocol that separates a prompt-side deductive registry attribute (Shannon nats on the visible-spec representation) from a model-side empirical proxy (KSG-1 primary, MINE secondary, on hidden states). A constrained best-of-family search over deterministic and guarded learned filters on CodeLlama-7B, Qwen2.5-Coder-7B/1.5B and DeepSeek-Coder-6.7B at $n=164$ yields the \emph{Cross-Model Tri-Audit Invariance}: of twenty-eight pass-preserving rows, twelve antecedent-preserving deterministic rows fail proxy-axis leakage reduction on every backbone with sign-invariant positive deviations, twelve antecedent-changed-of-record learned-canonicalizer rows fail proxy-axis leakage on every backbone, and four antecedent-violating rows are reported as registered-family collapse; no filter produces a shared Tri-pass on a nine-cell gate-sensitivity sweep. Pass@1 alone cannot certify code-LLM prompt hardening.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies Fano to derive concrete lower bounds on mutual information for any deterministic prompt filter in code LLMs, with reported floors of 0.84 and 1.20 nats, but the uniformity of that bound after eta collapse needs direct verification from the derivation.

The main thing here is a claimed impossibility result: for any deterministic prompt filter h, the mutual information I(h(p); h(tilde p)) cannot drop below a worst-case Fano floor taken over a registered family of executable-equivalence task variables. On HumanEval and MBPP the numbers come out to at least 0.84 and 1.20 nats at eta=0.05 collapse tolerance, with higher values for the identity map. The paper turns this into a Tri-Audit protocol that pairs a deductive visible-spec entropy term with empirical mutual-information proxies on hidden states, then checks a set of deterministic and learned filters across CodeLlama-7B, Qwen2.5-Coder variants, and DeepSeek-Coder.

What is actually new is the concrete numerical application of Fano to pass-only hardening, together with the two corollaries (estimator invariance and the dataset-agnostic form) and the cross-model invariance claim: of the twenty-eight pass-preserving rows, none achieves proxy-axis leakage reduction, with twenty-four failing on every backbone and four reported as registered-family collapse. The empirical side reports full coverage (164/164 and 224/224) on the augmented benchmarks, which is straightforward to check.

The derivation itself looks like standard information theory once the family Y_exec is fixed, and the protocol is a reasonable way to separate the two axes. That part is useful for anyone who wants a reporting template rather than another attack paper.

The soft spot is exactly the one in the stress-test note. The bound is presented as universal over arbitrary deterministic h, yet it depends on a min over Y after eta=0.05 collapse. If the collapse step can be taken after h is chosen, or if different h induce different effective partitions, the min_Y expression is no longer guaranteed to lower-bound every possible h. The abstract states the floor comes directly from the registered family at fixed eta, but does not make the ordering explicit. That needs to be walked through in the proof; without it the universality claim is not yet secured. The choice of eta itself is a free parameter, and the KSG-1 / MINE estimators are known to have their own biases, so the numerical floors are best treated as indicative rather than definitive.

This is for readers who work on LLM security for code and want information-theoretic constraints rather than purely empirical results. It is worth a serious referee because the core modeling choice is coherent and the empirical sweep is broad enough to be informative even if the bound requires tightening.

Referee Report

3 major / 1 minor

Summary. The paper presents a quantitative impossibility result for pass-only prompt hardening in code LLMs. It claims that for any deterministic filter h and registered family of finite executable-equivalence task variables Y_exec, the mutual information I(h(p); h(tilde p)) is lower-bounded by a worst-Y Fano floor. On HumanEval and MBPP this yields universal floors of at least 0.84 and 1.20 nats (eta=0.05), with identity rows at 1.67 and 1.80 nats. Corollaries extend the bound to any deterministic embedding and to visible-spec entropy (witnessed by 164/164 and 224/224 invariance cases). The Tri-Audit Protocol is introduced to separate prompt-side Shannon entropy from model-side proxy leakage (KSG-1/MINE). Empirical search over deterministic and learned filters on CodeLlama-7B, Qwen2.5-Coder, and DeepSeek-Coder shows no filter achieves leakage reduction on all axes without antecedent violation or collapse, implying Pass@1 alone cannot certify hardening.

Significance. If the Fano derivation and invariance claims hold, the work supplies a concrete information-theoretic limit on prompt-side defenses for code LLMs and introduces a two-axis audit protocol that could become a reporting standard. Credit is due for the dataset-agnostic corollary, the multi-model empirical sweep (n=164), and the explicit separation of deductive registry attributes from empirical proxies. The result would be of interest to the security and LLM evaluation communities even if the numerical floors require refinement.

major comments (3)

[Abstract] Abstract (opening quantitative impossibility result): the claim that the min_Y Fano floor remains a uniform lower bound on I(h(p);h(tilde p)) for arbitrary deterministic h after eta=0.05 collapse requires explicit proof that the collapse partition is fixed before h is chosen; if collapse can be chosen after h, the worst-Y expression becomes h-dependent and the universality statement does not follow.
[Abstract] Abstract (empirical invariance claims): the statements '164/164 HumanEval+ and 224/224 MBPP+ V(p)-invariance' are presented without visible selection criteria or exclusion rules for the task variables; this directly affects whether the dataset-agnostic corollary is supported by the reported counts.
[Abstract] Abstract (Tri-Audit Protocol description): the protocol is said to separate 'prompt-side deductive registry attribute (Shannon nats on the visible-spec representation)' from 'model-side empirical proxy'; the manuscript must show that the visible-spec entropy term is independent of the choice of estimator (KSG-1 vs MINE) used on the hidden-state side, otherwise the claimed separation is not guaranteed.

minor comments (1)

[Abstract] Abstract contains LaTeX artifacts (\rmI, \tilde p) that should be rendered consistently in the final version.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each of the three major points below with clarifications on the quantification order, dataset criteria, and protocol separation. Revisions will be made to improve explicitness in the abstract and methods.

Referee: [Abstract] Abstract (opening quantitative impossibility result): the claim that the min_Y Fano floor remains a uniform lower bound on I(h(p);h(tilde p)) for arbitrary deterministic h after eta=0.05 collapse requires explicit proof that the collapse partition is fixed before h is chosen; if collapse can be chosen after h, the worst-Y expression becomes h-dependent and the universality statement does not follow.

Authors: The family Y_exec and its eta-collapse partition are defined from the executable-equivalence relation on the registered task variables, which is fixed independently of any choice of deterministic filter h. The min_Y is taken over this fixed family, so the resulting Fano floor is uniform and does not depend on h. We will revise the abstract to state the quantification order explicitly: the family and collapse precede the choice of h. revision: yes
Referee: [Abstract] Abstract (empirical invariance claims): the statements '164/164 HumanEval+ and 224/224 MBPP+ V(p)-invariance' are presented without visible selection criteria or exclusion rules for the task variables; this directly affects whether the dataset-agnostic corollary is supported by the reported counts.

Authors: The counts 164 and 224 are the sizes of the full HumanEval+ and MBPP+ suites after the standard executable-task filter (tasks possessing at least one passing test case in the original benchmark release). No additional exclusion rules were applied. We will insert this selection criterion into the abstract and the dataset-agnostic corollary statement. revision: yes
Referee: [Abstract] Abstract (Tri-Audit Protocol description): the protocol is said to separate 'prompt-side deductive registry attribute (Shannon nats on the visible-spec representation)' from 'model-side empirical proxy'; the manuscript must show that the visible-spec entropy term is independent of the choice of estimator (KSG-1 vs MINE) used on the hidden-state side, otherwise the claimed separation is not guaranteed.

Authors: The visible-spec entropy is the ordinary Shannon entropy of the discrete distribution over the visible-spec representation V(p); it is computed directly from the finite support and does not invoke any continuous estimator. KSG-1 and MINE are used only for the model-side proxy on hidden-state representations. We will add an explicit sentence in the Tri-Audit Protocol description confirming this independence. revision: yes
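
The prompt-side quantity described in the last response is a plain plug-in Shannon entropy over a finite support, so it can be computed by counting. The sketch below shows that computation in nats. The function name and the example labels are hypothetical stand-ins for visible-spec representations V(p); how the paper actually constructs V(p) is not shown here.

```python
import math
from collections import Counter


def plugin_entropy_nats(symbols):
    """Plug-in Shannon entropy (nats) of the empirical distribution of symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one symbol")
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical visible-spec labels for six prompts (illustration only).
visible_specs = ["sort_list", "sort_list", "parse_date", "parse_date", "gcd", "gcd"]
print(plugin_entropy_nats(visible_specs))
```

No continuous estimator enters this computation, which is the sense in which the prompt-side axis is independent of the KSG-1 or MINE choice on the model side.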

Circularity Check

0 steps flagged

No significant circularity; derivation applies standard Fano inequality to external family.

The central claim applies Fano's inequality (a standard information-theoretic result) to produce a worst-Y lower bound on I(h(p);h(tilde p)) for arbitrary deterministic h, given a pre-registered finite family Y_exec. Numerical values (0.84, 1.20 nats) are evaluations of that bound on benchmark data after the derivation, not inputs that define the bound by construction. The eta=0.05 tolerance is a fixed modeling choice for the family, not a fitted parameter renamed as a prediction. Corollaries on estimator invariance and visible-spec entropy follow from the main result without reducing to self-citation chains or ansatz smuggling. No equations or steps in the abstract reduce the claimed floor to the paper's own fitted quantities or prior self-citations. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review limits visibility; the result rests on standard information-theoretic axioms plus benchmark-specific calculations whose free parameters are not fully enumerated.

free parameters (1)

eta task-collapse tolerance
Tolerance parameter used to define the numerical Fano floors; set to 0.05 in the reported values.

axioms (1)

standard math: Fano's inequality provides a valid lower bound on mutual information for the worst-case task variable in the registered family
Invoked to obtain the worst-Y Fano floor for any deterministic h.

[1] [1]

IEEE Information Theory Workshop (ITW) , year =

Tishby, Naftali and Zaslavsky, Noga , title =. IEEE Information Theory Workshop (ITW) , year =

[2] [2]

and Bialek, William , title =

Tishby, Naftali and Pereira, Fernando C. and Bialek, William , title =. arXiv preprint physics/0004057 , year =

Pith/arXiv arXiv

[3] [3]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume =

Achille, Alessandro and Soatto, Stefano , title =. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume =. 2018 , note =

2018

[4] [4]

Devon and Fedorov, Alex and Lavoie-Marchildon, Samuel and others , title =

Hjelm, R. Devon and Fedorov, Alex and Lavoie-Marchildon, Samuel and others , title =. International Conference on Learning Representations (ICLR) , year =

[5] [5]

International Conference on Machine Learning (ICML) , year =

Belghazi, Mohamed Ishmael and Baratin, Aristide and Rajeswar, Sai and others , title =. International Conference on Machine Learning (ICML) , year =

[6] [6]

Estimating Mutual Information , journal =

Kraskov, Alexander and St. Estimating Mutual Information , journal =

[7] [7]

arXiv preprint arXiv:1807.03748 , year =

van den Oord, Aaron and Li, Yazhe and Vinyals, Oriol , title =. arXiv preprint arXiv:1807.03748 , year =

Pith/arXiv arXiv

[8] [8]

and Thomas, Joy A

Cover, Thomas M. and Thomas, Joy A. , title =

[9] [9]

arXiv preprint arXiv:2107.03374 , year =

Chen, Mark and Tworek, Jerry and Jun, Heewoo and others , title =. arXiv preprint arXiv:2107.03374 , year =

Pith/arXiv arXiv

[10] [10]

Code Llama: Open Foundation Models for Code , journal =

Rozi. Code Llama: Open Foundation Models for Code , journal =

[11] [11]

arXiv preprint arXiv:2409.12186 , year =

Hui, Binyuan and Yang, Jian and Cui, Zeyu and others , title =. arXiv preprint arXiv:2409.12186 , year =

Pith/arXiv arXiv

[12] [12]

arXiv preprint arXiv:2401.14196 , year =

Guo, Daya and Zhu, Qihao and Yang, Dejian and others , title =. arXiv preprint arXiv:2401.14196 , year =

Pith/arXiv arXiv

[13] [13]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Liu, Jiawei and Xia, Chunqiu Steven and Wang, Yuyao and Zhang, Lingming , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[14] [14]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

Zhou, Shuyan and Alon, Uri and Agarwal, Sumit and Neubig, Graham , title =. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

2023

[15] [15]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Wang, Shiqi and Li, Zheng and Qian, Haifeng and others , title =. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL) , year =

[16] [16]

arXiv preprint arXiv:2307.15043 , year =

Zou, Andy and Wang, Zifan and Carlini, Nicholas and others , title =. arXiv preprint arXiv:2307.15043 , year =

Pith/arXiv arXiv

[17] [17]

2022 IEEE Symposium on Security and Privacy (SP) , year =

Pearce, Hammond and Ahmad, Baleegh and Tan, Benjamin and others , title =. 2022 IEEE Symposium on Security and Privacy (SP) , year =

2022

[18] [18]

Siddiq, Mohammed Latif and Santos, Joanna C. S. , title =. Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security (MSR4P&S) , year =

[19] [19]

Proceedings of the 48th International Conference on Software Engineering (ICSE) , year =

Liu, Shuhan and Hu, Xing and Huang, Kerui and Yang, Xiaohu and Lo, David and Xia, Xin , title =. Proceedings of the 48th International Conference on Software Engineering (ICSE) , year =

[20] [20]

and Su, Zhendong and others , title =

Hindle, Abram and Barr, Earl T. and Su, Zhendong and others , title =. Proceedings of the 34th International Conference on Software Engineering (ICSE) , year =

[21] [21]

Proceedings of the 42nd International Conference on Software Engineering (ICSE) , year =

Karampatsis, Rafael-Michael and Babii, Hlib and Robbes, Romain and others , title =. Proceedings of the 42nd International Conference on Software Engineering (ICSE) , year =

[22] [22]

and Devanbu, Premkumar and Sutton, Charles , title =

Allamanis, Miltiadis and Barr, Earl T. and Devanbu, Premkumar and Sutton, Charles , title =. ACM Computing Surveys (CSUR) , volume =. 2018 , note =

2018

[23] [23]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Kaneko, Masahiro and Baldwin, Timothy , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[24] [24]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Xu, Aolin and Raginsky, Maxim , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[25] [25]

arXiv preprint arXiv:2108.07732 , year =

Austin, Jacob and Odena, Augustus and Nye, Maxwell and others , title =. arXiv preprint arXiv:2108.07732 , year =

Pith/arXiv arXiv

[26] [26]

International Conference on Learning Representations (ICLR) , year =

Madry, Aleksander and Makelov, Aleksandar and Schmidt, Ludwig and Tsipras, Dimitris and Vladu, Adrian , title =. International Conference on Learning Representations (ICLR) , year =

[27] Zhong, Li and Wang, Zilong. Proceedings of the AAAI Conference on Artificial Intelligence, 2024.

[28] Tian, Yuchen and Yan, Weixiang and Yang, Qian and Zhao, Xuandong and Chen, Qian and Wang, Wen and Luo, Ziyang and Ma, Lei and Song, Dawn. Proceedings of the AAAI Conference on Artificial Intelligence, 2025.

[29] Lin, Leon and Brown, Hannah and Kawaguchi, Kenji and Shieh, Michael. Proceedings of the AAAI Conference on Artificial Intelligence, 2025.

[30] Chen, Junkai and Li, Zhenhao and Hu, Xing and Xia, Xin. arXiv preprint arXiv:2406.19783.

[31] Li, Xueyang and Meng, Guozhu and Liu, Shangqing and Xiang, Lu and Sun, Kun and Chen, Kai and Luo, Xiapu and Liu, Yang. Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE), 2024.

[32] Zhang, Yuhao and Wang, Shiqi and Qian, Haifeng and Wang, Zijian and Shang, Mingyue and Liu, Linbo and Gouda, Sanjay Krishna and Ray, Baishakhi and Ramanathan, Murali Krishna and Ma, Xiaofei and Deoras, Anoop. Findings of the Association for Computational Linguistics: EMNLP 2024, 2024.

[33] Xu, Xilie and Kong, Keyi and Liu, Ning and Cui, Lizhen and Wang, Di and Zhang, Jingfeng and Kankanhalli, Mohan. International Conference on Learning Representations (ICLR).

[34] Chao, Patrick and Debenedetti, Edoardo and Robey, Alexander and Andriushchenko, Maksym and Croce, Francesco and Sehwag, Vikash and Dobriban, Edgar and Flammarion, Nicolas and Pappas, George J. and Tram. Advances in Neural Information Processing Systems (NeurIPS), 2024.

[35] Cao, Bowen and Cai, Deng and Zhang, Zhisong and Zou, Yuexian and Lam, Wai. Advances in Neural Information Processing Systems (NeurIPS), 2024.

[36] He, Jingxuan and Vechev, Martin T. Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), 2023.

[37] Paul, Debalina Ghosh and Zhu, Hong and Bayley, Ian. arXiv preprint arXiv:2602.18800.

[38] Paleyes, Andrei and Sendyka, Radzim and Robinson, Diana and Cabrera, Christian and Lawrence, Neil D. arXiv preprint arXiv:2506.10204.

[39] Jawad, Huseein and Brunel, Nicolas. arXiv preprint arXiv:2511.16209.

[40] Gao, Lang et al. arXiv preprint arXiv:2509.21199.

[41] Deep, Priyal and Emmons, Shane and Fox, Amy and Bacon, Kyle and McAllister, Kelley and Ortiz, Peter and Flautner, Krisztian. arXiv preprint arXiv:2604.23887.

[42] Lozhkov, Anton and Li, Raymond and Allal, Loubna Ben et al. arXiv preprint arXiv:2402.19173.

[43] Poole, Ben and Ozair, Sherjil and van den Oord, Aaron and Alemi, Alexander and Tucker, George. International Conference on Machine Learning (ICML).

[44] Song, Jiaming and Ermon, Stefano. International Conference on Learning Representations (ICLR).

[45] McAllester, David and Stratos, Karl. International Conference on Artificial Intelligence and Statistics (AISTATS).