GIF: Locally Sound Geometric Information Flow Control for LLMs
Pith reviewed 2026-06-26 08:22 UTC · model grok-4.3
The pith
GIF uses the LLM Jacobian and local output geometry to upper-bound Shannon mutual information between perturbed input spans and model outputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GIF is a semantic framework that uses the LLM Jacobian and local output geometry to upper-bound the Shannon mutual information between perturbed input spans and model outputs, yielding a scalable measure that satisfies local geometric soundness and is supported by a fully mechanized Lean 4 proof under local regularity assumptions.
What carries the argument
The GIF upper bound on mutual information, computed from the LLM Jacobian via automatic differentiation and low-rank approximation.
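To make the shape of a Jacobian-based bound concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the input span is perturbed by isotropic Gaussian noise with variance `sigma2`, the model is replaced by its local linearization `J`, and the output is observed through isotropic Gaussian noise with variance `tau2`. For that linear Gaussian channel the mutual information is 0.5 * log det(I + (sigma2 / tau2) * J J^T). The function `toy_model`, the finite-difference Jacobian, and all parameter names are hypothetical; the paper computes Jacobians by automatic differentiation with a low-rank approximation, and its actual bound may differ.

```python
import math

def toy_model(x):
    # Hypothetical stand-in for a model's output map (not an LLM).
    W = [[1.0, 2.0, 0.0],
         [0.0, 1.0, -1.0]]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def finite_difference_jacobian(f, x, eps=1e-6):
    # Central differences; returns J as rows (output_dim x input_dim).
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def log_det_spd(A):
    # log det of a symmetric positive definite matrix via Cholesky.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def gaussian_channel_mi(J, sigma2, tau2):
    # MI in nats of y = J d + n with d ~ N(0, sigma2 I), n ~ N(0, tau2 I).
    m = len(J)
    c = sigma2 / tau2
    A = [[(1.0 if i == k else 0.0) + c * sum(a * b for a, b in zip(J[i], J[k]))
          for k in range(m)] for i in range(m)]
    return 0.5 * log_det_spd(A)

x0 = [0.1, -0.2, 0.3]
J = finite_difference_jacobian(toy_model, x0)
mi = gaussian_channel_mi(J, sigma2=0.01, tau2=0.01)
```

The sketch uses the full Jacobian. It does not model the low-rank approximation, and the equality holds only for the linearized channel, which is where local regularity assumptions would enter.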
If this is right
- GIF achieves near-perfect recall on integrity and confidentiality benchmarks without a downstream declassifier.
- GIF outperforms attention-based baselines and matches or exceeds F1 of direct LLM-as-judge methods at up to 81x lower token cost when paired with lightweight declassifiers.
- Information flows detected using small surrogate models transfer to state-of-the-art models up to 200x larger and across model families.
Where Pith is reading between the lines
- The bound could support runtime enforcement of information flow policies inside agentic LLM systems without requiring full model gradients at inference time.
- Transferability from surrogates suggests a path to black-box deployment where only query access is available.
- The geometric approach might extend to measuring flow in other sequence models whose Jacobians can be approximated.
Load-bearing premise
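The bound is proven only under local regularity assumptions (the referee report names Lipschitz constants, differentiability, and curvature bounds). Where those assumptions fail, GIF is not guaranteed to upper-bound the true information flow.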
What would settle it
An input-output example on a model satisfying the local regularity assumptions where the actual mutual information between an input span and output exceeds the computed GIF value.
Original abstract
Large language models increasingly mediate interactions between sensitive data, untrusted inputs, and privileged actions in agentic systems, creating security and privacy risks. These range from prompt injections that manipulate downstream tool use to leakage of confidential information through model outputs. Recent Information Flow Control (IFC)-based defenses show promise but lack a principled semantic foundation for reasoning about information flow through the model itself. Since any input token may influence any output token in an autoregressive LLM, existing approaches suffer from severe taint explosion. We present Geometric Information Flow (GIF), a semantic framework for tracking information flow from input tokens to outputs. GIF uses the LLM Jacobian and local output geometry to upper-bound the Shannon mutual information between perturbed input spans and model outputs, yielding a scalable measure computable on large models via automatic differentiation and low-rank approximation. Unlike attention-based or correlational attribution heuristics, GIF satisfies local geometric soundness, and we provide a fully mechanized Lean 4 proof that it upper-bounds the true information flow induced by a given prompt under local regularity assumptions. We evaluate GIF on integrity and confidentiality tasks across multiple prompt-injection and privacy-leakage benchmarks. GIF achieves near-perfect recall even without a downstream declassifier, outperforming attention-based baselines. Combined with lightweight LLM-based declassifiers, it matches or exceeds the F1 of direct LLM-as-judge baselines such as GPT-5.5 xhigh reasoning while using up to 81x lower token cost. GIF flows detected with small surrogate models transfer to larger state-of-the-art models and other model families, even when the surrogate is up to 200x smaller, suggesting black-box deployment without gradient access.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Geometric Information Flow (GIF), a semantic framework that uses the LLM Jacobian and local output geometry to upper-bound Shannon mutual information between perturbed input spans and model outputs. It claims a fully mechanized Lean 4 proof establishing this bound under local regularity assumptions, and reports empirical results on prompt-injection and privacy-leakage benchmarks showing near-perfect recall, outperformance of attention baselines, transfer from small surrogates to large models, and efficiency gains when combined with lightweight declassifiers.
Significance. If the bound and transfer hold, GIF supplies a scalable, geometrically grounded alternative to heuristic attribution methods for information-flow control in LLMs, directly addressing taint explosion. The provision of a mechanized Lean 4 proof is a clear strength that supplies independent verification of the local soundness claim under the stated assumptions.
major comments (2)
- [Proof section / abstract paragraph on mechanized proof] The Lean 4 proof (abstract and proof section) establishes the upper bound only under local regularity assumptions on the output geometry and Jacobian. The manuscript provides no diagnostic, check, or verification that these assumptions (Lipschitz constants, differentiability, curvature bounds) hold for the evaluated models, perturbation sizes, or input regimes in the prompt-injection and privacy benchmarks. This is load-bearing for the claim that GIF upper-bounds true information flow in the reported experiments.
- [Evaluation sections on benchmarks and transfer] Experimental evaluation sections: the transfer results (surrogate-to-large-model, up to 200x size difference) and benchmark F1/recall claims rest on the assumption that the local geometric bound remains valid across model scales and families, yet no evidence is given that the regularity conditions are preserved or that the low-rank Jacobian approximation does not violate them.
minor comments (1)
- [Abstract] Notation for the Jacobian and local geometry quantities is introduced without an explicit equation reference in the abstract; a forward pointer to the defining equation would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thorough review and for identifying key points regarding the connection between our theoretical results and empirical evaluations. We address each major comment below and describe the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [Proof section / abstract paragraph on mechanized proof] The Lean 4 proof (abstract and proof section) establishes the upper bound only under local regularity assumptions on the output geometry and Jacobian. The manuscript provides no diagnostic, check, or verification that these assumptions (Lipschitz constants, differentiability, curvature bounds) hold for the evaluated models, perturbation sizes, or input regimes in the prompt-injection and privacy benchmarks. This is load-bearing for the claim that GIF upper-bounds true information flow in the reported experiments.
Authors: We agree that the mechanized Lean 4 proof is conditional on local regularity assumptions and that the manuscript does not include explicit empirical diagnostics verifying these assumptions for the specific models, perturbation sizes, and input regimes in the benchmarks. This is a valid observation. In the revised version, we will add a new subsection (tentatively in Section 5) that provides supporting evidence for the assumptions. This will include: (i) empirical estimates of local Lipschitz constants computed via finite differences on representative benchmark inputs for the evaluated models; (ii) discussion of the practical impact of the low-rank Jacobian approximation; and (iii) clarification that the assumptions are intended to hold locally for the small perturbations used in GIF. While exhaustive verification of curvature bounds across all inputs is computationally prohibitive at LLM scale, these additions will make the link between theory and experiments more explicit. revision: yes
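The finite-difference diagnostic proposed in item (i) could look roughly like the sketch below. It is an illustration only: `toy_model`, `local_lipschitz_estimate`, the perturbation radius, and the sample count are all hypothetical and not taken from the manuscript. Note that a sampled estimate of this kind is a lower bound on the true local Lipschitz constant, so it can refute a claimed constant but cannot certify one.

```python
import math
import random

def toy_model(x):
    # Hypothetical stand-in for a model's output map (not an LLM).
    W = [[1.0, 2.0, 0.0],
         [0.0, 1.0, -1.0]]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def local_lipschitz_estimate(f, x, radius=1e-3, samples=200, seed=0):
    # Largest observed ||f(x + d) - f(x)|| / ||d|| over random directions d
    # with ||d|| = radius. Sampling can only under-estimate the true constant.
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(samples):
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d))
        if norm == 0.0:
            continue  # degenerate draw, skip it
        d = [radius * v / norm for v in d]
        fy = f([xi + di for xi, di in zip(x, d)])
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(fy, fx)))
        best = max(best, diff / radius)
    return best

estimate = local_lipschitz_estimate(toy_model, [0.1, -0.2, 0.3])
```

For this toy map the estimate cannot exceed the spectral norm of `W`, which is sqrt(6), because tanh is 1-Lipschitz. No comparable closed-form ceiling is available for an LLM, which is why the estimate should be read as evidence rather than verification.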
Referee: [Evaluation sections on benchmarks and transfer] Experimental evaluation sections: the transfer results (surrogate-to-large-model, up to 200x size difference) and benchmark F1/recall claims rest on the assumption that the local geometric bound remains valid across model scales and families, yet no evidence is given that the regularity conditions are preserved or that the low-rank Jacobian approximation does not violate them.
Authors: The transfer results are presented as empirical evidence that GIF flows computed on small surrogate models can be applied to larger models. We acknowledge, however, that the manuscript does not explicitly demonstrate preservation of the regularity conditions or the validity of the low-rank approximation across scales and families. In the revision, we will expand the transfer discussion (in Section 6) with additional analysis comparing Jacobian rank, spectral norms, and local geometry statistics between the surrogate models and the target models (including the 200x size difference cases). This will provide evidence that the low-rank approximation does not materially violate the bound under the perturbation regimes tested. We will also clarify that the local character of the bound makes scale-invariance more plausible, while noting that the empirical performance stands independently as a practical result. revision: yes
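As a rough illustration of the kind of Jacobian statistics such a comparison might report, the sketch below computes a spectral norm by power iteration and the stable rank (squared Frobenius norm divided by squared spectral norm). The choice of stable rank, the function names, and the example matrices are assumptions made here for illustration; the rebuttal does not specify which rank or geometry statistics the authors will use.

```python
import math

def spectral_norm(J, iters=500):
    # Power iteration on J^T J; returns the largest singular value of J.
    # Assumes the uniform start vector is not orthogonal to the top right
    # singular vector; a random start is more robust in general.
    n = len(J[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        Jv = [sum(a * b for a, b in zip(row, v)) for row in J]
        w = [sum(J[i][j] * Jv[i] for i in range(len(J))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    Jv = [sum(a * b for a, b in zip(row, v)) for row in J]
    return math.sqrt(sum(x * x for x in Jv))

def stable_rank(J):
    # ||J||_F^2 / ||J||_2^2: a smooth proxy that never exceeds the true rank.
    fro2 = sum(a * a for row in J for a in row)
    s = spectral_norm(J)
    return fro2 / (s * s) if s > 0.0 else 0.0

# Hypothetical Jacobians standing in for a surrogate and a target model.
J_surrogate = [[1.0, 2.0, 0.0],
               [0.0, 1.0, -1.0]]
J_target = [[3.0, 0.0],
            [0.0, 1.0]]

report = {
    "surrogate": (spectral_norm(J_surrogate), stable_rank(J_surrogate)),
    "target": (spectral_norm(J_target), stable_rank(J_target)),
}
```

A stable rank close to 1 indicates that one direction dominates the Jacobian, which is the regime where a low-rank approximation discards little. Similar values on surrogate and target would be suggestive but would not by themselves show that the regularity conditions carry over.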
Circularity Check
No circularity: the derivation relies on a mechanized Lean 4 proof under stated assumptions and does not depend on fitted parameters or self-referential definitions.
The paper's central claim is that the GIF quantity (Jacobian-based local geometry) upper-bounds true Shannon mutual information, with the bound established by a fully mechanized Lean 4 proof under local regularity assumptions. No equations or steps in the provided text reduce the bound to a fitted parameter, self-definition, or self-citation chain. The mechanized proof counts as independent support per the evaluation rules. The unverified status of the regularity assumptions on evaluated models is a correctness/verification gap, not a circularity reduction. No load-bearing self-citations, ansatzes smuggled via citation, or renaming of known results are present. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: local regularity assumptions under which GIF upper-bounds true information flow.