Jacobian Scopes: token-level causal attributions in LLMs

Baran Zadeoğlu; Christopher J. Earls; Gurbir Arora; Nicolas Boullé; Raphaël Sarfati; Toni J.B. Liu

arXiv: 2601.16407 · v4 · pith:ETTWREGYnew · submitted 2026-01-23 · cs.CL · cs.AI

keywords: jacobian, scopes, in-context, causal, influence, llms, model, prediction

Abstract

Large language models (LLMs) make next-token predictions based on clues present in their context, such as semantic descriptions and in-context examples. Yet, elucidating which prior tokens most strongly influence a given prediction remains challenging due to the proliferation of layers and attention heads in modern architectures. We propose Jacobian Scopes, a suite of gradient-based, token-level causal attribution methods for interpreting LLM predictions. Grounded in perturbation theory and information geometry, Jacobian Scopes quantify how input tokens influence various aspects of a model's prediction, such as specific logits, the full predictive distribution, and model uncertainty (effective temperature). Through case studies spanning instruction understanding, translation, and in-context learning (ICL), we demonstrate how Jacobian Scopes reveal implicit political biases, uncover word- and phrase-level translation strategies, and shed light on recently debated mechanisms underlying in-context time-series forecasting. To facilitate exploration of Jacobian Scopes on custom text, we open-source our implementations and provide a cloud-hosted interactive demo at https://huggingface.co/spaces/Typony/JacobianScopes.
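The abstract describes attribution by differentiating a prediction target with respect to each input token. As a hedged illustration only (not the authors' open-source implementation), the sketch below applies that idea to a hypothetical toy model (per-token tanh features, mean-pooled, then a linear head) whose Jacobians can be written in closed form. The function names and both scoring rules are assumptions. The logit-level score is taken as the L2 norm of dz_k/dx_t. The distribution-level score is taken as the largest eigenvalue of J_t^T F J_t, where F = diag(p) - p p^T is the Fisher metric of the softmax, so that KL(p(x_t + δ) ‖ p(x_t)) ≈ ½ δ^T J_t^T F J_t δ for small δ. The effective-temperature target mentioned in the abstract is not covered here.

```python
import numpy as np


def toy_logits(X, W1, W_out):
    """Hypothetical toy model: per-token tanh features, mean-pooled, linear head.

    X: (T, d) input token embeddings; W1: (d, h); W_out: (V, h).
    Returns next-token logits z of shape (V,).
    """
    H = np.tanh(X @ W1)               # (T, h) per-token features
    return W_out @ H.mean(axis=0)


def token_jacobians(X, W1, W_out):
    """Closed-form Jacobians J[t] = dz / dx_t for the toy model, shape (T, V, d)."""
    T = X.shape[0]
    dH = 1.0 - np.tanh(X @ W1) ** 2   # (T, h) derivative of tanh at each token
    # dz_k / dX[t, i] = (1/T) * sum_j W_out[k, j] * dH[t, j] * W1[i, j]
    return np.einsum("kj,tj,ij->tki", W_out, dH, W1) / T


def softmax(z):
    e = np.exp(z - z.max())           # shift for numerical stability
    return e / e.sum()


def logit_scope(J, k):
    """Assumed logit-level score: L2 norm of dz_k / dx_t for each token t."""
    return np.linalg.norm(J[:, k, :], axis=1)   # (T,)


def fisher_scope(J, z):
    """Assumed distribution-level score for each token t.

    F = diag(p) - p p^T is the Fisher metric of softmax(z) in logit space, so
    KL(p(x_t + delta) || p(x_t)) ~= 0.5 * delta^T (J_t^T F J_t) delta.
    The score is the top eigenvalue of J_t^T F J_t (most sensitive direction).
    """
    p = softmax(z)
    F = np.diag(p) - np.outer(p, p)               # (V, V)
    G = np.einsum("tki,kl,tlj->tij", J, F, J)     # (T, d, d), symmetric
    return np.linalg.eigvalsh(G)[:, -1]           # eigenvalues come back ascending


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, h, V = 5, 4, 8, 10
    X = rng.normal(size=(T, d))
    W1 = rng.normal(size=(d, h))
    W_out = rng.normal(size=(V, h))

    z = toy_logits(X, W1, W_out)
    J = token_jacobians(X, W1, W_out)
    target = int(np.argmax(z))        # explain the top-scoring next token
    print("logit scope :", np.round(logit_scope(J, target), 3))
    print("fisher scope:", np.round(fisher_scope(J, z), 3))
```

With a real LLM, the toy model would be replaced by the transformer's forward pass from input embeddings to final logits, and the Jacobians would come from automatic differentiation rather than a hand-derived formula.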

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Attribution Contract: Feature Attribution for Generative Language Models
cs.LG · 2026-05 · unverdicted · novelty 7.0

Introduces the Attribution Contract specification to clarify feature attribution claims in generative language models by naming the output explained, eligible features, generative process, fixed elements, and attribut...
GIF: Locally Sound Geometric Information Flow Control for LLMs
cs.AI · 2026-06 · unverdicted · novelty 6.0

GIF introduces a Jacobian-based upper bound on input-output mutual information in LLMs with formal Lean proof and strong empirical recall on injection and leakage benchmarks.