Pay attention to attention distribution: A new local Lipschitz bound for transformers

Yudin, N · 2025 · arXiv 2507.07814

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

A Mechanistic Analysis of Looped Reasoning Language Models

cs.LG · 2026-04-13 · unverdicted · novelty 7.0

Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.

Towards a Data-Parameter Correspondence for LLMs: A Preliminary Discussion

cs.LG · 2026-04-19 · unverdicted · novelty 4.0

A data-parameter correspondence unifies data-centric and parameter-centric LLM optimizations as dual geometric operations on the statistical manifold via Fisher-Rao metric and Legendre duality.

citing papers explorer

Showing 2 of 2 citing papers.

A Mechanistic Analysis of Looped Reasoning Language Models
cs.LG · 2026-04-13 · unverdicted · none · ref 33
Towards a Data-Parameter Correspondence for LLMs: A Preliminary Discussion
cs.LG · 2026-04-19 · unverdicted · none · ref 63
