Formal algorithms for transformers

Formal algorithms for transformers · 2022 · arXiv 2207.09238

9 Pith papers cite this work.

representative citing papers

Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression

cs.IT · 2026-05-24 · unverdicted · novelty 7.0 · ref 42

Under a polynomial context-truncation sensitivity assumption, suffix-only KV cache policies require per-token memory scaling as Θ(ε^{-1/α}) to achieve distortion ε.

Transformer-like Inference from Optimal Control

cs.LG · 2026-05-15 · unverdicted · novelty 7.0 · ref 1

Derives transformer-like dual-filter inference layers from first-principles optimal control on nonlinear discrete and linear Gaussian sequence models.

Weaves, Wires, and Morphisms: Formalizing and Implementing the Algebra of Deep Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0 · ref 6

A categorical algebra for deep learning that formalizes broadcasting via axis-stride and array-broadcasted categories and supplies matching Python and TypeScript implementations.

Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · ref 35

Temporal diversity in the task distribution during training increases transformers' bias toward generalization over memorization in in-context linear regression.

A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

cs.LG · 2026-05-04 · unverdicted · novelty 6.0 · ref 266

MetaRL, pre-trained on goals-based wealth management (GBWM) problems, delivers near-optimal dynamic strategies in 0.01 s, achieving 97.8% of the optimal utility found by dynamic programming (DP), and handles larger problems where DP fails.

Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs

cs.LG · 2026-06-18 · unverdicted · novelty 5.0 · ref 90

AIR augments activation-aware SVD compression of LLMs with an influence metric and a closed-form ALS update, claiming >18% perplexity improvement at 60% parameter retention and 90% less calibration data than SVD-LLM(W).

Monetary Policy in the Media Spotlight: Sentiments, Signals, and Economic Impact

econ.EM · 2026-05-14 · unverdicted · novelty 5.0 · ref 96

Media sentiment indicators from Canadian news, when added to a New Keynesian model with endogenous central-bank response, improve out-of-sample forecasts and account for part of monetary-policy propagation to output and prices.

Attention-based graph neural networks: a survey

cs.SI · 2026-05-09 · unverdicted · novelty 5.0 · ref 178

The survey groups attention-based GNNs into three stages (graph recurrent attention networks, graph attention networks, and graph transformers) and reviews their architectures and future directions.

Lecture Notes on Statistical Physics and Neural Networks

cond-mat.dis-nn · 2026-05-07 · unverdicted · novelty 2.0 · ref 43

Lecture notes that treat statistical physics as probability theory and connect Ising models, spin glasses, and renormalization group ideas to Hopfield networks, restricted Boltzmann machines, and large language models.
