Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory
KVM is a block-recurrent compressed memory for attention that unifies expandable transformer context with linear RNN efficiency, enabling competitive long-context performance with released code and models.
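The mechanism is not spelled out beyond this summary; below is a minimal sketch of one way "key-value means" block-recurrent memory could work, assuming the memory is a fixed set of running key/value means updated after each block. The function name, the one-step k-means update, and the uniform scaling are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def kvm_attention(q, k, v, block_size=64, num_means=16):
    # Sketch: block-recurrent attention over a compressed KV memory.
    # Each block attends over [memory; block], then the block's keys and
    # values are folded into num_means running means (one k-means step),
    # so per-block cost is O(block_size + num_means), not O(seq_len).
    # An "expandable" variant could grow num_means as context grows.
    seq_len, dim = q.shape
    mem_k = torch.zeros(num_means, dim)   # running key means
    mem_v = torch.zeros(num_means, dim)   # running value means
    counts = torch.zeros(num_means)       # tokens folded into each mean
    outs = []
    for start in range(0, seq_len, block_size):
        qb = q[start:start + block_size]
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        keys = torch.cat([mem_k, kb])
        vals = torch.cat([mem_v, vb])
        attn = F.softmax(qb @ keys.T / dim ** 0.5, dim=-1)  # causal mask omitted
        outs.append(attn @ vals)
        # Assign each key to its nearest mean (round-robin before the
        # memory is populated) and update that mean incrementally.
        if counts.sum() > 0:
            idx = torch.cdist(kb, mem_k).argmin(dim=-1)
        else:
            idx = torch.arange(len(kb)) % num_means
        for j, i in enumerate(idx):
            counts[i] += 1
            mem_k[i] += (kb[j] - mem_k[i]) / counts[i]
            mem_v[i] += (vb[j] - mem_v[i]) / counts[i]
    return torch.cat(outs)
```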
10 Pith papers cite this work. Polarity classification is still indexing.
Citation summary panels: years — 2026 (10); verdicts — UNVERDICTED (10); roles — background (1); polarities — background (1).
representative citing papers
citing papers explorer
-
GravityGraphSAGE: Link Prediction in Directed Attributed Graphs
GravityGraphSAGE adapts GraphSAGE with a gravity-inspired decoder to outperform prior graph deep learning methods on directed link prediction across citation networks and 16 real-world graphs.
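As a sketch of the decoder side: gravity-inspired decoders score a directed edge by the target node's learned "mass" against the log squared embedding distance, which is what makes the score asymmetric. The parameterization below follows that general form and is an assumption, not the paper's exact decoder.

```python
import torch

def gravity_decoder(z, m, lam=1.0, eps=1e-8):
    # z : (N, d) node embeddings (e.g., from a GraphSAGE encoder)
    # m : (N,)   learned per-node "mass" logits
    # score[i, j] estimates edge i -> j: the target's mass m[j] minus a
    # distance penalty, so score[i, j] != score[j, i] in general.
    d2 = torch.cdist(z, z).pow(2) + eps            # (N, N) squared distances
    scores = m.unsqueeze(0) - lam * torch.log(d2)  # broadcast target masses
    return torch.sigmoid(scores)
```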
-
From Mechanistic to Compositional Interpretability
Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.
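As an illustration of the "commuting syntactic-semantic mapping pair" idea, one natural reading is a commuting square; the notation below is assumed for illustration, not taken from the paper.

```latex
% Illustrative only: an explanation is a pair of maps (sigma, tau),
% sigma translating syntax and tau translating semantics, such that
% interpreting-then-translating equals translating-then-interpreting:
\[
\begin{array}{ccc}
\mathrm{Syn}_S & \xrightarrow{\;\sigma\;} & \mathrm{Syn}_M \\
\downarrow & & \downarrow \\
\mathrm{Sem}_S & \xrightarrow{\;\tau\;} & \mathrm{Sem}_M
\end{array}
\qquad
[\![\,\sigma(x)\,]\!]_M \;=\; \tau\bigl([\![\,x\,]\!]_S\bigr),
\]
% with the minimum-description-length pair preferred; the parsimony
% theorem in the summary concerns that choice.
```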
-
Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding
LC-MAPF uses multi-round local communication between neighboring agents in a pre-trained model to outperform prior learning-based MAPF solvers on diverse unseen scenarios while preserving scalability.
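A minimal sketch of the multi-round local-communication pattern, assuming mean aggregation over in-radius neighbors; the function name and mixing rule are illustrative, and the paper's trained operators differ.

```python
import torch

def local_comm_rounds(feat, neighbors, rounds=3):
    # feat      : (A, d) per-agent observation embeddings
    # neighbors : dict agent_id -> list of agent ids within comm radius
    # Each round, every agent averages messages from its neighbors and
    # mixes them into its own state; actions are decoded afterwards.
    h = feat.clone()
    for _ in range(rounds):
        msgs = torch.stack([
            h[nbrs].mean(dim=0) if nbrs else torch.zeros_like(h[a])
            for a, nbrs in sorted(neighbors.items())
        ])
        h = 0.5 * h + 0.5 * msgs  # mix own state with neighborhood summary
    return h
```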
-
Spherical Flows for Sampling Categorical Data
Spherical vMF flows reduce the continuity equation on the sphere to a scalar ODE in cosine similarity, enabling posterior-weighted sampling of categorical sequences via cross-entropy trained posteriors.
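A sketch of the claimed reduction, under the assumption that the flow moves each point along the geodesic toward a target direction; the exact velocity field is not given in this summary.

```latex
% Illustrative form only: if x(t) moves on the unit sphere along the
% geodesic toward a target direction mu, the state is fully captured
% by the cosine similarity
\[
s(t) \;=\; \langle x(t), \mu \rangle,
\]
% so the continuity equation for a rotationally symmetric (vMF-style)
% density reduces to a scalar ODE in s,
\[
\frac{\mathrm{d}s}{\mathrm{d}t} \;=\; v(s, t), \qquad s(1) = 1,
\]
% with v chosen so that mass concentrates on mu at the endpoint.
```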
-
Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs
LLMs contain identifiable COCO neurons that enable implicit self-correction against stereotypes; targeted editing of these neurons improves fairness and robustness to jailbreaks while preserving generation quality.
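A minimal sketch of what "targeted editing" of pre-identified neurons can look like in practice, using a standard PyTorch forward hook; the neuron-identification step and the paper's exact intervention are not shown, and the helper name is hypothetical.

```python
import torch

def edit_neurons(layer, neuron_ids, scale=0.0):
    # Scales the activations of pre-identified neurons (e.g., conflict-
    # monitoring "COCO" neurons) in one linear layer of an MLP block;
    # scale=0.0 ablates them, scale > 1 amplifies them.
    mask = torch.ones(layer.out_features)
    mask[neuron_ids] = scale

    def hook(_module, _inputs, output):
        # Returning a value from a forward hook replaces the output.
        return output * mask.to(output.device)

    return layer.register_forward_hook(hook)  # call .remove() to undo
```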
-
Structured Recurrent Mixers for Massively Parallelized Sequence Generation
Structured Recurrent Mixers enable algebraic switching between parallel training and recurrent inference representations, delivering higher efficiency, information capacity, and throughput than other linear-complexity models.
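The parallel/recurrent duality the summary refers to can be seen on a toy diagonal linear recurrence: the same map can be run step by step for inference or materialized as a triangular operator for fully parallel training. The dense operator below is for clarity only; structured mixers replace it with cheaper algebra.

```python
import torch

def recurrent_mode(x, a, b):
    # h_t = a * h_{t-1} + b * x_t, evaluated step by step (inference).
    h, outs = torch.zeros_like(x[0]), []
    for x_t in x:
        h = a * h + b * x_t
        outs.append(h)
    return torch.stack(outs)

def parallel_mode(x, a, b):
    # Same map in closed form: h_t = sum_{s<=t} a^(t-s) * b * x_s,
    # materialized as one lower-triangular (T, T) mixing operator.
    T = len(x)
    t = torch.arange(T)
    powers = a ** (t.unsqueeze(1) - t.unsqueeze(0)).clamp(min=0)
    mix = torch.tril(powers)
    return mix @ (b * x)

x = torch.randn(6, 4)
assert torch.allclose(recurrent_mode(x, 0.9, 0.5),
                      parallel_mode(x, 0.9, 0.5), atol=1e-5)
```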
-
Metaphor Is Not All Attention Needs
Poetic jailbreaks succeed because they induce distinct attention patterns in LLMs that are independent of harmful-content detection, not because models fail to recognize literary formatting.
-
Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP
FCP shards sequences at block level with flexible P2P communication and bin-packing to achieve near-linear scaling up to 256 GPUs and 1.13x-2.21x higher attention MFU in foundation model pre-training.
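A sketch of the bin-packing ingredient, assuming a greedy longest-first heuristic that balances token counts across context-parallel ranks; FCP's actual packing and P2P communication schedule are more involved.

```python
def pack_blocks(block_lens, num_ranks):
    # Assign sequence blocks to ranks, longest block first, always to the
    # least-loaded rank, so per-rank token counts (and attention work)
    # stay roughly balanced.
    loads = [0] * num_ranks
    assignment = [[] for _ in range(num_ranks)]
    for blk, length in sorted(enumerate(block_lens),
                              key=lambda kv: kv[1], reverse=True):
        rank = min(range(num_ranks), key=loads.__getitem__)
        assignment[rank].append(blk)
        loads[rank] += length
    return assignment, loads

# e.g. pack_blocks([512, 2048, 1024, 256, 1536, 768], num_ranks=2)
# yields near-equal per-rank token loads.
```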
-
Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.