Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin · 2017

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Learning to Focus Synthetic Aperture Radar On-line with State-Space Models

eess.IV · 2026-05-11 · unverdicted · novelty 7.0

An online SAR focusing framework using state-space models processes raw data line-by-line with 70x lower latency and 130x lower memory than block-based DSP while supporting downstream tasks.

On the Architectural Complexity of Neural Networks

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

A framework quantifies DNN complexity via tensor operations, links 40 years of breakthroughs to complexity increases, and releases a dataset of 3000+ unexplored high-complexity architectures.

Gaussian Relational Graph Transformer

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

GelGT proposes collaborative sampling and Gaussian attention on subgraphs to model long-range structural, semantic, and temporal dependencies in relational graphs, reporting up to 13.8% gains on downstream tasks.

The Scaling Properties of Implicit Deductive Reasoning in Transformers

cs.AI · 2026-05-05 · unverdicted · novelty 5.0

In deep Transformers using bidirectional prefix masks, implicit reasoning on Horn clauses matches explicit CoT performance across topologies and widths, but CoT is still required for depth extrapolation.

citing papers explorer

Showing 4 of 4 citing papers.

Learning to Focus Synthetic Aperture Radar On-line with State-Space Models · eess.IV · 2026-05-11 · unverdicted · none · ref 21
On the Architectural Complexity of Neural Networks · cs.LG · 2026-05-05 · unverdicted · none · ref 45
Gaussian Relational Graph Transformer · cs.LG · 2026-05-15 · unverdicted · none · ref 28
The Scaling Properties of Implicit Deductive Reasoning in Transformers · cs.AI · 2026-05-05 · unverdicted · none · ref 18
