Gomez, Lukasz Kaiser, and Illia Polosukhin

4 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (4)

verdicts: unverdicted (4)

roles: background (1)

polarities: background (1)

representative citing papers

On the Architectural Complexity of Neural Networks

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

A framework quantifies DNN complexity via tensor operations, links 40 years of breakthroughs to complexity increases, and releases a dataset of 3000+ unexplored high-complexity architectures.

Gaussian Relational Graph Transformer

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

GelGT proposes collaborative sampling and Gaussian attention on subgraphs to model long-range structural, semantic, and temporal dependencies in relational graphs, reporting up to 13.8% gains on downstream tasks.

citing papers explorer

  • Learning to Focus Synthetic Aperture Radar On-line with State-Space Models

    eess.IV · 2026-05-11 · unverdicted · none · ref 21

    An online SAR focusing framework using state-space models processes raw data line-by-line with 70x lower latency and 130x lower memory than block-based DSP while supporting downstream tasks.

  • On the Architectural Complexity of Neural Networks

    cs.LG · 2026-05-05 · unverdicted · none · ref 45

    A framework quantifies DNN complexity via tensor operations, links 40 years of breakthroughs to complexity increases, and releases a dataset of 3000+ unexplored high-complexity architectures.

  • Gaussian Relational Graph Transformer

    cs.LG · 2026-05-15 · unverdicted · none · ref 28

    GelGT proposes collaborative sampling and Gaussian attention on subgraphs to model long-range structural, semantic, and temporal dependencies in relational graphs, reporting up to 13.8% gains on downstream tasks.

  • The Scaling Properties of Implicit Deductive Reasoning in Transformers

    cs.AI · 2026-05-05 · unverdicted · none · ref 18

    In deep Transformers using bidirectional prefix masks, implicit reasoning on Horn clauses matches explicit CoT performance across topologies and widths, but CoT is still required for depth extrapolation.