Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

Amir Globerson; Clayton Sanford; Gilad Yehudai; Maya Bechler-Speicher; Orr Fischer; Ran Gilad-Bachrach

arxiv: 2503.01805 · v3 · pith:7NOZ6JVNnew · submitted 2025-03-03 · 💻 cs.LG · cs.AI· cs.CL

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

Gilad Yehudai , Clayton Sanford , Maya Bechler-Speicher , Orr Fischer , Ran Gilad-Bachrach , Amir Globerson This is my paper

classification 💻 cs.LG cs.AIcs.CL

keywords widthtasksdepthgraph-basedalgorithmicmodelsproblemscomplex

0 comments

read the original abstract

Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks a key question is what is the minimal size of a transformer that can implement the task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly, while depth is kept fixed. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference and train time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We empirically investigate these trade-offs between the relative powers of depth and width and find tasks where wider models have the same accuracy as deep models, while having much faster train and inference time due to parallelizable hardware.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers
cs.LG 2026-05 accept novelty 8.0

Graph tokenizations for Transformers induce distinct depth regimes with proven separations and impossibility results for converting between them at limited depth.
Learning to Approximate Uniform Facility Location via Graph Neural Networks
cs.LG 2026-02 unverdicted novelty 6.0

A differentiable MPNN approximates uniform facility location with provable guarantees and outperforms standard approximation algorithms while closing the gap to exact ILP solutions.
Deep sequence models tend to memorize geometrically; it is unclear why
cs.LG 2025-10 unverdicted novelty 6.0

Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.