Frontier LLMs still struggle with simple reasoning tasks

Alan Malek, Jiawei Ge, Nevena Lazic, Chi Jin, András György, Csaba Szepesvári · 2025 · arXiv 2507.07313

5 Pith papers cite this work.

representative citing papers

Certification from Examples is Hard for Circuits and Transformers under Minimal Overparametrization

cs.LG · 2026-05-21 · unverdicted · novelty 8.0

Minimal overparametrization makes exact certification from examples exponentially hard for depth-2 threshold circuits and log-precision Transformers.

Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions

cs.CL · 2026-04-08 · unverdicted · novelty 8.0

A nine-dimension algebraic complexity framework shows that LLMs suffer a scale-invariant working memory bottleneck, collapsing at 20–30 parallel branches regardless of parameter count (from 8B to 235B).

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

DOLORES, an agent using a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.

To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning

cs.AI · 2026-04-23 · conditional · novelty 7.0

Unembedding collapse in transformers prevents distinguishing unseen tokens in symbolic reasoning, but targeted interventions restore generalization.

Beyond representational alignment with brain-guided language models for robust reasoning

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

Task-evoked brain signals enhance LLM reasoning performance via representation steering at inference and fine-tuning, yielding up to 13 percent accuracy gains orthogonal to language supervision.

citing papers explorer

Certification from Examples is Hard for Circuits and Transformers under Minimal Overparametrization · cs.LG · 2026-05-21 · unverdicted · ref 20
Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions · cs.CL · 2026-04-08 · unverdicted · ref 5
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition · cs.CL · 2026-05-12 · unverdicted · ref 63
To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning · cs.AI · 2026-04-23 · conditional · ref 10
Beyond representational alignment with brain-guided language models for robust reasoning · cs.LG · 2026-06-10 · unverdicted · ref 15
