Niewiadomski and Piotr Nyczyk and Torsten Hoefler , year =

Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi · 2024 · arXiv 2308.09687

15 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 3 · method: 1

citation-polarity summary

background: 3 · use method: 1

representative citing papers

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

cs.AI · 2023-10-06 · unverdicted · novelty 7.0

LATS integrates Monte Carlo Tree Search with language models using in-context learning, value functions, and self-reflection to achieve 92.7% pass@1 on HumanEval and competitive web navigation performance.

Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

TRI trains LLMs on goal-conditioned fill-in-the-middle tasks via PSM token rearrangement and symbolic verification to surgically repair erroneous CoT segments.

OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.

No Test Cases, No Problem: Distillation-Driven Code Generation for Scientific Workflows

cs.SE · 2026-04-25 · unverdicted · novelty 6.0

MOSAIC generates executable scientific code without I/O test cases by combining student-teacher distillation with a consolidated context window to reduce hallucinations across subproblems.

MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding

cs.CL · 2025-10-09 · unverdicted · novelty 6.0

MOSAIC is a training-free multi-agent LLM framework with rationale, coding, reflection, and debugging agents plus a consolidated context window that outperforms prior methods on scientific coding benchmarks.

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

cs.AI · 2025-07-28 · unverdicted · novelty 6.0

GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

cs.CL · 2024-10-30 · unverdicted · novelty 6.0

OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.

A Survey on Large Language Model based Autonomous Agents

cs.AI · 2023-08-22 · accept · novelty 6.0

A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.

AlgoSkill: Learning to Design Algorithms by Scheduling Human-Like Skills

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

AlgoSkill improves LLM algorithm design on programming benchmarks by framing it as verification-guided scheduling over a typed skill library with MCTS, outperforming direct generation and self-refinement.

PaperClaw: Harnessing Agents for Autonomous Research and Human-in-the-Loop Refinement

cs.AI · 2026-06-21 · unverdicted · novelty 5.0

PAPERCLAW is a multi-agent system for end-to-end autonomous research paper generation from literature to output, with human refinement and LLM-judge evaluation showing strong results.

Runtime-Structured Task Decomposition for Agentic Coding Systems

cs.SE · 2026-05-14 · unverdicted · novelty 5.0

Runtime-structured task decomposition reduces retry costs in agentic coding systems by up to 51.7% versus monolithic prompts by rerunning only failed subtasks on two software engineering workloads.

State Representation and Termination for Recursive Reasoning Systems

cs.AI · 2026-05-02 · unverdicted · novelty 5.0

Recursive reasoning systems can represent their state via an epistemic state graph and terminate when the linearized order-gap is non-degenerate near the fixed point, providing a local condition for when the stopping rule is informative.

Understanding the planning of LLM agents: A survey

cs.AI · 2024-02-05 · accept · novelty 4.0

A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.

LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems

cs.LG · 2026-01-20 · unverdicted · novelty 3.0

A survey taxonomy of LLMs identifies three scaling crises and six efficiency paradigms while tracing the shift from generation to tool-using agents.

LLM Multi-Agent Systems: Challenges and Open Problems

cs.MA · 2024-02-05 · unverdicted · novelty 2.0

The paper identifies inadequately addressed challenges in optimizing task allocation, fostering robust reasoning through debates, managing layered context, enhancing memory, and applying multi-agent systems to blockchain.

citing papers explorer

Showing 15 of 15 citing papers.

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models · cs.AI · 2023-10-06 · unverdicted · none · ref 1
Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair · cs.CL · 2026-06-03 · unverdicted · none · ref 2
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces · cs.AI · 2026-05-09 · unverdicted · none · ref 94
No Test Cases, No Problem: Distillation-Driven Code Generation for Scientific Workflows · cs.SE · 2026-04-25 · unverdicted · none · ref 4
MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding · cs.CL · 2025-10-09 · unverdicted · none · ref 25
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis · cs.AI · 2025-07-28 · unverdicted · none · ref 10
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents · cs.CL · 2024-10-30 · unverdicted · none · ref 82
A Survey on Large Language Model based Autonomous Agents · cs.AI · 2023-08-22 · accept · none · ref 54
AlgoSkill: Learning to Design Algorithms by Scheduling Human-Like Skills · cs.AI · 2026-06-29 · unverdicted · none · ref 14
PaperClaw: Harnessing Agents for Autonomous Research and Human-in-the-Loop Refinement · cs.AI · 2026-06-21 · unverdicted · none · ref 19
Runtime-Structured Task Decomposition for Agentic Coding Systems · cs.SE · 2026-05-14 · unverdicted · none · ref 19
State Representation and Termination for Recursive Reasoning Systems · cs.AI · 2026-05-02 · unverdicted · none · ref 4
Understanding the planning of LLM agents: A survey · cs.AI · 2024-02-05 · accept · none · ref 3
LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems · cs.LG · 2026-01-20 · unverdicted · none · ref 16
LLM Multi-Agent Systems: Challenges and Open Problems · cs.MA · 2024-02-05 · unverdicted · none · ref 38
