GraphARC is a scalable benchmark for few-shot graph transformation learning that exposes a comprehension-execution gap in language models on abstract reasoning tasks.
How do large language models understand graph patterns? a benchmark for graph pattern comprehension
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference tokens and strong cross-domain transfer.
GTokenLLMs do not fully understand graph tokens, exhibiting over-sensitivity or insensitivity to instruction changes and relying heavily on text for reasoning even when graph information is preserved.
The survey organizes Context Engineering into retrieval, processing, management, and integrated systems like RAG and multi-agent setups while identifying an asymmetry where LLMs handle complex inputs well but struggle with equally sophisticated long outputs.
citing papers explorer
-
GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning
GraphARC is a scalable benchmark for few-shot graph transformation learning that exposes a comprehension-execution gap in language models on abstract reasoning tasks.
-
GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference tokens and strong cross-domain transfer.