How do large language models understand graph patterns? A benchmark for graph pattern comprehension
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning
  GraphARC is a scalable benchmark for few-shot graph transformation learning that exposes a comprehension-execution gap in language models on abstract reasoning tasks.
- Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
  EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
- GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
  GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference tokens and strong cross-domain transfer.
- Revisiting Graph-Tokenizing Large Language Models: A Systematic Evaluation of Graph Token Understanding
  GTokenLLMs do not fully understand graph tokens, exhibiting over-sensitivity or insensitivity to instruction changes and relying heavily on text for reasoning even when graph information is preserved.
- A Survey of Context Engineering for Large Language Models
  The survey organizes Context Engineering into retrieval, processing, management, and integrated systems like RAG and multi-agent setups while identifying an asymmetry where LLMs handle complex inputs well but struggle with equally sophisticated long outputs.