MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.
Grapharena: Evaluating and exploring large language models on graph computation
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
UniGraphLM uses a multi-domain multi-task GNN encoder and adaptive alignment to create unified graph tokens for LLMs across diverse domains and tasks.
EGL-SCA co-evolves instructions and tools via structural credit assignment in graph reasoning agents and reports 92% average success on four benchmarks.
citing papers explorer
-
MathConstraint: Automated Generation of Verified Combinatorial Reasoning Instances for LLMs
MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.
-
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
-
A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning
UniGraphLM uses a multi-domain multi-task GNN encoder and adaptive alignment to create unified graph tokens for LLMs across diverse domains and tasks.
-
EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents
EGL-SCA co-evolves instructions and tools via structural credit assignment in graph reasoning agents and reports 92% average success on four benchmarks.