GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
7 Pith papers cite this work.
citing papers explorer
- MathConstraint: Automated Generation of Verified Combinatorial Reasoning Instances for LLMs
  MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.
- FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization
  FrontierOR benchmark shows frontier LLMs outperform Gurobi on solution quality and efficiency in only 31% of one-shot cases and 50% with test-time evolution on hard large-scale optimization tasks.
- Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
  EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
- OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
  OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated constraint formulation.
- A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning
  UniGraphLM uses a multi-domain multi-task GNN encoder and adaptive alignment to create unified graph tokens for LLMs across diverse domains and tasks.
- ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
  ArchRAG proposes attributed-community hierarchical indexing and LLM clustering to improve accuracy and lower token usage in graph-based retrieval-augmented generation.
- EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents
  EGL-SCA co-evolves instructions and tools via structural credit assignment in graph reasoning agents and reports 92% average success on four benchmarks.