FrontierOR benchmark shows frontier LLMs outperform Gurobi on solution quality and efficiency in only 31% of one-shot cases and 50% with test-time evolution on hard large-scale optimization tasks.
arXiv preprint arXiv:2505.16952 (2025)
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: dataset (1)
polarities: background (1)
representative citing papers
NLCO benchmark shows LLMs achieve reasonable feasibility on small natural-language CO tasks but degrade on larger instances, with set-based problems easier than graph-structured or bottleneck-objective ones.
CAM is an unsupervised training method for discrete diffusion models on combinatorial optimization problems that uses discrete adjoint dynamics to supply low-variance trajectory-level signals.
A survey compiling roles, applications, benchmarks, challenges, and future directions for large language models in operations research.
citing papers explorer
- FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization
FrontierOR benchmark shows frontier LLMs outperform Gurobi on solution quality and efficiency in only 31% of one-shot cases and 50% with test-time evolution on hard large-scale optimization tasks.
- Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization
NLCO benchmark shows LLMs achieve reasonable feasibility on small natural-language CO tasks but degrade on larger instances, with set-based problems easier than graph-structured or bottleneck-objective ones.