LLM4AD: A Platform for Algorithm Design with Large Language Model
4 papers cite this work.
Citing papers explorer
- AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design
  AHD Agent trains a 4B-parameter LLM via agentic RL to actively use tools for automatic heuristic design, matching or exceeding larger baselines across eight domains with fewer evaluations.
- Automated Large-scale CVRP Solver Design via LLM-assisted Flexible MCTS
  LaF-MCTS uses LLM-assisted flexible MCTS with a three-tier hierarchy, semantic pruning, and branch regrowth to automatically compose decomposition-enhanced CVRP solvers that outperform state-of-the-art methods on CVRPLib benchmarks.
- The AI Telco Engineer: Toward Autonomous Discovery of Wireless Communications Algorithms
  An LLM-powered agentic framework autonomously designs explainable algorithms for wireless PHY- and MAC-layer tasks that are competitive with, and sometimes superior to, existing designs.
- EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models
  EvoNav automates the design of reward functions for RL robot navigation by evolving LLM proposals through a three-stage cheap-to-expensive evaluation process, and claims better policies than hand-crafted or prior automated rewards.