Refinebench: Evaluating refinement capability of language models via checklists. arXiv preprint arXiv:2511.22173, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.
citing papers explorer
- Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks
Evolution Fine-Tuning trains LLMs on 156K trajectories spanning 371 tasks to achieve 10.22% average improvement on 22 held-out optimization tasks and match SOTA on select circle-packing problems when combined with test-time RL.
- Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.