arXiv:1908.00709, 2019
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- A-Evolve-Training: Autonomous Post-Training of a 30B Model
  An autonomous post-training system for a 30B model achieves near-top human performance on a reasoning leaderboard and revises its search policy after detecting that its dev metric had become misleading.
- LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning
  LLM agents iteratively generate and optimize data processing strategies for fine-tuning, delivering over 80% win rates versus unprocessed data and 65% versus LLM-based AutoML baselines while cutting search time by up to 10x.