Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks
Pith reviewed 2026-06-30 09:12 UTC · model grok-4.3
The pith
Evolution fine-tuning teaches language models reusable strategies for solving new optimization tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Evolution Fine-Tuning turns evolutionary search trajectories collected across 371 tasks into supervised signals that allow language models to internalize transferable strategies for mutation, backtracking, and iteration, producing measurable gains on unseen optimization problems from the same distribution.
What carries the argument
Evolution Fine-Tuning (EFT), a mid-training procedure that converts full evolutionary search trajectories into next-token prediction targets so the model learns cross-task evolution behavior.
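To make "converting trajectories into next-token prediction targets" concrete, here is a minimal sketch. It assumes each trajectory step records a parent candidate, the mutated child, and both scores. The `Step` record, the prompt template, and the function name are hypothetical illustrations, not the paper's actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One step of a hypothetical evolutionary search trajectory."""
    parent: str          # candidate the step started from
    child: str           # candidate produced by mutating the parent
    parent_score: float  # objective value of the parent
    child_score: float   # objective value of the child


def trajectory_to_examples(task: str, steps: List[Step]) -> List[Dict[str, str]]:
    """Turn each step into a (prompt, completion) pair for supervised training.

    The prompt shows the task, a summary of earlier steps, and the current
    parent; the completion is the child the search produced next.
    """
    examples: List[Dict[str, str]] = []
    history: List[str] = []
    for i, s in enumerate(steps):
        prompt = (
            f"Task: {task}\n"
            + "".join(history)
            + f"Current candidate (score {s.parent_score:.3f}):\n{s.parent}\n"
            + "Propose an improved candidate:\n"
        )
        examples.append({"prompt": prompt, "completion": s.child})
        # Later prompts see how earlier mutations changed the score.
        history.append(
            f"Step {i}: score {s.parent_score:.3f} -> {s.child_score:.3f}\n"
        )
    return examples
```

Under this framing, a standard causal language-modeling loss on the completion tokens would expose the model to both improving and regressing steps, which is one plausible way for backtracking behavior to appear in the training signal.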
If this is right
- Fine-tuned models improve by 10.22 percent on average over their untuned base versions across 22 held-out tasks.
- Paired with test-time reinforcement learning, the models match state-of-the-art performance on two circle-packing problems.
- The same models outperform their base counterparts on the Erdős minimum-overlap problem.
- Language models can function as reusable discovery agents that accumulate evolution experience rather than resetting for each new task.
Where Pith is reading between the lines
- The approach suggests that meta-level search policies can be separated from domain-specific knowledge and stored in model weights.
- Similar trajectory-based fine-tuning could be applied to other iterative methods such as gradient-based or Monte-Carlo search to test whether the benefit is specific to evolutionary scaffolds.
- If the dataset size grows, the same method might close gaps on long-standing open conjectures that currently require human-designed scaffolds.
Load-bearing premise
Trajectories produced by standard evolutionary scaffolds contain general signals about effective search steps that a model can extract and apply to entirely different tasks rather than memorizing task-specific patterns.
What would settle it
No average performance lift on a fresh collection of 20+ held-out optimization tasks drawn from domains outside the original 10 would falsify the claim that the fine-tuning produces transferable evolution skills.
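One way to check for an "average performance lift" on a fresh task collection is a paired comparison of base and fine-tuned scores per task. The sketch below computes a paired t statistic and, as a distribution-free companion, a one-sided Monte Carlo sign-flip permutation p-value. Function names and the resampling defaults are assumptions for illustration; the paper's own evaluation protocol is not described on this page.

```python
import math
import random
from statistics import stdev
from typing import Sequence


def paired_t_statistic(base: Sequence[float], tuned: Sequence[float]) -> float:
    """t statistic for the mean per-task difference (tuned - base).

    Requires at least two tasks and non-identical differences, otherwise
    the standard deviation is undefined or zero.
    """
    diffs = [t - b for b, t in zip(base, tuned)]
    n = len(diffs)
    return (sum(diffs) / n) / (stdev(diffs) / math.sqrt(n))


def sign_flip_p_value(
    base: Sequence[float],
    tuned: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided sign-flip permutation test for mean(tuned - base) > 0.

    Under the null hypothesis of no lift, each per-task difference is
    equally likely to be positive or negative, so signs are flipped at random.
    """
    rng = random.Random(seed)
    diffs = [t - b for b, t in zip(base, tuned)]
    n = len(diffs)
    observed = sum(diffs) / n
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / n >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A large p-value on 20 or more out-of-domain tasks would be the kind of null result the falsification criterion above describes.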
Original abstract
Would experience designing faster GPU kernels also help close in on a long-standing open mathematical conjecture? Large Language Models (LLMs) integrated into evolutionary search have recently produced state-of-the-art solutions on optimization tasks, including open mathematical conjectures, GPU kernel design, scientific law discovery, and combinatorial puzzles. To achieve this, prior work applied search scaffolds to one target task at a time, so every new problem is approached from scratch and the experience accumulated during search is discarded once the model finishes its attempt. This leaves the capability of iteratively evolving a solution (e.g., knowing which part to mutate and how, deciding when to backtrack) entirely in the scaffold rather than in the model itself. Whether the model itself could acquire this capability and reuse it across different tasks has been largely unexamined. To address this, we introduce Evolution Fine-Tuning (EFT), a mid-training paradigm that teaches LLMs to evolve solutions across tasks by converting evolutionary search trajectories into supervision. We construct Finch Collection, a 156K-trajectory dataset spanning 10 domains and 371 optimization tasks, and fine-tune open-source LLMs from 2B to 9B parameters. Empirically, EFT confers cross-task generalization: across 22 held-out tasks, our models surpass their base counterparts by 10.22% on average. Furthermore, when paired with test-time RL, our model matches state-of-the-art performance on two circle-packing tasks and outperforms its base-model counterpart on the Erdős minimum-overlap problem. EFT thus serves as a "practice phase" for general-purpose discovery agents that do not solve new problems from scratch.
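For readers unfamiliar with the "search scaffold" the abstract refers to, the following toy loop illustrates the two behaviors it names: mutating a candidate and backtracking to the best known solution after a run of unproductive steps. Everything here (the `evolve` function, the `patience` rule, the one-dimensional objective) is an assumed, simplified stand-in; real scaffolds use an LLM as the mutation operator and are considerably more elaborate.

```python
import random
from typing import Callable, List, Tuple


def evolve(
    score: Callable[[float], float],
    mutate: Callable[[float, random.Random], float],
    init: float,
    steps: int = 200,
    patience: int = 10,
    seed: int = 0,
) -> Tuple[float, float, List[Tuple[float, float, float]]]:
    """Toy evolutionary search that records its trajectory.

    Returns the best candidate, its score, and a list of
    (parent, child, child_score) steps.
    """
    rng = random.Random(seed)
    best, best_score = init, score(init)
    current = best
    stale = 0  # consecutive steps without a new best
    trajectory: List[Tuple[float, float, float]] = []
    for _ in range(steps):
        child = mutate(current, rng)
        child_score = score(child)
        trajectory.append((current, child, child_score))
        current = child  # always move on, which allows exploration
        if child_score > best_score:
            best, best_score = child, child_score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                current = best  # backtrack to the best known candidate
                stale = 0
    return best, best_score, trajectory


def toy_score(x: float) -> float:
    """Hypothetical objective with its maximum at x = 3."""
    return -(x - 3.0) ** 2


def toy_mutate(x: float, rng: random.Random) -> float:
    """Hypothetical mutation: add Gaussian noise."""
    return x + rng.gauss(0.0, 0.5)
```

In this picture the mutation and backtracking decisions live in the loop, not in the model; the abstract's proposal is to train the model on the recorded trajectories so that it carries some of that decision-making itself.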
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Evolution Fine-Tuning (EFT), a mid-training method that converts 156K evolutionary search trajectories spanning 10 domains and 371 tasks (Finch Collection) into supervision for fine-tuning LLMs (2B–9B parameters). The central claim is that this teaches reusable evolutionary skills (mutation, backtracking, iteration) enabling cross-task generalization, with models surpassing base counterparts by 10.22% on average across 22 held-out tasks; when combined with test-time RL the fine-tuned models match SOTA on two circle-packing tasks and outperform the base model on the Erdős minimum-overlap problem.
Significance. If the central claim holds, the work would be significant for shifting iterative search capabilities from external scaffolds into the model itself, supporting more general-purpose discovery agents. The construction of a multi-domain trajectory dataset at this scale is a concrete contribution that could enable further research on strategy transfer.
major comments (2)
- [Abstract] The 10.22% average gain on 22 held-out tasks is presented without any reported controls for domain overlap between the 10 training domains and the held-out tasks, statistical significance testing, or ablations that remove task-specific signals while retaining general evolutionary operators. This information is required to distinguish internalization of transferable strategies from memorization of patterns within the same domains.
- [Abstract and §4, empirical evaluation] No description is given of performance measurement protocols, data exclusion rules, or how trajectories were filtered, making it impossible to assess whether the reported gains on held-out tasks and the circle-packing/Erdős results are robust or could arise from scaffold-specific artifacts.
minor comments (1)
- [Abstract] The abstract would benefit from a brief statement of the base model sizes and the exact held-out task domains to allow readers to gauge the degree of domain shift.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for greater transparency in our empirical claims and protocols. We address each major comment below and will revise the manuscript to incorporate the requested details.
- Referee: [Abstract] The 10.22% average gain on 22 held-out tasks is presented without any reported controls for domain overlap between the 10 training domains and the held-out tasks, statistical significance testing, or ablations that remove task-specific signals while retaining general evolutionary operators. This information is required to distinguish internalization of transferable strategies from memorization of patterns within the same domains.
  Authors: We agree the abstract omits these elements. The full manuscript (§3.2) selects the 22 held-out tasks from domains disjoint from the 10 training domains to reduce overlap, but we acknowledge this is not explicitly controlled or ablated in the reported results. In revision we will add to §4: (i) explicit documentation of domain disjointness, (ii) statistical significance testing (paired t-tests with p-values) on the 10.22% average improvement, and (iii) an ablation that retains general evolutionary operators while removing task-specific signals. These additions will directly address the distinction between strategy transfer and memorization.
  revision: yes
- Referee: [Abstract and §4, empirical evaluation] No description is given of performance measurement protocols, data exclusion rules, or how trajectories were filtered, making it impossible to assess whether the reported gains on held-out tasks and the circle-packing/Erdős results are robust or could arise from scaffold-specific artifacts.
  Authors: The referee is correct that §4 lacks a consolidated description of these protocols. In the revision we will expand §4 with a dedicated subsection detailing: (i) performance measurement (success rate defined by objective improvement thresholds), (ii) data exclusion rules (e.g., discarding trajectories with syntax errors or non-convergent runs), and (iii) trajectory filtering criteria (minimum length, valid mutation rate, and convergence checks). This will enable assessment of robustness independent of scaffold artifacts.
  revision: yes
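The filtering criteria named in the rebuttal (minimum length, valid mutation rate, convergence) could be expressed as a single predicate, sketched below. The `Trajectory` record, the thresholds, and the use of Python's `ast.parse` as the validity check are all assumptions: the page does not say what language the candidates are written in or what thresholds the authors use, and "convergence" is approximated here as the best score exceeding the starting score.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical record: candidate programs and their objective values."""
    programs: List[str]
    scores: List[float]  # higher is better


def is_valid_python(src: str) -> bool:
    """Return True if the candidate parses as Python source."""
    try:
        ast.parse(src)
        return True
    except (SyntaxError, ValueError):
        return False


def keep_trajectory(
    t: Trajectory,
    min_length: int = 3,          # assumed threshold
    min_valid_rate: float = 0.8,  # assumed threshold
    min_gain: float = 0.0,        # assumed threshold
) -> bool:
    """Apply length, validity-rate, and improvement checks in that order."""
    if len(t.programs) < min_length or len(t.programs) != len(t.scores):
        return False
    valid_rate = sum(is_valid_python(p) for p in t.programs) / len(t.programs)
    if valid_rate < min_valid_rate:
        return False
    # Stand-in for a convergence check: the search must have improved
    # on its starting point by more than min_gain.
    return max(t.scores) - t.scores[0] > min_gain
```

Reporting how many trajectories each check removes would let readers judge whether the 156K retained trajectories are a representative sample of the scaffold's behavior.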
Circularity Check
No circularity: standard trajectory-supervised fine-tuning with held-out task evaluation
The paper generates a 156K-trajectory dataset from evolutionary search scaffolds across 371 tasks in 10 domains, fine-tunes LLMs on this data, and reports average gains on 22 explicitly held-out tasks. This is a conventional train/test split in supervised learning; the held-out performance metric is not defined in terms of the training trajectories or scaffolds by construction. No equations, self-citations, or ansatzes are presented that reduce the central cross-task generalization claim to a tautology or fitted input. The load-bearing assumption (transferable strategies vs. task-specific patterns) is an empirical question tested by the held-out split rather than presupposed by the method itself.
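Whether the held-out split is as clean as this rationale assumes can be audited mechanically. The sketch below takes two mappings from task name to domain and reports any tasks or domains that appear on both sides. The function name and the example task and domain labels in the usage note are hypothetical; the actual Finch Collection task list is not given on this page.

```python
from typing import Dict, Set, Tuple


def split_overlap(
    train: Dict[str, str], held_out: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (shared task names, shared domains) between the two splits.

    Both arguments map a task name to its domain label. Empty sets on
    both sides mean the split is disjoint at the task and domain level.
    """
    shared_tasks = set(train) & set(held_out)
    shared_domains = set(train.values()) & set(held_out.values())
    return shared_tasks, shared_domains
```

For example, `split_overlap({"kernel_matmul": "gpu"}, {"circle_packing_26": "geometry"})` returns two empty sets, while a held-out task labeled `"gpu"` would surface as a shared domain. Label-level disjointness is only a necessary condition: two differently named domains can still share problem structure.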
A. Broader Impacts
EFT democratizes LLM-driven discovery by transferring optimization capabilities from expensive proprietary models to small open-weight models, reduc...