Recognition: no theorem link
DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures
Pith reviewed 2026-05-12 04:27 UTC · model grok-4.3
The pith
DynaMiCS optimizes mixture weights during LLM fine-tuning by estimating cross-domain effect slopes from short probes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By estimating a matrix of local cross-domain effects via short probing runs and solving a constrained optimization problem over the probability simplex, DynaMiCS produces dynamic mixture weights that improve target-domain metrics while keeping constrained-domain losses below reference thresholds, outperforming static baselines in multi-domain scenarios.
What carries the argument
The slope matrix of local cross-domain effects, built from short domain-specific probing runs and used to solve for mixture weights that maximize target improvement subject to constraint bounds.
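To make that machinery concrete, here is a minimal sketch of one probe-then-optimize step, assuming the slope matrix has already been estimated from short probing runs. The function names, the uniform fallback, and the use of a generic linear-programming solver are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch, not the authors' code: one DynaMiCS-style update step,
# assuming a slope matrix S has already been estimated from short probes.
import numpy as np
from scipy.optimize import linprog


def solve_mixture_weights(S, current_loss, reference_loss, target_idx, constrained_idx):
    """Pick mixture weights w on the probability simplex.

    S[i, j]: estimated change in evaluation-domain i's loss per unit of
             training on fine-tuning dataset j (from the probing runs).
    current_loss / reference_loss: per-domain losses now and at the reference.
    """
    n_datasets = S.shape[1]

    # Objective: minimize the predicted loss change on the target domains
    # (equivalently, maximize the predicted target improvement).
    c = S[target_idx].sum(axis=0)

    # Constraints: current_loss_c + S_c @ w <= reference_loss_c for each
    # constrained domain c, i.e. predicted constrained losses stay below reference.
    A_ub = S[constrained_idx]
    b_ub = reference_loss[constrained_idx] - current_loss[constrained_idx]

    # Simplex: non-negative weights that sum to one.
    res = linprog(
        c,
        A_ub=A_ub,
        b_ub=b_ub,
        A_eq=np.ones((1, n_datasets)),
        b_eq=np.array([1.0]),
        bounds=[(0.0, 1.0)] * n_datasets,
    )
    # If the constraints are infeasible at this step, fall back to uniform weights.
    return res.x if res.success else np.full(n_datasets, 1.0 / n_datasets)
```

Once the slope matrix is fixed, both the objective and the constraints are linear in the weights, so the per-update solve is cheap relative to the probing runs themselves.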
If this is right
- Target-domain improvements exceed those of fixed-mixture baselines while constrained-domain losses stay within reference limits.
- The approach requires only short probes rather than full reference models or per-example scoring.
- Mixture weights are computed automatically without manual tuning at each update step.
- The method scales to varying numbers of target and constrained domains with lower overall compute than alternatives.
Where Pith is reading between the lines
- The probing-plus-optimization pattern could extend to other settings where local sensitivity estimates substitute for expensive full retraining.
- If the slope matrix remains stable across longer horizons, the same machinery might support online adaptation of mixtures during continual learning.
- The constrained simplex solve could be replaced by faster heuristics if the number of domains grows large, provided the slope structure stays low-rank.
Load-bearing premise
Slope estimates from short probing runs reliably predict the cross-domain performance changes that full-length training on the chosen mixture weights will produce.
What would settle it
Run full-length training with the weights chosen by DynaMiCS and compare the measured target gains and constraint violations against the values predicted by the probe-derived slope matrix; substantial deviation would undercut the core claim, while close agreement would support it.
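A minimal sketch of that check, assuming per-domain losses are logged before and after the full run; the variable names and the linear extrapolation are illustrative, not from the paper.

```python
import numpy as np


def extrapolation_check(S, w, n_steps, loss_before, loss_after):
    """Compare probe-predicted loss changes with what full training produced.

    S: slope matrix from the probes, w: chosen mixture weights,
    n_steps: length of the full run measured in the probes' step units.
    """
    predicted_delta = n_steps * (S @ w)        # linear extrapolation of the probes
    observed_delta = loss_after - loss_before  # what the full run actually did
    return np.abs(predicted_delta - observed_delta)

# Large per-domain gaps, especially sign flips on constrained domains,
# would mean the probe-derived slopes do not predict full-length behaviour.
```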
Original abstract
Multi-domain fine-tuning of large language models requires improving performance on target domains while preserving performance on constrained domains, such as general knowledge, instruction following, or safety evaluations. Existing data mixing strategies rely on fixed heuristics or adaptive rules that cannot explicitly enforce preservation of such capabilities. We propose DynaMiCS, a dynamic mixture optimizer that casts multi-domain fine-tuning as a constrained optimization problem. At each update, DynaMiCS performs short domain-specific probing runs to estimate a slope matrix of local cross-domain effects, capturing how training on each fine-tuning dataset affects each evaluation domain. These estimates are then used to compute mixture weights through optimization over the probability simplex, with the objective of improving target-domain performance while keeping constrained-domain losses below reference levels. Across multi-domain fine-tuning scenarios with varying numbers of target and constrained domains, DynaMiCS achieves stronger target-domain improvements and higher constraint satisfaction than fixed-mixture baselines, at lower computational cost and without reference models, per-example scoring, or manually tuned mixture weights.
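Read literally, the abstract implies a per-update problem of roughly the following shape. This is a hedged reconstruction: the slope matrix S, the step size, the reference levels, and the target and constrained domain sets are notation introduced here, not the paper's.

```latex
% Sketch of the implied per-update problem, not the paper's exact formulation.
\[
\begin{aligned}
\min_{w \in \Delta^{K-1}} \quad & \sum_{t \in \mathcal{T}} S_{t\cdot}\, w
  && \text{predicted target-domain loss change} \\
\text{s.t.} \quad & \ell_c + \eta\, S_{c\cdot}\, w \;\le\; \ell_c^{\mathrm{ref}}
  \quad \forall\, c \in \mathcal{C}
  && \text{constrained domains stay below reference} \\
& w_k \ge 0,\; \textstyle\sum_{k=1}^{K} w_k = 1
  && \text{probability simplex over the $K$ datasets}
\end{aligned}
\]
```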
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DynaMiCS, a dynamic mixture optimizer for multi-domain LLM fine-tuning. It casts the problem as constrained optimization: at each update, short domain-specific probing runs estimate a slope matrix capturing local cross-domain loss effects; these estimates are used to solve for mixture weights on the probability simplex that improve target-domain performance while keeping constrained-domain losses below reference levels. The method is claimed to deliver stronger target improvements and higher constraint satisfaction than fixed-mixture baselines across scenarios with varying numbers of target and constrained domains, at lower cost and without reference models, per-example scoring, or manual weight tuning.
Significance. If the empirical claims hold and the linear approximation remains valid, the approach would supply a principled, low-overhead way to enforce capability preservation during multi-domain adaptation, addressing a practical gap left by heuristic or gradient-based mixing strategies.
major comments (2)
- [Method (probing and optimization steps)] The core procedure relies on the assumption that slope estimates from short probing runs remain predictive over full-length training trajectories (see the method description of the slope-matrix estimation and subsequent simplex optimization). Non-linear loss dynamics, domain interactions, or saturation effects would cause the computed weights to violate the intended constraints, directly undermining the reported gains in target improvement and constraint satisfaction. No ablation or diagnostic is described that tests the validity of this local-linear approximation.
- [Experiments / Results] The abstract asserts superior results and constraint satisfaction, yet the manuscript supplies no quantitative tables, error bars, dataset details, ablation studies, or statistical tests. Without these, the central empirical claim cannot be evaluated for effect size, reproducibility, or robustness to the number of domains.
minor comments (2)
- [Method] Clarify the precise formulation of the constrained optimization (objective, reference levels, and solver) and how the slope matrix is normalized or regularized.
- [Related Work] Add explicit comparison to recent adaptive mixing baselines that also avoid reference models (e.g., gradient-based or meta-learning approaches) to better situate the novelty.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. We address each major comment below and outline specific revisions that will strengthen the manuscript's clarity and empirical support.
Point-by-point responses
-
Referee: [Method (probing and optimization steps)] The core procedure relies on the assumption that slope estimates from short probing runs remain predictive over full-length training trajectories (see the method description of the slope-matrix estimation and subsequent simplex optimization). Non-linear loss dynamics, domain interactions, or saturation effects would cause the computed weights to violate the intended constraints, directly undermining the reported gains in target improvement and constraint satisfaction. No ablation or diagnostic is described that tests the validity of this local-linear approximation.
Authors: We agree that the local-linear approximation is a central assumption whose validity merits explicit validation. The probing runs are intended to capture instantaneous cross-domain effects at each optimization step, but we did not include a diagnostic comparing predicted versus observed trajectories in the original submission. In the revised manuscript we will add a new subsection with an ablation that (i) records actual loss changes over 5-10x longer intervals following each probing step and (ii) reports Pearson correlation and mean absolute error between slope-predicted and observed deltas across all domains. This will quantify the approximation's accuracy and highlight any regimes where non-linearity becomes problematic. revision: yes
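A sketch of how that diagnostic could be computed, assuming the slope-predicted and observed loss deltas have been collected into flat arrays over all probing steps and evaluation domains; the names are placeholders.

```python
import numpy as np


def slope_diagnostics(predicted, observed):
    """predicted, observed: flat arrays of loss deltas, one entry per
    (probing step, evaluation domain) pair."""
    pearson_r = np.corrcoef(predicted, observed)[0, 1]
    mae = np.mean(np.abs(predicted - observed))
    return pearson_r, mae
```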
-
Referee: [Experiments / Results] The abstract asserts superior results and constraint satisfaction, yet the manuscript supplies no quantitative tables, error bars, dataset details, ablation studies, or statistical tests. Without these, the central empirical claim cannot be evaluated for effect size, reproducibility, or robustness to the number of domains.
Authors: We acknowledge that the experimental presentation in the submitted version was insufficiently detailed for independent evaluation. The full manuscript contains comparative plots, but we agree that tabulated metrics, variability measures, and statistical tests were omitted. In the revision we will (i) add a main-results table reporting mean target-domain improvement and constraint-violation rates with standard deviations over at least three random seeds, (ii) expand the appendix with complete dataset statistics and hyper-parameter settings, (iii) include an ablation varying the number of target and constrained domains from 2 to 8, and (iv) report paired t-test p-values against the strongest baseline for each metric. These additions will allow readers to assess effect sizes and robustness directly. revision: yes
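For instance, the planned paired comparison over seeds could look like the following sketch; the seed-level numbers are hypothetical placeholders, not results.

```python
from scipy import stats

# Hypothetical per-seed target-domain improvements (placeholders, not results).
dynamics_gain = [2.1, 1.8, 2.4]
baseline_gain = [1.2, 1.0, 1.5]   # strongest fixed-mixture baseline, same seeds

t_stat, p_value = stats.ttest_rel(dynamics_gain, baseline_gain)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```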
Circularity Check
No significant circularity; empirical probing estimates are independent inputs
Full rationale
The paper's core procedure estimates a slope matrix via separate short domain-specific probing runs and then solves a constrained optimization over the simplex to select mixture weights. These probing estimates constitute external data collected before the full training run; they are not fitted parameters from the target optimization that are later relabeled as predictions, nor are they defined in terms of the final performance metrics. No equations or self-citations are presented that would make any load-bearing claim reduce to its own inputs by construction. The method therefore remains a standard empirical approximation whose validity rests on the (falsifiable) assumption that local slopes extrapolate, rather than on definitional equivalence or circular self-reference.