SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
Pith reviewed 2026-05-23 19:33 UTC · model grok-4.3
The pith
Mixup on examples of varying confidence improves LLM instruction tuning without curated datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SFTMix leverages training dynamics to identify examples with varying confidence levels across the semantic representation space. Confident data is prone to overfitting while unconfident data is harder to generalize, so the method interpolates them to bridge the gap and applies mixup-based regularization on the resulting examples to support learning without relying on well-curated SFT datasets.
What carries the argument
SFTMix, a mixup recipe that classifies examples by confidence from training dynamics then interpolates them for regularization.
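The recipe above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the page does not say how confidence is computed, so the sketch assumes it is the mean probability assigned to the reference tokens, averaged over training checkpoints. The median split, the function names, and the choice to interpolate plain vectors are also assumptions. Only the Beta-distributed mixing weight is standard Mixup practice.

```python
import random
from statistics import mean, median


def confidence_scores(gold_probs_per_checkpoint):
    """Average each example's gold-token probability over checkpoints.

    gold_probs_per_checkpoint[c][i] is the (hypothetical) mean probability
    that checkpoint c assigns to the reference tokens of example i.
    """
    num_examples = len(gold_probs_per_checkpoint[0])
    return [
        mean(ckpt[i] for ckpt in gold_probs_per_checkpoint)
        for i in range(num_examples)
    ]


def split_by_confidence(scores):
    """Split example indices at the median score (an assumed threshold)."""
    cutoff = median(scores)
    confident = [i for i, s in enumerate(scores) if s >= cutoff]
    unconfident = [i for i, s in enumerate(scores) if s < cutoff]
    return confident, unconfident


def sample_lambda(alpha, rng):
    """Draw the mixing weight from Beta(alpha, alpha), as in standard Mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(vec_a, vec_b, lam):
    """Linearly interpolate two equal-length representation vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]
```

In use, one would pair an index from the confident list with one from the unconfident list, draw `lam` with `sample_lambda`, and pass the two examples' representations to `mixup`. Which layer's representations are mixed is left open here because the page does not specify it.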
If this is right
- Performance gains appear on both instruction-following and healthcare-specific tasks.
- Improvements hold across LLM families and across SFT datasets of varying sizes and qualities.
- The recipe remains compatible with existing data selection techniques.
- It adapts to compute-constrained training scenarios.
- The approach scales to broader applications beyond the tested tasks.
Where Pith is reading between the lines
- The same interpolation step could be tested on other fine-tuning objectives such as preference tuning or continued pretraining.
- If the confidence metric generalizes, it might reduce reliance on external models for data quality assessment in new domains.
- Applying SFTMix only to the lowest-confidence subset could serve as a lightweight variant for resource-limited settings.
- The semantic space unevenness observation suggests potential for confidence-aware sampling in active learning loops.
Load-bearing premise
Examples with different confidence levels should play distinct roles in instruction tuning because confident data overfits and unconfident data generalizes poorly.
What would settle it
If standard SFT on the same datasets and models yields equal or better results than SFTMix across multiple benchmarks, or if confidence scores from training dynamics show no correlation with overfitting or generalization gaps, the central claim would be falsified.
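The second condition, whether confidence tracks overfitting or generalization gaps, could be probed with a rank correlation. The sketch below computes Spearman's coefficient in pure Python. The inputs are hypothetical: a per-example confidence score and a per-example gap such as held-out loss minus training loss. Neither quantity is defined on this page, so both are assumptions made for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation; undefined if either input is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A coefficient near zero on real training logs would count against the load-bearing premise. A clearly nonzero one would be consistent with it, though it would not by itself show that interpolation is what drives the gains.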
Original abstract
To acquire instruction-following capabilities, large language models (LLMs) undergo instruction tuning, where they are trained on instruction-response pairs using next-token prediction (NTP). Efforts to improve instruction tuning often focus on higher-quality supervised fine-tuning (SFT) datasets, typically requiring data filtering with proprietary LLMs or human annotation. In this paper, we take a different approach by proposing SFTMix, a novel Mixup-based recipe that elevates LLM instruction tuning without relying on well-curated datasets. We observe that LLMs exhibit uneven confidence across the semantic representation space. We argue that examples with different confidence levels should play distinct roles in instruction tuning: Confident data is prone to overfitting, while unconfident data is harder to generalize. Based on this insight, SFTMix leverages training dynamics to identify examples with varying confidence levels. We then interpolate them to bridge the confidence gap and apply a Mixup-based regularization to support learning on these additional, interpolated examples. We demonstrate the effectiveness of SFTMix in both instruction-following and healthcare-specific SFT tasks, with consistent improvements across LLM families and SFT datasets of varying sizes and qualities. Extensive analyses across six directions highlight SFTMix's compatibility with data selection, adaptability to compute-constrained scenarios, and scalability to broader applications.
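For orientation, the interpolation step can be written in the standard Mixup form. This is a sketch only: the symbols $h^{c}$ and $h^{u}$ (representations of a confident and an unconfident example), $y^{c}$ and $y^{u}$ (their targets), $\alpha$, and the weight $\mu$ are illustrative assumptions, not the paper's notation. The abstract states neither where the interpolation is applied nor how the regularizer is weighted against the NTP loss.

```latex
\begin{align}
\lambda &\sim \mathrm{Beta}(\alpha, \alpha), \\
\tilde{h} &= \lambda\, h^{c} + (1 - \lambda)\, h^{u}, \\
\tilde{y} &= \lambda\, y^{c} + (1 - \lambda)\, y^{u}, \\
\mathcal{L} &= \mathcal{L}_{\mathrm{NTP}} + \mu\, \mathcal{L}_{\mathrm{Mixup}}(\tilde{h}, \tilde{y}).
\end{align}
```

Setting $\mu = 0$ recovers plain next-token-prediction SFT, which is the baseline the recipe is compared against.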
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SFTMix, a Mixup-based recipe for elevating LLM instruction tuning without requiring curated high-quality SFT datasets. It identifies examples with varying confidence levels via training dynamics, interpolates between confident (overfitting-prone) and unconfident (generalization-hard) examples to bridge gaps, and applies Mixup regularization on the interpolated data. The method is evaluated on instruction-following and healthcare-specific tasks, reporting consistent gains across LLM families and datasets of different sizes/qualities, with additional analyses on compatibility, compute constraints, and scalability.
Significance. If the empirical results hold, SFTMix offers a practical, data-agnostic enhancement to standard next-token prediction SFT that avoids reliance on proprietary LLMs or human annotation for data filtering. The six-direction analyses provide evidence of robustness and complementarity with existing data selection methods, which could make effective instruction tuning more accessible in resource-constrained or domain-specific settings.
major comments (2)
- [Abstract, §3] The motivating claim that 'confident data is prone to overfitting, while unconfident data is harder to generalize' is presented as an observation from training dynamics, but the manuscript does not report a direct ablation isolating the distinct roles (e.g., training only on confident vs. only on unconfident subsets with and without interpolation) to confirm this drives the gains rather than the Mixup regularization alone.
- [§4, Experiments] While consistent improvements are asserted across models and datasets, the results lack reported error bars, statistical significance tests, or multiple random seeds for the main tables; this weakens the cross-family and cross-dataset claims given the known variance in LLM fine-tuning.
minor comments (3)
- [§3.1] Notation for confidence scoring via training dynamics (e.g., loss trajectories or logit margins) should be formalized with an equation in §3.1 for reproducibility.
- [Figure 2] Figure 2 or equivalent visualization of interpolated examples would benefit from clearer labeling of the interpolation parameter λ and its sampling distribution.
- [§4.2] The healthcare-specific task description in §4.2 should include the exact dataset size and domain adaptation details to allow direct comparison with general instruction-following results.
Simulated Author's Rebuttal
We thank the referee for the constructive review and recommendation of minor revision. We address each major comment below.
- Referee: [Abstract, §3] The motivating claim that 'confident data is prone to overfitting, while unconfident data is harder to generalize' is presented as an observation from training dynamics, but the manuscript does not report a direct ablation isolating the distinct roles (e.g., training only on confident vs. only on unconfident subsets with and without interpolation) to confirm this drives the gains rather than the Mixup regularization alone.
  Authors: We acknowledge that the manuscript presents the motivation based on training dynamics observations without a dedicated ablation that isolates the contribution of confidence-stratified interpolation from Mixup regularization alone. The six-direction analyses demonstrate overall gains and complementarity, but a targeted ablation would provide stronger causal evidence for the claimed roles of confident and unconfident examples. We will add this ablation in the revised manuscript.
  Revision: yes
- Referee: [§4, Experiments] While consistent improvements are asserted across models and datasets, the results lack reported error bars, statistical significance tests, or multiple random seeds for the main tables; this weakens the cross-family and cross-dataset claims given the known variance in LLM fine-tuning.
  Authors: We agree that the absence of error bars and multi-seed statistics limits the strength of the cross-model and cross-dataset claims, given the known variance in LLM fine-tuning. Our reported results used single runs owing to compute limits across the evaluated model families and dataset scales. We will rerun key experiments with multiple random seeds, add error bars, and include statistical significance tests in the revised version.
  Revision: yes
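The multi-seed reporting promised in the second response could look roughly like the sketch below. It is one possible analysis, not the authors' planned protocol: it summarizes per-seed benchmark scores as mean and sample standard deviation, then compares two training recipes with a paired sign-flip permutation test. The function names and the assumption that both recipes are run with the same seeds are hypothetical.

```python
import random
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    return mean(scores), stdev(scores)


def paired_permutation_pvalue(baseline, treated, num_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-seed differences."""
    diffs = [t - b for b, t in zip(baseline, treated)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_resamples):
        # Under the null hypothesis, each paired difference is equally
        # likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (num_resamples + 1)
```

One design consequence is worth noting: with n paired seeds, the smallest two-sided p-value this test can produce is about 2 / 2^n, so five seeds cannot go below roughly 0.06. Claims of significance at the 0.05 level would therefore need at least six seeds per configuration, or a different test.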
Circularity Check
No significant circularity; empirical recipe with independent experimental validation
The paper proposes SFTMix as a data-agnostic Mixup recipe motivated by an explicit observation (uneven LLM confidence across semantic space) and an argument about distinct roles for confident vs. unconfident examples. It then describes a procedure using training dynamics for identification, interpolation, and regularization, followed by empirical demonstrations across models, datasets, and tasks. No equations, fitted parameters called predictions, self-definitional steps, or load-bearing self-citations appear in the abstract or described method. The central claims rest on reported performance gains rather than any derivation that reduces to the method's own inputs by construction. This is a standard empirical contribution with self-contained experimental support.
Forward citations
Cited by 2 Pith papers
- A Brief Overview: On-Policy Self-Distillation In Large Language Models
  OPSD lets a single LLM distill its own reasoning by sampling trajectories from the student role while granting the teacher role privileged access to verified solutions, reducing memory needs versus separate-model dist...
- A Brief Overview: On-Policy Self-Distillation In Large Language Models
  This overview paper explains the conceptual foundations and design principles of On-Policy Self-Distillation for large language models from a beginner's perspective.