Recognition: 2 theorem links · Lean Theorem
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Pith reviewed 2026-05-15 17:19 UTC · model grok-4.3
The pith
Scaling AI-generated multi-turn conversations to 1.5 million dialogues produces a fine-tuned LLaMA that outperforms Vicuna.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UltraChat consists of 1.5 million high-quality multi-turn instructional dialogues generated iteratively by AI to capture the full range of human-AI interactions across topics and instructions; fine-tuning LLaMA on UltraChat produces UltraLLaMA, which outperforms other open-source chat models including the prior leader Vicuna.
What carries the argument
The iterative multi-turn conversation generation framework that produces diverse, coherent dialogues without human queries.
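As a rough illustration of what iterative, human-free multi-turn generation can mean in practice, here is a minimal sketch in which one chat model is prompted to play the user and another to play the assistant, alternating turns over a seed topic. The `chat` callable, system prompts, role-flipping helper, and turn budget are assumptions for illustration, not the paper's released pipeline.

```python
# Illustrative sketch of AI-AI multi-turn dialogue generation (hypothetical,
# not the paper's code). `chat` stands in for any chat-completion call that
# maps a list of {"role", "content"} messages to a reply string.
from typing import Callable, Dict, List

Message = Dict[str, str]
ChatFn = Callable[[List[Message]], str]

USER_SIM_SYSTEM = (
    "You are a curious human user. Given the conversation so far, "
    "ask one natural follow-up question or give one new instruction."
)
ASSISTANT_SYSTEM = "You are a helpful, detailed AI assistant."


def flip_roles(dialogue: List[Message]) -> List[Message]:
    """Show the history from the simulated user's point of view."""
    swap = {"user": "assistant", "assistant": "user"}
    return [{"role": swap[m["role"]], "content": m["content"]} for m in dialogue]


def generate_dialogue(seed_topic: str, chat: ChatFn, num_turns: int = 4) -> List[Message]:
    """Grow one multi-turn dialogue from a seed topic by alternating a
    simulated-user model and an assistant model, each conditioned on the
    full history so far."""
    dialogue: List[Message] = []
    next_instruction = f"Open a conversation about: {seed_topic}"
    for _ in range(num_turns):
        # The user simulator sees the flipped history plus an instruction
        # to produce the next human-style query.
        user_turn = chat(
            [{"role": "system", "content": USER_SIM_SYSTEM}]
            + flip_roles(dialogue)
            + [{"role": "user", "content": next_instruction}]
        )
        dialogue.append({"role": "user", "content": user_turn})
        # The assistant model answers, conditioned on the dialogue as-is.
        reply = chat([{"role": "system", "content": ASSISTANT_SYSTEM}] + dialogue)
        dialogue.append({"role": "assistant", "content": reply})
        next_instruction = "Write the user's next follow-up turn."
    return dialogue
```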
If this is right
- UltraLLaMA delivers higher performance than Vicuna and other open-source chat models on evaluations.
- UltraChat exceeds earlier datasets in scale, average length, diversity, and coherence.
- Open-source chat models can be improved by scaling AI-generated instructional data instead of relying on human queries.
- Public release of both the dataset and the model lets others extend the same generation approach.
Where Pith is reading between the lines
- Removing the need for human queries could lower the cost and raise the speed of building future training sets.
- The same iterative generation method might transfer to non-chat tasks such as code or reasoning data.
- Further increases in dataset size or topic coverage could produce additional capability gains.
Load-bearing premise
Conversations created entirely by AI can still supply enough breadth, coherence, and instructional quality to deliver measurable gains over existing open-source chat models.
What would settle it
If UltraLLaMA shows no improvement or falls behind Vicuna on standard conversational benchmarks and human preference tests, the central claim would not hold.
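As a concrete reading of that test, here is a minimal sketch of a pairwise win-rate evaluation: a judge (human annotator or LLM-as-judge) picks the better of two models' responses to the same prompt, and the aggregate win rate decides the comparison. The `judge` callable and tie handling are illustrative assumptions, not the paper's evaluation harness.

```python
# Illustrative pairwise win-rate computation. `judge(prompt, a, b)` returns
# "A", "B", or "tie" and stands in for a human annotator or an LLM-as-judge call.
from typing import Callable, List, Tuple


def win_rate(prompts: List[str],
             answers_a: List[str],
             answers_b: List[str],
             judge: Callable[[str, str, str], str]) -> Tuple[float, float, float]:
    """Fraction of prompts on which model A wins, loses, or ties against model B."""
    wins = losses = ties = 0
    for prompt, a, b in zip(prompts, answers_a, answers_b):
        verdict = judge(prompt, a, b)
        if verdict == "A":
            wins += 1
        elif verdict == "B":
            losses += 1
        else:
            ties += 1
    n = len(prompts)
    return wins / n, losses / n, ties / n
```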
read the original abstract
Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT. Scaling the diversity and quality of such data, although straightforward, stands a great chance of leading to improved performance. This paper aims to improve the upper bound of open-source models further. We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively. UltraChat contains 1.5 million high-quality multi-turn dialogues and covers a wide range of topics and instructions. Our statistical analysis of UltraChat reveals its superiority in various key metrics, including scale, average length, diversity, coherence, etc., solidifying its position as a leading open-source dataset. Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna, the previously recognized state-of-the-art open-source model. The dataset and the model will be publicly released (https://github.com/thunlp/UltraChat).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces UltraChat, a 1.5-million-dialogue synthetic dataset of multi-turn instructional conversations generated entirely by AI models via an iterative framework without any human queries. The authors fine-tune LLaMA on this data to produce UltraLLaMA and claim it consistently outperforms prior open-source chat models including Vicuna on evaluations. The dataset and model are scheduled for public release.
Significance. If the empirical gains are substantiated, the work would show that scaling high-quality synthetic instructional data can raise the performance of open-source chat models, supplying both a large public dataset and a stronger baseline. The reported statistical advantages in scale, length, diversity, and coherence metrics for UltraChat constitute a concrete contribution to data-centric approaches in conversational modeling.
major comments (3)
- [Abstract] The claim that UltraLLaMA 'consistently outperforms' Vicuna and other open-source models is unsupported by any quantitative metrics, win rates, benchmark names, statistical tests, or baseline details, rendering the central empirical result unverifiable from the text.
- [Section 3] The UltraChat construction framework is described only as a 'comprehensive framework' for 'iterative' multi-turn synthesis; without explicit prompts, generator models, filtering criteria, or diversity controls, it is impossible to rule out that downstream gains arise from generator-specific stylistic artifacts rather than genuine instructional breadth.
- [Section 5] The experiments report no training hyperparameters, data mixture ratios, evaluation prompts, or whether Vicuna was re-evaluated under identical conditions, so the attribution of improvements specifically to UltraChat data quality cannot be assessed.
minor comments (1)
- [Abstract] The footnote URL for the GitHub release should be checked for completeness and permanence.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate the suggested clarifications into the revised manuscript.
read point-by-point responses
-
Referee: [Abstract] The claim that UltraLLaMA 'consistently outperforms' Vicuna and other open-source models is unsupported by any quantitative metrics, win rates, benchmark names, statistical tests, or baseline details, rendering the central empirical result unverifiable from the text.
Authors: We agree that the abstract is too high-level. The full manuscript reports concrete results in Section 5, including win rates on MT-Bench and other benchmarks. We will revise the abstract to include key quantitative metrics (e.g., average win rate against Vicuna and specific benchmark scores) so the central claim is directly verifiable. revision: yes
-
Referee: [Section 3] The UltraChat construction framework is described only as a 'comprehensive framework' for 'iterative' multi-turn synthesis; without explicit prompts, generator models, filtering criteria, or diversity controls, it is impossible to rule out that downstream gains arise from generator-specific stylistic artifacts rather than genuine instructional breadth.
Authors: We acknowledge that additional implementation details are required for reproducibility. We will expand Section 3 to explicitly list the generator models (GPT-3.5-turbo and GPT-4), the full prompts used at each iteration stage, the quality filtering criteria, and the diversity sampling strategy employed to ensure broad topic coverage; an illustrative sketch of this style of filtering appears after these responses. revision: yes
-
Referee: [Section 5] The experiments report no training hyperparameters, data mixture ratios, evaluation prompts, or whether Vicuna was re-evaluated under identical conditions, so the attribution of improvements specifically to UltraChat data quality cannot be assessed.
Authors: We agree that these details are essential. The revised Section 5 will report the exact training hyperparameters, data mixture ratios, evaluation prompts, and confirm that all baselines including Vicuna were re-evaluated under identical conditions and prompts to support attribution to UltraChat. revision: yes
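To make the second response concrete, here is a minimal sketch of what quality filtering and diversity sampling over AI-generated dialogues could look like. The thresholds, word-overlap heuristic, and function names are illustrative assumptions, not the authors' released pipeline.

```python
# Hypothetical quality filtering and diversity sampling for generated
# dialogues (illustrative only; not the paper's pipeline).
from typing import Dict, List

Message = Dict[str, str]


def passes_quality_filter(dialogue: List[Message],
                          min_turns: int = 3,
                          min_reply_chars: int = 40) -> bool:
    """Reject dialogues that are too short or whose assistant replies are thin."""
    assistant_turns = [m for m in dialogue if m["role"] == "assistant"]
    if len(assistant_turns) < min_turns:
        return False
    return all(len(m["content"]) >= min_reply_chars for m in assistant_turns)


def diversity_sample(dialogues: List[List[Message]],
                     max_overlap: float = 0.7) -> List[List[Message]]:
    """Greedily keep dialogues whose opening queries do not overlap too much
    (word-level Jaccard similarity) with anything already kept."""
    kept: List[List[Message]] = []
    kept_openers: List[set] = []
    for d in dialogues:
        opener = set(d[0]["content"].lower().split())
        too_similar = any(
            len(opener & prev) / max(len(opener | prev), 1) > max_overlap
            for prev in kept_openers
        )
        if not too_similar and passes_quality_filter(d):
            kept.append(d)
            kept_openers.append(opener)
    return kept
```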
Circularity Check
No circularity: empirical gains measured on external benchmarks
full rationale
The paper constructs UltraChat via an iterative synthetic generation framework and fine-tunes LLaMA to obtain UltraLLaMA, then reports performance via direct comparison against external open-source models (Vicuna and others) on standard benchmarks. No equations, fitted parameters, or self-citations are invoked as load-bearing premises; the claimed superiority is presented strictly as an empirical outcome of the data scale and quality, independent of any internal redefinition or tautological reduction. The derivation chain therefore remains self-contained against external evaluation.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Instruction tuning on diverse multi-turn conversations improves chat model capabilities.
Lean theorems connected to this paper
-
Foundation/LawOfExistence · law_of_existence · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively.
-
Foundation/LogicAsFunctionalEquation · RCL_is_unique_functional_form_of_logic · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 21 Pith papers
-
Beyond Parameter Aggregation: Semantic Consensus for Federated Fine-Tuning of LLMs
Semantic consensus on model outputs for public prompts enables federated LLM fine-tuning that matches parameter-aggregation baselines with orders-of-magnitude lower communication.
-
IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning
IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of...
-
Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory
Continuous adversarial training in the embedding space produces a robust generalization bound for linear transformers that decreases with perturbation radius, tied to singular values of the embedding matrix, and motiv...
-
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.
-
Self-Rewarding Language Models
Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
-
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.
-
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
DECO sparse MoE matches dense Transformer performance at 20% expert activation with a 3x hardware inference speedup.
-
TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning
TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.
-
NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference
NVLLM offloads FFN computations to integrated 3D NAND flash with page-level access and keeps attention in DRAM, delivering 16.7x-37.9x speedups over GPU out-of-core baselines for models up to 30B parameters.
-
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.
-
SnapKV: LLM Knows What You are Looking for Before Generation
SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable pe...
-
Do Linear Probes Generalize Better in Persona Coordinates?
Probes on persona principal components from contrastive prompts generalize better than raw activation probes for harmful behaviors across 10 datasets.
-
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.
-
Acceptance Dynamics Across Cognitive Domains in Speculative Decoding
Empirical measurements across four NLP domains show task type is a stronger predictor of speculative decoding acceptance than tree depth, with chat uniquely achieving expected accepted length over 1 token per step.
-
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.
-
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
-
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.
-
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
-
Yi: Open Foundation Models by 01.AI
Yi models are 6B and 34B open foundation models pretrained on 3.1T curated tokens that achieve strong benchmark results through data quality and targeted extensions like long context and vision alignment.
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
-
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
discussion (0)