arxiv: 2305.14233 · v1 · submitted 2023-05-23 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

Ning Ding , Yulin Chen , Bokai Xu , Yujia Qin , Zhi Zheng , Shengding Hu , Zhiyuan Liu , Maosong Sun

show 1 more author

Bowen Zhou

Authors on Pith no claims yet

Pith reviewed 2026-05-15 17:19 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords instructional conversationschat language modelsUltraChatLLaMA fine-tuningmulti-turn dialoguesopen-source chat modelsVicuna comparison

0 comments

The pith

Scaling AI-generated multi-turn conversations to 1.5 million dialogues produces a fine-tuned LLaMA that outperforms Vicuna.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that high-quality instructional data can be created entirely by AI without any human queries. It introduces UltraChat, a 1.5-million-dialogue dataset built through iterative generation to cover broad topics and multi-turn interactions. Fine-tuning LLaMA on this data yields UltraLLaMA, which beats prior open-source chat models including Vicuna on evaluations. A reader would care because the result points to data scale and coherence as direct levers for lifting open-source chat performance rather than depending on human-collected queries. Statistical checks confirm the new dataset leads prior ones in size, length, diversity, and coherence.

Core claim

UltraChat consists of 1.5 million high-quality multi-turn instructional dialogues generated iteratively by AI to capture the full range of human-AI interactions across topics and instructions; fine-tuning LLaMA on UltraChat produces UltraLLaMA, which outperforms other open-source chat models including the prior leader Vicuna.

What carries the argument

The iterative multi-turn conversation generation framework that produces diverse, coherent dialogues without human queries.

If this is right

UltraLLaMA delivers higher performance than Vicuna and other open-source chat models on evaluations.
UltraChat exceeds earlier datasets in scale, average length, diversity, and coherence.
Open-source chat models can be improved by scaling AI-generated instructional data instead of relying on human queries.
Public release of both the dataset and the model lets others extend the same generation approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Removing the need for human queries could lower the cost and raise the speed of building future training sets.
The same iterative generation method might transfer to non-chat tasks such as code or reasoning data.
Further increases in dataset size or topic coverage could produce additional capability gains.

Load-bearing premise

Conversations created entirely by AI can still supply enough breadth, coherence, and instructional quality to deliver measurable gains over existing open-source chat models.

What would settle it

If UltraLLaMA shows no improvement or falls behind Vicuna on standard conversational benchmarks and human preference tests, the central claim would not hold.

read the original abstract

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT. Scaling the diversity and quality of such data, although straightforward, stands a great chance of leading to improved performance. This paper aims to improve the upper bound of open-source models further. We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively. UltraChat contains 1.5 million high-quality multi-turn dialogues and covers a wide range of topics and instructions. Our statistical analysis of UltraChat reveals its superiority in various key metrics, including scale, average length, diversity, coherence, etc., solidifying its position as a leading open-source dataset. Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna, the previously recognized state-of-the-art open-source model. The dataset and the model will be publicly released\footnote{\url{https://github.com/thunlp/UltraChat}}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper ships a 1.5M synthetic multi-turn dataset and a LLaMA fine-tune that beats Vicuna on their tests, but the abstract withholds the actual numbers and generation details needed to judge the gains.

read the letter

The core thing to take away is that they built UltraChat, a 1.5 million multi-turn instructional dialogue set generated without any human queries, then fine-tuned LLaMA on it to produce UltraLLaMA, which they say outperforms Vicuna and other open-source models. That is the concrete advance they are offering. They also release both the data and the model, which is the part that will actually get used. The dataset construction uses an iterative framework aimed at breadth and coherence across topics, and they back it with some statistical checks on length, diversity, and coherence that look reasonable on the surface. That is useful work for anyone who wants larger synthetic chat data to experiment with. The soft spots sit in the evaluation. The abstract claims consistent superiority but supplies no scores, no list of exact baselines, no statistical tests, and no description of how the test sets were chosen or whether any data was held out. Without those, it is hard to tell whether the reported edge comes from the data itself or from training choices that are not spelled out. The concern about generator artifacts is also live: if the dialogues mostly reproduce the style and coverage biases of the model used to create them, the downstream gains could be narrower than claimed. This is a practical paper for people working on open-source chat models and instruction tuning. It gives them a new data resource and one more data point on scaling synthetic conversations, even if the method is an extension of existing techniques rather than a conceptual shift. The thinking is clear and the release makes the claims testable, so the work deserves a serious referee who can check the numbers and the generation prompts in the full version.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces UltraChat, a 1.5-million-dialogue synthetic dataset of multi-turn instructional conversations generated entirely by AI models via an iterative framework without any human queries. The authors fine-tune LLaMA on this data to produce UltraLLaMA and claim it consistently outperforms prior open-source chat models including Vicuna on evaluations. The dataset and model are scheduled for public release.

Significance. If the empirical gains are substantiated, the work would show that scaling high-quality synthetic instructional data can raise the performance of open-source chat models, supplying both a large public dataset and a stronger baseline. The reported statistical advantages in scale, length, diversity, and coherence metrics for UltraChat constitute a concrete contribution to data-centric approaches in conversational modeling.

major comments (3)

[Abstract] Abstract: the claim that UltraLLaMA 'consistently outperforms' Vicuna and other open-source models is unsupported by any quantitative metrics, win rates, benchmark names, statistical tests, or baseline details, rendering the central empirical result unverifiable from the text.
[Section 3] Section 3 (UltraChat construction): the generation framework is presented only at the level of 'comprehensive framework' and 'iterative' multi-turn synthesis; without explicit prompts, generator models, filtering criteria, or diversity controls, it is impossible to rule out that downstream gains arise from generator-specific stylistic artifacts rather than genuine instructional breadth.
[Section 5] Section 5 (Experiments): no information is supplied on training hyperparameters, data mixture ratios, evaluation prompts, or whether Vicuna was re-evaluated under identical conditions, so the attribution of improvements specifically to UltraChat data quality cannot be assessed.

minor comments (1)

[Abstract] The footnote URL for the GitHub release should be checked for completeness and permanence.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate the suggested clarifications into the revised manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that UltraLLaMA 'consistently outperforms' Vicuna and other open-source models is unsupported by any quantitative metrics, win rates, benchmark names, statistical tests, or baseline details, rendering the central empirical result unverifiable from the text.

Authors: We agree that the abstract is too high-level. The full manuscript reports concrete results in Section 5, including win rates on MT-Bench and other benchmarks. We will revise the abstract to include key quantitative metrics (e.g., average win rate against Vicuna and specific benchmark scores) so the central claim is directly verifiable. revision: yes
Referee: [Section 3] Section 3 (UltraChat construction): the generation framework is presented only at the level of 'comprehensive framework' and 'iterative' multi-turn synthesis; without explicit prompts, generator models, filtering criteria, or diversity controls, it is impossible to rule out that downstream gains arise from generator-specific stylistic artifacts rather than genuine instructional breadth.

Authors: We acknowledge that additional implementation details are required for reproducibility. We will expand Section 3 to explicitly list the generator models (GPT-3.5-turbo and GPT-4), the full prompts used at each iteration stage, the quality filtering criteria, and the diversity sampling strategy employed to ensure broad topic coverage. revision: yes
Referee: [Section 5] Section 5 (Experiments): no information is supplied on training hyperparameters, data mixture ratios, evaluation prompts, or whether Vicuna was re-evaluated under identical conditions, so the attribution of improvements specifically to UltraChat data quality cannot be assessed.

Authors: We agree that these details are essential. The revised Section 5 will report the exact training hyperparameters, data mixture ratios, evaluation prompts, and confirm that all baselines including Vicuna were re-evaluated under identical conditions and prompts to support attribution to UltraChat. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical gains measured on external benchmarks

full rationale

The paper constructs UltraChat via an iterative synthetic generation framework and fine-tunes LLaMA to obtain UltraLLaMA, then reports performance via direct comparison against external open-source models (Vicuna and others) on standard benchmarks. No equations, fitted parameters, or self-citations are invoked as load-bearing premises; the claimed superiority is presented strictly as an empirical outcome of the data scale and quality, independent of any internal redefinition or tautological reduction. The derivation chain therefore remains self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the domain assumption that high-quality multi-turn instructional data improves chat performance; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption Instruction tuning on diverse multi-turn conversations improves chat model capabilities
Standard premise in the LLM fine-tuning literature invoked to justify the dataset and training step.

pith-pipeline@v0.9.0 · 5548 in / 1230 out tokens · 41993 ms · 2026-05-15T17:19:38.619498+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Foundation/LawOfExistence law_of_existence unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively.
Foundation/LogicAsFunctionalEquation RCL_is_unique_functional_form_of_logic unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Parameter Aggregation: Semantic Consensus for Federated Fine-Tuning of LLMs
cs.LG 2026-05 unverdicted novelty 7.0

Semantic consensus on model outputs for public prompts enables federated LLM fine-tuning that matches parameter-aggregation baselines with orders-of-magnitude lower communication.
IRIS: Interpolative R\'enyi Iterative Self-play for Large Language Model Fine-Tuning
cs.LG 2026-04 unverdicted novelty 7.0

IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of...
Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory
cs.LG 2026-04 unverdicted novelty 7.0

Continuous adversarial training in the embedding space produces a robust generalization bound for linear transformers that decreases with perturbation radius, tied to singular values of the embedding matrix, and motiv...
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
cs.CL 2024-10 unverdicted novelty 7.0

LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.
Self-Rewarding Language Models
cs.CL 2024-01 conditional novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
cs.LG 2026-05 conditional novelty 6.0

DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
cs.LG 2026-05 unverdicted novelty 6.0

DECO sparse MoE matches dense Transformer performance at 20% expert activation with a 3x hardware inference speedup.
TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning
cs.CR 2026-04 unverdicted novelty 6.0

TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.
NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference
cs.AR 2026-04 unverdicted novelty 6.0

NVLLM offloads FFN computations to integrated 3D NAND flash with page-level access and keeps attention in DRAM, delivering 16.7x-37.9x speedups over GPU out-of-core baselines for models up to 30B parameters.
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
cs.CV 2024-12 unverdicted novelty 6.0

InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.
SnapKV: LLM Knows What You are Looking for Before Generation
cs.CL 2024-04 conditional novelty 6.0

SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable pe...
Do Linear Probes Generalize Better in Persona Coordinates?
cs.AI 2026-05 unverdicted novelty 5.0

Probes on persona principal components from contrastive prompts generalize better than raw activation probes for harmful behaviors across 10 datasets.
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
cs.LG 2026-04 unverdicted novelty 5.0

ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.
Acceptance Dynamics Across Cognitive Domains in Speculative Decoding
cs.AI 2026-04 unverdicted novelty 5.0

Empirical measurements across four NLP domains show task type is a stronger predictor of speculative decoding acceptance than tree depth, with chat uniquely achieving expected accepted length over 1 token per step.
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
cs.CL 2026-04 unverdicted novelty 5.0

Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
cs.CL 2025-02 unverdicted novelty 5.0

SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
cs.CV 2024-08 conditional novelty 5.0

MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering
cs.CL 2026-04 unverdicted novelty 4.0

Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
Yi: Open Foundation Models by 01.AI
cs.CL 2024-03 unverdicted novelty 4.0

Yi models are 6B and 34B open foundation models pretrained on 3.1T curated tokens that achieve strong benchmark results through data quality and targeted extensions like long context and vision alignment.
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
cs.AI 2025-01 unverdicted novelty 3.0

The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
cs.CL 2024-12 accept novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

Reference graph

Works this paper leans on

253 extracted references · 253 canonical work pages · cited by 20 Pith papers · 15 internal anchors

[1]

OpenAI , year=

Chatgpt: Optimizing language models for dialogue , author=. OpenAI , year=

work page
[3]

arXiv , year=

GPT-4 technical report , author=. arXiv , year=

work page
[4]

Stanford Center for Research on Foundation Models

Alpaca: A Strong, Replicable Instruction-Following Model , author=. Stanford Center for Research on Foundation Models. https://crfm. stanford. edu/2023/03/13/alpaca. html , year=

work page 2023
[6]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

work page
[7]

Compacter: Efficient low-rank hypercomplex adapter layers , url =

Mahabadi, Rabeeh Karimi and Henderson, James and Ruder, Sebastian , journal =. Compacter: Efficient low-rank hypercomplex adapter layers , url =

work page
[8]

All nlp tasks are generation tasks: A general pretraining framework , url =

Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie , journal =. All nlp tasks are generation tasks: A general pretraining framework , url =

work page
[9]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Film: Visual reasoning with a general conditioning layer , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[10]

Advances in neural information processing systems , volume=

Learning multiple visual domains with residual adapters , author=. Advances in neural information processing systems , volume=

work page
[11]

Convolutional Neural Networks for Sentence Classification

Kim, Yoon. Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ). 2014. doi:10.3115/v1/D14-1181

work page doi:10.3115/v1/d14-1181 2014
[12]

A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification , author=. arXiv preprint arXiv:1510.03820 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[13]

ACL , year=

A convolutional neural network for modelling sentences , author=. ACL , year=

work page
[14]

Neural computation , volume=

Long short-term memory , author=. Neural computation , volume=. 1997 , publisher=

work page 1997
[16]

Advances in Neural Information Processing Systems , volume=

A neural probabilistic language model , author=. Advances in Neural Information Processing Systems , volume=. 2000 , url=

work page 2000
[17]

Paradigm shift in natural language processing , url =

Sun, Tianxiang and Liu, Xiangyang and Qiu, Xipeng and Huang, Xuanjing , journal =. Paradigm shift in natural language processing , url =

work page
[18]

Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , url =

Liu, Pengfei and Yuan, Weizhe and Fu, Jinlan and Jiang, Zhengbao and Hayashi, Hiroaki and Neubig, Graham , journal =. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , url =

work page
[19]

Single-dataset experts for multi-dataset question answering , url =

Friedman, Dan and Dodge, Ben and Chen, Danqi , journal =. Single-dataset experts for multi-dataset question answering , url =

work page
[20]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , doi =

Pfeiffer, Jonas and Vuli. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , doi =

work page 2020
[21]

MultiEURLEX--A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer , url =

Chalkidis, Ilias and Fergadiotis, Manos and Androutsopoulos, Ion , journal =. MultiEURLEX--A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer , url =

work page
[22]

Proceedings of EMNLP , pages =

R. Proceedings of EMNLP , pages =

work page
[23]

Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya , title =

work page
[24]

Language models are unsupervised multitask learners , volume =

Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others , journal =. Language models are unsupervised multitask learners , volume =

work page
[26]

LIMA: Less Is More for Alignment , author=

work page
[27]

Nature Machine Intelligence , pages=

Parameter-efficient fine-tuning of large-scale pre-trained language models , author=. Nature Machine Intelligence , pages=. 2023 , publisher=

work page 2023
[29]

Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs , author=

work page
[32]

GitHub repository , url=

Hello Dolly: Democratizing the magic of ChatGPT with open models , author=. GitHub repository , url=. 2023 , publisher =

work page 2023
[33]

Exploring the limits of transfer learning with a unified text-to-text transformer , url =

Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J , journal =. Exploring the limits of transfer learning with a unified text-to-text transformer , url =

work page
[34]

Gomez and Lukasz Kaiser and Illia Polosukhin , bibsource =

Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin , bibsource =. Attention is All you Need , url =. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA,

work page 2017
[35]

BERT: Pre-training of deep bidirectional transformers for language understanding

Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina , booktitle =. doi:10.18653/v1/N19-1423 , pages =

work page doi:10.18653/v1/n19-1423
[36]

Pre-Trained Models: Past, Present and Future , url =

Xu Han and Zhengyan Zhang and Ning Ding and Yuxian Gu and Xiao Liu and Yuqi Huo and Jiezhong Qiu and Liang Zhang and Wentao Han and Minlie Huang and Qin Jin and Yanyan Lan and Yang Liu and Zhiyuan Liu and Zhiwu Lu and Xipeng Qiu and Ruihua Song and Jie Tang and Ji-Rong Wen and Jinhui Yuan and Wayne Xin Zhao and Jun Zhu , doi =. Pre-Trained Models: Past, P...

work page
[37]

Lora: Low-rank adaptation of large language models , url =

Hu, Edward J and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , journal =. Lora: Low-rank adaptation of large language models , url =

work page
[38]

2019 , journal=

Natural Language Understanding with the Quora Question Pairs Dataset , author=. 2019 , journal=

work page 2019
[39]

and Brockett, Chris

Dolan, William B. and Brockett, Chris. Automatically Constructing a Corpus of Sentential Paraphrases. Proceedings of IWP Workshop. 2005

work page 2005
[40]

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

Ling, Wang and Yogatama, Dani and Dyer, Chris and Blunsom, Phil. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1015

work page doi:10.18653/v1/p17-1015 2017
[41]

A SICK cure for the evaluation of compositional distributional semantic models

Marelli, Marco and Menini, Stefano and Baroni, Marco and Bentivogli, Luisa and Bernardi, Raffaella and Zamparelli, Roberto. A SICK cure for the evaluation of compositional distributional semantic models. Proceedings of the Ninth International Conference on Language Resources and Evaluation ( LREC '14). 2014

work page 2014
[42]

2018 , url =

Tushar Khot and Ashish Sabharwal and Peter Clark , Booktitle =. 2018 , url =

work page 2018
[43]

ROUGE : A Package for Automatic Evaluation of Summaries

Lin, Chin-Yew. ROUGE : A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004

work page 2004
[44]

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation , url =

He, Ruidan and Liu, Linlin and Ye, Hai and Tan, Qingyu and Ding, Bosheng and Cheng, Liying and Low, Jiawei and Bing, Lidong and Si, Luo , booktitle =. On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation , url =. doi:10.18653/v1/2021.acl-long.172 , pages =

work page doi:10.18653/v1/2021.acl-long.172 2021
[45]

Proceedings of the 36th International Conference on Machine Learning,

Asa Cooper Stickland and Iain Murray , bibsource =. Proceedings of the 36th International Conference on Machine Learning,

work page
[46]

Parameter-Efficient Transfer Learning for

Neil Houlsby and Andrei Giurgiu and Stanislaw Jastrzebski and Bruna Morrone and Quentin de Laroussilhe and Andrea Gesmundo and Mona Attariyan and Sylvain Gelly , bibsource =. Parameter-Efficient Transfer Learning for. Proceedings of the 36th International Conference on Machine Learning,

work page
[47]

Ppt: Pre-trained prompt tuning for few-shot learning , url =

Gu, Yuxian and Han, Xu and Liu, Zhiyuan and Huang, Minlie , journal =. Ppt: Pre-trained prompt tuning for few-shot learning , url =

work page
[48]

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning , url =

Yuning Mao and Lambert Mathias and Rui Hou and Amjad Almahairi and Hao Ma and Jiawei Han and Wen-tau Yih and Madian Khabsa , journal =. UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning , url =

work page
[49]

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning , url =

Aghajanyan, Armen and Gupta, Sonal and Zettlemoyer, Luke , booktitle =. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning , url =. doi:10.18653/v1/2021.acl-long.568 , pages =

work page doi:10.18653/v1/2021.acl-long.568 2021
[50]

Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning , url =

Qin, Yujia and Wang, Xiaozhi and Su, Yusheng and Lin, Yankai and Ding, Ning and Liu, Zhiyuan and Li, Juanzi and Hou, Lei and Li, Peng and Sun, Maosong and others , journal =. Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning , url =

work page
[51]

Prefix-Tuning: Optimizing Continuous Prompts for Generation , url =

Li, Xiang Lisa and Liang, Percy , booktitle =. Prefix-Tuning: Optimizing Continuous Prompts for Generation , url =. doi:10.18653/v1/2021.acl-long.353 , pages =

work page doi:10.18653/v1/2021.acl-long.353 2021
[52]

The power of scale for parameter-efficient prompt tuning , url =

Lester, Brian and Al-Rfou, Rami and Constant, Noah , journal =. The power of scale for parameter-efficient prompt tuning , url =

work page
[53]

What would elsa do? freezing layers during transformer fine-tuning , url =

Lee, Jaejun and Tang, Raphael and Lin, Jimmy , journal =. What would elsa do? freezing layers during transformer fine-tuning , url =

work page
[54]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , doi =

Pfeiffer, Jonas and R. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , doi =

work page 2020
[55]

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models , url =

Zhao, Mengjie and Lin, Tao and Mi, Fei and Jaggi, Martin and Sch. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models , url =. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , doi =

work page 2020
[56]

SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer , url =

Tu Vu and Brian Lester and Noah Constant and Rami Al-Rfou and Daniel Cer , journal =. SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer , url =

work page
[57]

On Transferability of Prompt Tuning for Natural Language Understanding , url =

Yusheng Su and Xiaozhi Wang and Yujia Qin and Chi-Min Chan and Yankai Lin and Zhiyuan Liu and Peng Li and Juanzi Li and Lei Hou and Maosong Sun and Jie Zhou , journal =. On Transferability of Prompt Tuning for Natural Language Understanding , url =

work page
[58]

Robust Transfer Learning with Pretrained Language Models through Adapters , url =

Han, Wenjuan and Pang, Bo and Wu, Ying Nian , booktitle =. Robust Transfer Learning with Pretrained Language Models through Adapters , url =. doi:10.18653/v1/2021.acl-short.108 , pages =

work page doi:10.18653/v1/2021.acl-short.108 2021
[59]

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume , pages =

Pfeiffer, Jonas and Kamath, Aishwarya and R. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume , pages =

work page
[60]

Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models , url =

Zaken, Elad Ben and Ravfogel, Shauli and Goldberg, Yoav , journal =. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models , url =

work page
[61]

OpenPrompt: An Open-source Framework for Prompt-learning , url =

Ding, Ning and Hu, Shengding and Zhao, Weilin and Chen, Yulin and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong , journal =. OpenPrompt: An Open-source Framework for Prompt-learning , url =

work page
[62]

Layer Normalization

Layer Normalization , year =. arXiv , author =:1607.06450 , primaryclass =

work page internal anchor Pith review Pith/arXiv arXiv
[63]

Deep Residual Learning for Image Recognition , url =

Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun , bibsource =. Deep Residual Learning for Image Recognition , url =. 2016. doi:10.1109/CVPR.2016.90 , pages =

work page doi:10.1109/cvpr.2016.90 2016
[64]

Exploring the Limits of Transfer Learning with a Unified Text-to-Tex , url =

Colin Raffel an , journal =. Exploring the Limits of Transfer Learning with a Unified Text-to-Tex , url =

work page
[65]

doi:10.18653/v1/2020.acl-main.703 , pages =

Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Veselin and Zettlemoyer, Luke , booktitle =. doi:10.18653/v1/2020.acl-main.703 , pages =

work page doi:10.18653/v1/2020.acl-main.703 2020
[66]

RoBERTa: A Robustly Optimized BERT Pretraining Approach , url =

Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov , journal =. RoBERTa: A Robustly Optimized BERT Pretraining Approach , url =

work page
[67]

Kingma and Jimmy Ba , bibsource =

Diederik P. Kingma and Jimmy Ba , bibsource =. Adam:. 3rd International Conference on Learning Representations,

work page
[68]

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , url =

Noam Shazeer and Mitchell Stern , bibsource =. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , url =. Proceedings of the 35th International Conference on Machine Learning,

work page
[69]

Datasets: A Community Library for Natural Language Processing , url =

Lhoest, Quentin and Villanova del Moral, Albert and Jernite, Yacine and Thakur, Abhishek and von Platen, Patrick and Patil, Suraj and Chaumond, Julien and Drame, Mariama and Plu, Julien and Tunstall, Lewis and Davison, Joe and. Datasets: A Community Library for Natural Language Processing , url =. Proceedings of the 2021 Conference on Empirical Methods in...

work page 2021
[70]

Bowman , bibsource =

Alex Wang and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , bibsource =. 7th International Conference on Learning Representations,

work page
[71]

Neural Networks , volume=

Sigmoid-weighted linear units for neural network function approximation in reinforcement learning , author=. Neural Networks , volume=. 2018 , publisher=

work page 2018
[72]

Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference , url =

Schick, Timo and Sch. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference , url =. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume , pages =

work page
[73]

Zonghan Yang and Yang Liu , booktitle =

work page
[74]

doi:10.1016/S0076-5392(08)62095-0 , pages =

Kopp, Richard E , booktitle =. doi:10.1016/S0076-5392(08)62095-0 , pages =

work page doi:10.1016/s0076-5392(08)62095-0
[75]

Qianxiao Li and Long Chen and Cheng Tai and Weinan E , bibsource =. J. Mach. Learn. Res. , pages =

work page
[76]

Towards Robust Neural Networks via Close-loop Control , url =

Zhuotong Chen and Qianxiao Li and Zheng Zhang , bibsource =. Towards Robust Neural Networks via Close-loop Control , url =. 9th International Conference on Learning Representations,

work page
[77]

arXiv preprint arXiv:2202.09817 , year=

Y-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning , author=. arXiv preprint arXiv:2202.09817 , year=

work page arXiv
[78]

Parameter-Efficient Transfer Learning with Diff Pruning , url =

Guo, Demi and Rush, Alexander and Kim, Yoon , booktitle =. Parameter-Efficient Transfer Learning with Diff Pruning , url =. doi:10.18653/v1/2021.acl-long.378 , pages =

work page doi:10.18653/v1/2021.acl-long.378 2021
[79]

Multilayer feedforward networks with a nonpolynomial activation function can approximate any function , volume =

Leshno, Moshe and Lin, Vladimir Ya and Pinkus, Allan and Schocken, Shimon , journal =. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function , volume =

work page
[80]

In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp

Shaokai Ye and Kailu Wu and Mu Zhou and Yunfei Yang and Sia Huat Tan and Kaidi Xu and Jiebo Song and Chenglong Bao and Kaisheng Ma , bibsource =. Light-weight Calibrator:. 2020. doi:10.1109/CVPR42600.2020.01375 , pages =

work page doi:10.1109/cvpr42600.2020.01375 2020
[81]

International Conference on Learning Representations , year=

Towards a Unified View of Parameter-Efficient Transfer Learning , author=. International Conference on Learning Representations , year=

work page
[82]

High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications , year =

Wright, John and Ma, Yi , publisher =. High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications , year =

work page
[83]

Huang, Lifu and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin , booktitle =. Cosmos. doi:10.18653/v1/D19-1243 , pages =

work page doi:10.18653/v1/d19-1243
[84]

doi:10.18653/v1/D19-1608 , pages =

Tafjord, Oyvind and Gardner, Matt and Lin, Kevin and Clark, Peter , booktitle =. doi:10.18653/v1/D19-1608 , pages =

work page doi:10.18653/v1/d19-1608
[85]

Sap, Maarten and Rashkin, Hannah and Chen, Derek and Le Bras, Ronan and Choi, Yejin , booktitle =. Social. doi:10.18653/v1/D19-1454 , pages =

work page doi:10.18653/v1/d19-1454
[86]

Beat the

Bartolo, Max and Roberts, Alastair and Welbl, Johannes and Riedel, Sebastian and Stenetorp, Pontus , doi =. Beat the. Transactions of the Association for Computational Linguistics , pages =

work page
[87]

doi:10.18653/v1/2020.bionlp-1.15 , pages =

Pappas, Dimitris and Stavropoulos, Petros and Androutsopoulos, Ion and McDonald, Ryan , booktitle =. doi:10.18653/v1/2020.bionlp-1.15 , pages =

work page doi:10.18653/v1/2020.bionlp-1.15 2020

Showing first 80 references.