pith. machine review for the scientific record. sign in

arxiv: 2305.14233 · v1 · submitted 2023-05-23 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

Authors on Pith no claims yet

Pith reviewed 2026-05-15 17:19 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords instructional conversationschat language modelsUltraChatLLaMA fine-tuningmulti-turn dialoguesopen-source chat modelsVicuna comparison
0
0 comments X

The pith

Scaling AI-generated multi-turn conversations to 1.5 million dialogues produces a fine-tuned LLaMA that outperforms Vicuna.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that high-quality instructional data can be created entirely by AI without any human queries. It introduces UltraChat, a 1.5-million-dialogue dataset built through iterative generation to cover broad topics and multi-turn interactions. Fine-tuning LLaMA on this data yields UltraLLaMA, which beats prior open-source chat models including Vicuna on evaluations. A reader would care because the result points to data scale and coherence as direct levers for lifting open-source chat performance rather than depending on human-collected queries. Statistical checks confirm the new dataset leads prior ones in size, length, diversity, and coherence.

Core claim

UltraChat consists of 1.5 million high-quality multi-turn instructional dialogues generated iteratively by AI to capture the full range of human-AI interactions across topics and instructions; fine-tuning LLaMA on UltraChat produces UltraLLaMA, which outperforms other open-source chat models including the prior leader Vicuna.

What carries the argument

The iterative multi-turn conversation generation framework that produces diverse, coherent dialogues without human queries.

If this is right

  • UltraLLaMA delivers higher performance than Vicuna and other open-source chat models on evaluations.
  • UltraChat exceeds earlier datasets in scale, average length, diversity, and coherence.
  • Open-source chat models can be improved by scaling AI-generated instructional data instead of relying on human queries.
  • Public release of both the dataset and the model lets others extend the same generation approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Removing the need for human queries could lower the cost and raise the speed of building future training sets.
  • The same iterative generation method might transfer to non-chat tasks such as code or reasoning data.
  • Further increases in dataset size or topic coverage could produce additional capability gains.

Load-bearing premise

Conversations created entirely by AI can still supply enough breadth, coherence, and instructional quality to deliver measurable gains over existing open-source chat models.

What would settle it

If UltraLLaMA shows no improvement or falls behind Vicuna on standard conversational benchmarks and human preference tests, the central claim would not hold.

read the original abstract

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT. Scaling the diversity and quality of such data, although straightforward, stands a great chance of leading to improved performance. This paper aims to improve the upper bound of open-source models further. We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively. UltraChat contains 1.5 million high-quality multi-turn dialogues and covers a wide range of topics and instructions. Our statistical analysis of UltraChat reveals its superiority in various key metrics, including scale, average length, diversity, coherence, etc., solidifying its position as a leading open-source dataset. Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna, the previously recognized state-of-the-art open-source model. The dataset and the model will be publicly released\footnote{\url{https://github.com/thunlp/UltraChat}}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces UltraChat, a 1.5-million-dialogue synthetic dataset of multi-turn instructional conversations generated entirely by AI models via an iterative framework without any human queries. The authors fine-tune LLaMA on this data to produce UltraLLaMA and claim it consistently outperforms prior open-source chat models including Vicuna on evaluations. The dataset and model are scheduled for public release.

Significance. If the empirical gains are substantiated, the work would show that scaling high-quality synthetic instructional data can raise the performance of open-source chat models, supplying both a large public dataset and a stronger baseline. The reported statistical advantages in scale, length, diversity, and coherence metrics for UltraChat constitute a concrete contribution to data-centric approaches in conversational modeling.

major comments (3)
  1. [Abstract] Abstract: the claim that UltraLLaMA 'consistently outperforms' Vicuna and other open-source models is unsupported by any quantitative metrics, win rates, benchmark names, statistical tests, or baseline details, rendering the central empirical result unverifiable from the text.
  2. [Section 3] Section 3 (UltraChat construction): the generation framework is presented only at the level of 'comprehensive framework' and 'iterative' multi-turn synthesis; without explicit prompts, generator models, filtering criteria, or diversity controls, it is impossible to rule out that downstream gains arise from generator-specific stylistic artifacts rather than genuine instructional breadth.
  3. [Section 5] Section 5 (Experiments): no information is supplied on training hyperparameters, data mixture ratios, evaluation prompts, or whether Vicuna was re-evaluated under identical conditions, so the attribution of improvements specifically to UltraChat data quality cannot be assessed.
minor comments (1)
  1. [Abstract] The footnote URL for the GitHub release should be checked for completeness and permanence.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate the suggested clarifications into the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that UltraLLaMA 'consistently outperforms' Vicuna and other open-source models is unsupported by any quantitative metrics, win rates, benchmark names, statistical tests, or baseline details, rendering the central empirical result unverifiable from the text.

    Authors: We agree that the abstract is too high-level. The full manuscript reports concrete results in Section 5, including win rates on MT-Bench and other benchmarks. We will revise the abstract to include key quantitative metrics (e.g., average win rate against Vicuna and specific benchmark scores) so the central claim is directly verifiable. revision: yes

  2. Referee: [Section 3] Section 3 (UltraChat construction): the generation framework is presented only at the level of 'comprehensive framework' and 'iterative' multi-turn synthesis; without explicit prompts, generator models, filtering criteria, or diversity controls, it is impossible to rule out that downstream gains arise from generator-specific stylistic artifacts rather than genuine instructional breadth.

    Authors: We acknowledge that additional implementation details are required for reproducibility. We will expand Section 3 to explicitly list the generator models (GPT-3.5-turbo and GPT-4), the full prompts used at each iteration stage, the quality filtering criteria, and the diversity sampling strategy employed to ensure broad topic coverage. revision: yes

  3. Referee: [Section 5] Section 5 (Experiments): no information is supplied on training hyperparameters, data mixture ratios, evaluation prompts, or whether Vicuna was re-evaluated under identical conditions, so the attribution of improvements specifically to UltraChat data quality cannot be assessed.

    Authors: We agree that these details are essential. The revised Section 5 will report the exact training hyperparameters, data mixture ratios, evaluation prompts, and confirm that all baselines including Vicuna were re-evaluated under identical conditions and prompts to support attribution to UltraChat. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical gains measured on external benchmarks

full rationale

The paper constructs UltraChat via an iterative synthetic generation framework and fine-tunes LLaMA to obtain UltraLLaMA, then reports performance via direct comparison against external open-source models (Vicuna and others) on standard benchmarks. No equations, fitted parameters, or self-citations are invoked as load-bearing premises; the claimed superiority is presented strictly as an empirical outcome of the data scale and quality, independent of any internal redefinition or tautological reduction. The derivation chain therefore remains self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the domain assumption that high-quality multi-turn instructional data improves chat performance; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • domain assumption Instruction tuning on diverse multi-turn conversations improves chat model capabilities
    Standard premise in the LLM fine-tuning literature invoked to justify the dataset and training step.

pith-pipeline@v0.9.0 · 5548 in / 1230 out tokens · 41993 ms · 2026-05-15T17:19:38.619498+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation/LawOfExistence law_of_existence unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively.

  • Foundation/LogicAsFunctionalEquation RCL_is_unique_functional_form_of_logic unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beyond Parameter Aggregation: Semantic Consensus for Federated Fine-Tuning of LLMs

    cs.LG 2026-05 unverdicted novelty 7.0

    Semantic consensus on model outputs for public prompts enables federated LLM fine-tuning that matches parameter-aggregation baselines with orders-of-magnitude lower communication.

  2. IRIS: Interpolative R\'enyi Iterative Self-play for Large Language Model Fine-Tuning

    cs.LG 2026-04 unverdicted novelty 7.0

    IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of...

  3. Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory

    cs.LG 2026-04 unverdicted novelty 7.0

    Continuous adversarial training in the embedding space produces a robust generalization bound for linear transformers that decreases with perturbation radius, tied to singular values of the embedding matrix, and motiv...

  4. LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

    cs.CL 2024-10 unverdicted novelty 7.0

    LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

  5. Self-Rewarding Language Models

    cs.CL 2024-01 conditional novelty 7.0

    Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.

  6. DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

    cs.LG 2026-05 conditional novelty 6.0

    DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.

  7. DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

    cs.LG 2026-05 unverdicted novelty 6.0

    DECO sparse MoE matches dense Transformer performance at 20% expert activation with a 3x hardware inference speedup.

  8. TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

    cs.CR 2026-04 unverdicted novelty 6.0

    TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.

  9. NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

    cs.AR 2026-04 unverdicted novelty 6.0

    NVLLM offloads FFN computations to integrated 3D NAND flash with page-level access and keeps attention in DRAM, delivering 16.7x-37.9x speedups over GPU out-of-core baselines for models up to 30B parameters.

  10. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

    cs.CV 2024-12 unverdicted novelty 6.0

    InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.

  11. SnapKV: LLM Knows What You are Looking for Before Generation

    cs.CL 2024-04 conditional novelty 6.0

    SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable pe...

  12. Do Linear Probes Generalize Better in Persona Coordinates?

    cs.AI 2026-05 unverdicted novelty 5.0

    Probes on persona principal components from contrastive prompts generalize better than raw activation probes for harmful behaviors across 10 datasets.

  13. Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

    cs.LG 2026-04 unverdicted novelty 5.0

    ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.

  14. Acceptance Dynamics Across Cognitive Domains in Speculative Decoding

    cs.AI 2026-04 unverdicted novelty 5.0

    Empirical measurements across four NLP domains show task type is a stronger predictor of speculative decoding acceptance than tree depth, with chat uniquely achieving expected accepted length over 1 token per step.

  15. Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models

    cs.CL 2026-04 unverdicted novelty 5.0

    Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.

  16. SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

    cs.CL 2025-02 unverdicted novelty 5.0

    SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

  17. MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    cs.CV 2024-08 conditional novelty 5.0

    MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.

  18. Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering

    cs.CL 2026-04 unverdicted novelty 4.0

    Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.

  19. Yi: Open Foundation Models by 01.AI

    cs.CL 2024-03 unverdicted novelty 4.0

    Yi models are 6B and 34B open foundation models pretrained on 3.1T curated tokens that achieve strong benchmark results through data quality and targeted extensions like long context and vision alignment.

  20. Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

    cs.AI 2025-01 unverdicted novelty 3.0

    The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.

  21. LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

    cs.CL 2024-12 accept novelty 3.0

    A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

Reference graph

Works this paper leans on

253 extracted references · 253 canonical work pages · cited by 20 Pith papers · 15 internal anchors

  1. [1]

    OpenAI , year=

    Chatgpt: Optimizing language models for dialogue , author=. OpenAI , year=

  2. [3]

    arXiv , year=

    GPT-4 technical report , author=. arXiv , year=

  3. [4]

    Stanford Center for Research on Foundation Models

    Alpaca: A Strong, Replicable Instruction-Following Model , author=. Stanford Center for Research on Foundation Models. https://crfm. stanford. edu/2023/03/13/alpaca. html , year=

  4. [6]

    Advances in neural information processing systems , volume=

    Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

  5. [7]

    Compacter: Efficient low-rank hypercomplex adapter layers , url =

    Mahabadi, Rabeeh Karimi and Henderson, James and Ruder, Sebastian , journal =. Compacter: Efficient low-rank hypercomplex adapter layers , url =

  6. [8]

    All nlp tasks are generation tasks: A general pretraining framework , url =

    Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie , journal =. All nlp tasks are generation tasks: A general pretraining framework , url =

  7. [9]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Film: Visual reasoning with a general conditioning layer , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  8. [10]

    Advances in neural information processing systems , volume=

    Learning multiple visual domains with residual adapters , author=. Advances in neural information processing systems , volume=

  9. [11]

    Convolutional Neural Networks for Sentence Classification

    Kim, Yoon. Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ). 2014. doi:10.3115/v1/D14-1181

  10. [12]

    A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

    A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification , author=. arXiv preprint arXiv:1510.03820 , year=

  11. [13]

    ACL , year=

    A convolutional neural network for modelling sentences , author=. ACL , year=

  12. [14]

    Neural computation , volume=

    Long short-term memory , author=. Neural computation , volume=. 1997 , publisher=

  13. [16]

    Advances in Neural Information Processing Systems , volume=

    A neural probabilistic language model , author=. Advances in Neural Information Processing Systems , volume=. 2000 , url=

  14. [17]

    Paradigm shift in natural language processing , url =

    Sun, Tianxiang and Liu, Xiangyang and Qiu, Xipeng and Huang, Xuanjing , journal =. Paradigm shift in natural language processing , url =

  15. [18]

    Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , url =

    Liu, Pengfei and Yuan, Weizhe and Fu, Jinlan and Jiang, Zhengbao and Hayashi, Hiroaki and Neubig, Graham , journal =. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , url =

  16. [19]

    Single-dataset experts for multi-dataset question answering , url =

    Friedman, Dan and Dodge, Ben and Chen, Danqi , journal =. Single-dataset experts for multi-dataset question answering , url =

  17. [20]

    Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , doi =

    Pfeiffer, Jonas and Vuli. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , doi =

  18. [21]

    MultiEURLEX--A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer , url =

    Chalkidis, Ilias and Fergadiotis, Manos and Androutsopoulos, Ion , journal =. MultiEURLEX--A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer , url =

  19. [22]

    Proceedings of EMNLP , pages =

    R. Proceedings of EMNLP , pages =

  20. [23]

    Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya , title =

  21. [24]

    Language models are unsupervised multitask learners , volume =

    Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others , journal =. Language models are unsupervised multitask learners , volume =

  22. [26]

    LIMA: Less Is More for Alignment , author=

  23. [27]

    Nature Machine Intelligence , pages=

    Parameter-efficient fine-tuning of large-scale pre-trained language models , author=. Nature Machine Intelligence , pages=. 2023 , publisher=

  24. [29]

    Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs , author=

  25. [32]

    GitHub repository , url=

    Hello Dolly: Democratizing the magic of ChatGPT with open models , author=. GitHub repository , url=. 2023 , publisher =

  26. [33]

    Exploring the limits of transfer learning with a unified text-to-text transformer , url =

    Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J , journal =. Exploring the limits of transfer learning with a unified text-to-text transformer , url =

  27. [34]

    Gomez and Lukasz Kaiser and Illia Polosukhin , bibsource =

    Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin , bibsource =. Attention is All you Need , url =. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA,

  28. [35]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina , booktitle =. doi:10.18653/v1/N19-1423 , pages =

  29. [36]

    Pre-Trained Models: Past, Present and Future , url =

    Xu Han and Zhengyan Zhang and Ning Ding and Yuxian Gu and Xiao Liu and Yuqi Huo and Jiezhong Qiu and Liang Zhang and Wentao Han and Minlie Huang and Qin Jin and Yanyan Lan and Yang Liu and Zhiyuan Liu and Zhiwu Lu and Xipeng Qiu and Ruihua Song and Jie Tang and Ji-Rong Wen and Jinhui Yuan and Wayne Xin Zhao and Jun Zhu , doi =. Pre-Trained Models: Past, P...

  30. [37]

    Lora: Low-rank adaptation of large language models , url =

    Hu, Edward J and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , journal =. Lora: Low-rank adaptation of large language models , url =

  31. [38]

    2019 , journal=

    Natural Language Understanding with the Quora Question Pairs Dataset , author=. 2019 , journal=

  32. [39]

    and Brockett, Chris

    Dolan, William B. and Brockett, Chris. Automatically Constructing a Corpus of Sentential Paraphrases. Proceedings of IWP Workshop. 2005

  33. [40]

    Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

    Ling, Wang and Yogatama, Dani and Dyer, Chris and Blunsom, Phil. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1015

  34. [41]

    A SICK cure for the evaluation of compositional distributional semantic models

    Marelli, Marco and Menini, Stefano and Baroni, Marco and Bentivogli, Luisa and Bernardi, Raffaella and Zamparelli, Roberto. A SICK cure for the evaluation of compositional distributional semantic models. Proceedings of the Ninth International Conference on Language Resources and Evaluation ( LREC '14). 2014

  35. [42]

    2018 , url =

    Tushar Khot and Ashish Sabharwal and Peter Clark , Booktitle =. 2018 , url =

  36. [43]

    ROUGE : A Package for Automatic Evaluation of Summaries

    Lin, Chin-Yew. ROUGE : A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004

  37. [44]

    On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation , url =

    He, Ruidan and Liu, Linlin and Ye, Hai and Tan, Qingyu and Ding, Bosheng and Cheng, Liying and Low, Jiawei and Bing, Lidong and Si, Luo , booktitle =. On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation , url =. doi:10.18653/v1/2021.acl-long.172 , pages =

  38. [45]

    Proceedings of the 36th International Conference on Machine Learning,

    Asa Cooper Stickland and Iain Murray , bibsource =. Proceedings of the 36th International Conference on Machine Learning,

  39. [46]

    Parameter-Efficient Transfer Learning for

    Neil Houlsby and Andrei Giurgiu and Stanislaw Jastrzebski and Bruna Morrone and Quentin de Laroussilhe and Andrea Gesmundo and Mona Attariyan and Sylvain Gelly , bibsource =. Parameter-Efficient Transfer Learning for. Proceedings of the 36th International Conference on Machine Learning,

  40. [47]

    Ppt: Pre-trained prompt tuning for few-shot learning , url =

    Gu, Yuxian and Han, Xu and Liu, Zhiyuan and Huang, Minlie , journal =. Ppt: Pre-trained prompt tuning for few-shot learning , url =

  41. [48]

    UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning , url =

    Yuning Mao and Lambert Mathias and Rui Hou and Amjad Almahairi and Hao Ma and Jiawei Han and Wen-tau Yih and Madian Khabsa , journal =. UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning , url =

  42. [49]

    Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning , url =

    Aghajanyan, Armen and Gupta, Sonal and Zettlemoyer, Luke , booktitle =. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning , url =. doi:10.18653/v1/2021.acl-long.568 , pages =

  43. [50]

    Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning , url =

    Qin, Yujia and Wang, Xiaozhi and Su, Yusheng and Lin, Yankai and Ding, Ning and Liu, Zhiyuan and Li, Juanzi and Hou, Lei and Li, Peng and Sun, Maosong and others , journal =. Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning , url =

  44. [51]

    Prefix-Tuning: Optimizing Continuous Prompts for Generation , url =

    Li, Xiang Lisa and Liang, Percy , booktitle =. Prefix-Tuning: Optimizing Continuous Prompts for Generation , url =. doi:10.18653/v1/2021.acl-long.353 , pages =

  45. [52]

    The power of scale for parameter-efficient prompt tuning , url =

    Lester, Brian and Al-Rfou, Rami and Constant, Noah , journal =. The power of scale for parameter-efficient prompt tuning , url =

  46. [53]

    What would elsa do? freezing layers during transformer fine-tuning , url =

    Lee, Jaejun and Tang, Raphael and Lin, Jimmy , journal =. What would elsa do? freezing layers during transformer fine-tuning , url =

  47. [54]

    Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , doi =

    Pfeiffer, Jonas and R. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , doi =

  48. [55]

    Masking as an Efficient Alternative to Finetuning for Pretrained Language Models , url =

    Zhao, Mengjie and Lin, Tao and Mi, Fei and Jaggi, Martin and Sch. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models , url =. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , doi =

  49. [56]

    SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer , url =

    Tu Vu and Brian Lester and Noah Constant and Rami Al-Rfou and Daniel Cer , journal =. SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer , url =

  50. [57]

    On Transferability of Prompt Tuning for Natural Language Understanding , url =

    Yusheng Su and Xiaozhi Wang and Yujia Qin and Chi-Min Chan and Yankai Lin and Zhiyuan Liu and Peng Li and Juanzi Li and Lei Hou and Maosong Sun and Jie Zhou , journal =. On Transferability of Prompt Tuning for Natural Language Understanding , url =

  51. [58]

    Robust Transfer Learning with Pretrained Language Models through Adapters , url =

    Han, Wenjuan and Pang, Bo and Wu, Ying Nian , booktitle =. Robust Transfer Learning with Pretrained Language Models through Adapters , url =. doi:10.18653/v1/2021.acl-short.108 , pages =

  52. [59]

    Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume , pages =

    Pfeiffer, Jonas and Kamath, Aishwarya and R. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume , pages =

  53. [60]

    Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models , url =

    Zaken, Elad Ben and Ravfogel, Shauli and Goldberg, Yoav , journal =. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models , url =

  54. [61]

    OpenPrompt: An Open-source Framework for Prompt-learning , url =

    Ding, Ning and Hu, Shengding and Zhao, Weilin and Chen, Yulin and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong , journal =. OpenPrompt: An Open-source Framework for Prompt-learning , url =

  55. [62]

    Layer Normalization

    Layer Normalization , year =. arXiv , author =:1607.06450 , primaryclass =

  56. [63]

    Deep Residual Learning for Image Recognition , url =

    Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun , bibsource =. Deep Residual Learning for Image Recognition , url =. 2016. doi:10.1109/CVPR.2016.90 , pages =

  57. [64]

    Exploring the Limits of Transfer Learning with a Unified Text-to-Tex , url =

    Colin Raffel an , journal =. Exploring the Limits of Transfer Learning with a Unified Text-to-Tex , url =

  58. [65]

    doi:10.18653/v1/2020.acl-main.703 , pages =

    Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Veselin and Zettlemoyer, Luke , booktitle =. doi:10.18653/v1/2020.acl-main.703 , pages =

  59. [66]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach , url =

    Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov , journal =. RoBERTa: A Robustly Optimized BERT Pretraining Approach , url =

  60. [67]

    Kingma and Jimmy Ba , bibsource =

    Diederik P. Kingma and Jimmy Ba , bibsource =. Adam:. 3rd International Conference on Learning Representations,

  61. [68]

    Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , url =

    Noam Shazeer and Mitchell Stern , bibsource =. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , url =. Proceedings of the 35th International Conference on Machine Learning,

  62. [69]

    Datasets: A Community Library for Natural Language Processing , url =

    Lhoest, Quentin and Villanova del Moral, Albert and Jernite, Yacine and Thakur, Abhishek and von Platen, Patrick and Patil, Suraj and Chaumond, Julien and Drame, Mariama and Plu, Julien and Tunstall, Lewis and Davison, Joe and. Datasets: A Community Library for Natural Language Processing , url =. Proceedings of the 2021 Conference on Empirical Methods in...

  63. [70]

    Bowman , bibsource =

    Alex Wang and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , bibsource =. 7th International Conference on Learning Representations,

  64. [71]

    Neural Networks , volume=

    Sigmoid-weighted linear units for neural network function approximation in reinforcement learning , author=. Neural Networks , volume=. 2018 , publisher=

  65. [72]

    Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference , url =

    Schick, Timo and Sch. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference , url =. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume , pages =

  66. [73]

    Zonghan Yang and Yang Liu , booktitle =

  67. [74]

    doi:10.1016/S0076-5392(08)62095-0 , pages =

    Kopp, Richard E , booktitle =. doi:10.1016/S0076-5392(08)62095-0 , pages =

  68. [75]

    Qianxiao Li and Long Chen and Cheng Tai and Weinan E , bibsource =. J. Mach. Learn. Res. , pages =

  69. [76]

    Towards Robust Neural Networks via Close-loop Control , url =

    Zhuotong Chen and Qianxiao Li and Zheng Zhang , bibsource =. Towards Robust Neural Networks via Close-loop Control , url =. 9th International Conference on Learning Representations,

  70. [77]

    arXiv preprint arXiv:2202.09817 , year=

    Y-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning , author=. arXiv preprint arXiv:2202.09817 , year=

  71. [78]

    Parameter-Efficient Transfer Learning with Diff Pruning , url =

    Guo, Demi and Rush, Alexander and Kim, Yoon , booktitle =. Parameter-Efficient Transfer Learning with Diff Pruning , url =. doi:10.18653/v1/2021.acl-long.378 , pages =

  72. [79]

    Multilayer feedforward networks with a nonpolynomial activation function can approximate any function , volume =

    Leshno, Moshe and Lin, Vladimir Ya and Pinkus, Allan and Schocken, Shimon , journal =. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function , volume =

  73. [80]

    In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp

    Shaokai Ye and Kailu Wu and Mu Zhou and Yunfei Yang and Sia Huat Tan and Kaidi Xu and Jiebo Song and Chenglong Bao and Kaisheng Ma , bibsource =. Light-weight Calibrator:. 2020. doi:10.1109/CVPR42600.2020.01375 , pages =

  74. [81]

    International Conference on Learning Representations , year=

    Towards a Unified View of Parameter-Efficient Transfer Learning , author=. International Conference on Learning Representations , year=

  75. [82]

    High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications , year =

    Wright, John and Ma, Yi , publisher =. High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications , year =

  76. [83]

    Huang, Lifu and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin , booktitle =. Cosmos. doi:10.18653/v1/D19-1243 , pages =

  77. [84]

    doi:10.18653/v1/D19-1608 , pages =

    Tafjord, Oyvind and Gardner, Matt and Lin, Kevin and Clark, Peter , booktitle =. doi:10.18653/v1/D19-1608 , pages =

  78. [85]

    Sap, Maarten and Rashkin, Hannah and Chen, Derek and Le Bras, Ronan and Choi, Yejin , booktitle =. Social. doi:10.18653/v1/D19-1454 , pages =

  79. [86]

    Beat the

    Bartolo, Max and Roberts, Alastair and Welbl, Johannes and Riedel, Sebastian and Stenetorp, Pontus , doi =. Beat the. Transactions of the Association for Computational Linguistics , pages =

  80. [87]

    doi:10.18653/v1/2020.bionlp-1.15 , pages =

    Pappas, Dimitris and Stavropoulos, Petros and Androutsopoulos, Ion and McDonald, Ryan , booktitle =. doi:10.18653/v1/2020.bionlp-1.15 , pages =

Showing first 80 references.