pith. machine review for the scientific record.

arxiv: 2512.02764 · v3 · submitted 2025-12-02 · 💻 cs.CL

Recognition: 2 theorem links · Lean Theorem

PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 02:33 UTC · model grok-4.3

classification 💻 cs.CL
keywords: PEFT · parameter-efficient fine-tuning · large language models · benchmarking · replicability · unified framework · LLM fine-tuning · modular design

The pith

PEFT-Factory supplies one controlled environment that bundles 19 PEFT methods with 27 datasets for reproducible LLM fine-tuning comparisons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

New parameter-efficient fine-tuning methods for large language models often prove difficult to replicate or compare fairly because each arrives with its own code and setup. The paper presents PEFT-Factory as a single downstream framework that incorporates both off-the-shelf and custom PEFT approaches, along with classification and text-generation datasets and both standard and method-specific metrics. Its modular structure is meant to let users add new methods while keeping experiments in a stable, ready-to-run state. A reader would care whether the claim holds because, if it does, the framework could reduce duplicated effort and make performance numbers reported across different techniques more trustworthy.

Core claim

PEFT-Factory is introduced as a unified framework originating from LLaMA-Factory that natively implements a representative set of 19 PEFT methods, supplies 27 datasets spanning 12 tasks, and includes both standard and PEFT-specific evaluation metrics, thereby creating a ready-to-use, controlled, and stable environment that improves replicability and benchmarking of PEFT methods.

What carries the argument

The modular design of PEFT-Factory, which supports extensibility while delivering native implementations of 19 PEFT methods together with fixed datasets and metrics.
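
As a concrete picture of what that modular design implies, a minimal sketch follows; the names here (`PEFT_REGISTRY`, `register`, `PEFTMethod`, `build`) are invented for illustration and are not PEFT-Factory's actual API:

```python
# Hypothetical sketch of a plugin-style method registry; all names are
# illustrative, not PEFT-Factory's real interface.
from abc import ABC, abstractmethod

PEFT_REGISTRY: dict[str, type] = {}

def register(name: str):
    """Class decorator that makes a PEFT method selectable by name."""
    def wrap(cls):
        PEFT_REGISTRY[name] = cls
        return cls
    return wrap

class PEFTMethod(ABC):
    @abstractmethod
    def inject(self, model):
        """Attach trainable adapter parameters; freeze everything else."""

@register("lora")
class LoRA(PEFTMethod):
    def __init__(self, rank: int = 8):
        self.rank = rank

    def inject(self, model):
        ...  # wrap target linear layers with low-rank A/B factors

def build(name: str, **kwargs) -> PEFTMethod:
    # The pipeline only ever calls build(); adding a twentieth method
    # means adding one decorated class, nothing else changes.
    return PEFT_REGISTRY[name](**kwargs)
```

Under a pattern like this, the data loaders, training loop, and evaluation harness stay untouched as methods are added, which is the property the replicability claim leans on.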

If this is right

  • Newly proposed PEFT methods can be added and tested against the existing set without rebuilding the surrounding pipeline.
  • Comparisons of method performance become possible under identical data splits, evaluation protocols, and hardware conditions (a sketch of such a sweep follows this list).
  • Researchers gain immediate access to both classification and text-generation benchmarks when evaluating a new technique.
  • Custom PEFT variants can be inserted into the same controlled environment used for the built-in methods.
  • Results reported from the framework carry consistent metrics that combine general and PEFT-specific measures.
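
A minimal sketch of what such a controlled comparison could look like; `run_experiment` and its arguments are hypothetical stand-ins for whatever entry point a unified framework exposes:

```python
# Hypothetical controlled sweep: only the method name varies, while the
# dataset split, seed, and evaluation stay fixed across all runs.
import random

def run_experiment(method: str, dataset: str, seed: int) -> float:
    """Stand-in for a framework entry point: train `method` on a fixed
    split of `dataset` with a fixed seed, return a single metric."""
    random.seed(seed)
    # ... shared loaders, shared training loop, shared evaluation ...
    return random.random()  # placeholder metric for the sketch

results = {m: run_experiment(m, dataset="boolq", seed=42)
           for m in ("lora", "prompt_tuning", "bitfit")}
print(results)
```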

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could serve as a shared reference point that later papers adopt to report results, reducing the current scatter of incompatible experimental setups.
  • Extending the same modular structure to newer model families or additional task types would follow naturally from the design choices already made.
  • If adoption grows, the collection of 27 datasets might evolve into a de-facto standard testbed for efficient fine-tuning research.
  • Teams building production systems could use the same codebase to prototype and then deploy a chosen PEFT method with less translation effort.

Load-bearing premise

The native implementations of the 19 PEFT methods produce stable and comparable results across the 27 datasets without hidden code differences or extra tuning steps that would make fair comparison impossible.

What would settle it

Running the same PEFT method on the same dataset inside and outside PEFT-Factory and finding materially different performance numbers that trace to unaccounted implementation choices.
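
The outside-the-framework leg of such a check is straightforward to set up with the Hugging Face peft library (Mangrulkar et al., 2022); the model choice and hyperparameters below are arbitrary placeholders, and the sketch assumes nothing about PEFT-Factory's own interface:

```python
# One leg of the check: LoRA applied outside any unified framework via
# the Hugging Face `peft` library. Model and hyperparameters are
# arbitrary placeholders; only the comparison protocol matters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
lora = LoraConfig(
    r=8,                        # low-rank dimension
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a fraction of a percent trainable
# ... train and evaluate on the exact split the framework uses, then
# compare this metric against the in-framework run ...
```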

Figures

Figures reproduced from arXiv: 2512.02764 by Ivan Srba, Maria Bielikova, Robert Belanec.

Figure 1: Diagram representing the components of PEFT-FACTORY. The four main overarching components of PEFT-FACTORY are PEFT Methods, Datasets, Models, and Metrics, which are further defined by their subcomponents. Components shown in green are implemented in PEFT-FACTORY; components in blue are native to LLaMA-Factory (Zheng et al., 2024a). Additionally, the Adapters library requires a different … view at source ↗
Figure 5: Example directory structure of custom PEFT. [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 2: Selection of PEFT methods from the Finetuning method dropdown menu. All 19 PEFT methods included in PEFT-FACTORY are available to choose. [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗
Figure 3: Configuration options for the Prompt Tuning method. [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗
Figure 4: Classification and PSCP results for prediction after training with Prompt Tuning. [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗
read the original abstract

Parameter-Efficient Fine-Tuning (PEFT) methods address the increasing size of Large Language Models (LLMs). Currently, many newly introduced PEFT methods are challenging to replicate, deploy, or compare with one another. To address this, we introduce PEFT-Factory, a unified framework for efficient fine-tuning LLMs using both off-the-shelf and custom PEFT methods. While its modular design supports extensibility, it natively provides a representative set of 19 PEFT methods, 27 classification and text generation datasets addressing 12 tasks, and both standard and PEFT-specific evaluation metrics. As a result, PEFT-Factory provides a ready-to-use, controlled, and stable environment, improving replicability and benchmarking of PEFT methods. PEFT-Factory is a downstream framework that originates from the popular LLaMA-Factory, and is publicly available at https://github.com/kinit-sk/PEFT-Factory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces PEFT-Factory, a unified framework for parameter-efficient fine-tuning of autoregressive large language models. It natively implements 19 PEFT methods, supports 27 classification and text generation datasets across 12 tasks, and includes both standard and PEFT-specific evaluation metrics. The modular design enables extensibility for custom methods, and the framework is presented as a downstream extension of LLaMA-Factory that is publicly released to improve replicability and benchmarking.

Significance. If the native implementations faithfully reproduce published PEFT performance and the framework is actively maintained, it could provide a valuable standardized environment for the community, reducing the effort required for fair comparisons and replication studies in the fast-moving PEFT literature. The public GitHub release and stated support for both off-the-shelf and custom methods are concrete strengths that directly support the replicability goal.

major comments (2)
  1. [§4] The central claim that PEFT-Factory supplies a 'controlled and stable environment' for benchmarking rests on the correctness of the 19 native implementations, yet the manuscript contains no reproduction experiments, ablation studies, or direct comparisons against original published results for any of the bundled methods. This verification is load-bearing for the replicability assertion.
  2. [§3.2] The description of the modular architecture does not address how the framework ensures that custom or off-the-shelf PEFT methods produce results free of hidden implementation discrepancies when run across the 27 datasets; without such safeguards or tests, the benchmarking utility remains unproven.
minor comments (3)
  1. [Abstract] The abstract lists 'both standard and PEFT-specific evaluation metrics' but does not enumerate or define the PEFT-specific metrics; a short table or paragraph in §4 would improve clarity.
  2. A summary table listing the 19 PEFT methods, their core hyperparameters, and the tasks they support would help readers quickly assess coverage.
  3. [§2] The relationship between PEFT-Factory and the parent LLaMA-Factory codebase is mentioned but not detailed; explicit notes on which components were modified or extended would aid users who are already familiar with LLaMA-Factory.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and describe the revisions we will make to strengthen the replicability aspects of the manuscript.

read point-by-point responses
  1. Referee: [§4] The central claim that PEFT-Factory supplies a 'controlled and stable environment' for benchmarking rests on the correctness of the 19 native implementations, yet the manuscript contains no reproduction experiments, ablation studies, or direct comparisons against original published results for any of the bundled methods. This verification is load-bearing for the replicability assertion.

    Authors: We agree that direct verification of the native implementations would better support the claim of a controlled environment. The manuscript emphasizes the unified framework and its extensibility rather than exhaustive benchmarking, which we viewed as outside the primary scope. In the revision we will add a dedicated subsection (or appendix) presenting reproduction results for a representative subset of the 19 methods on standard datasets, comparing against numbers reported in the original PEFT papers. This will provide concrete evidence of implementation fidelity. revision: yes

  2. Referee: [§3.2] The description of the modular architecture does not address how the framework ensures that custom or off-the-shelf PEFT methods produce results free of hidden implementation discrepancies when run across the 27 datasets; without such safeguards or tests, the benchmarking utility remains unproven.

    Authors: We acknowledge that §3.2 could more explicitly describe the consistency mechanisms. All methods share the same data loaders, training loop, and evaluation harness; custom methods are required to implement a narrow interface that returns only the adapted parameters or logits. The repository already contains unit tests and example configuration files that exercise this interface across multiple datasets. We will revise the architecture description to highlight these safeguards and note that the evaluation pipeline is deliberately method-agnostic. revision: yes
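
The narrow interface the (simulated) authors describe could look roughly like this; the protocol name and both method signatures are invented for illustration, not taken from the repository:

```python
# Hypothetical sketch of a narrow custom-method contract: the harness
# owns data loading, training, and evaluation; a custom method only
# declares what it attaches and which parameters may be trained.
from typing import Iterable, Protocol
from torch import nn

class CustomPEFT(Protocol):
    def attach(self, model: nn.Module) -> None:
        """Insert adapter modules; must not modify the base weights."""

    def trainable_parameters(self) -> Iterable[nn.Parameter]:
        """Expose only the adapted parameters to the optimizer."""

def prepare(model: nn.Module, method: CustomPEFT) -> None:
    for p in model.parameters():
        p.requires_grad = False   # the harness freezes the whole model...
    method.attach(model)
    for p in method.trainable_parameters():
        p.requires_grad = True    # ...and re-enables only the adapters
```

Keeping the contract this small is what would let the unit tests the rebuttal mentions exercise every method identically.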

Circularity Check

0 steps flagged

No significant circularity in engineering framework paper

full rationale

The manuscript introduces PEFT-Factory as a software framework providing native implementations of 19 PEFT methods and 27 datasets for improved replicability. No mathematical derivations, equations, predictions, or fitted parameters exist in the paper. The central claim rests on the public GitHub release and modular design, which are externally verifiable artifacts independent of any self-referential logic or self-citation chains. This is a standard engineering contribution whose correctness does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the engineering assumption that a single modular codebase can faithfully reproduce 19 distinct PEFT methods without introducing systematic bias; no free parameters, mathematical axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5466 in / 1248 out tokens · 37453 ms · 2026-05-17T02:33:14.545638+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 15 internal anchors

  3. [3]

    Abubakar Abid, Ali Abdalla, Ali Abid, Dawood Khan, Abdulrahman Alfozan, and James Zou. 2019. Gradio: Hassle-free sharing and testing of ml models in the wild. arXiv preprint arXiv:1906.02569

  4. [4]

    Lightning AI. 2023. Litgpt. https://github.com/Lightning-AI/litgpt

  5. [5]

    Aida Amini, Saadia Gabriel, Shanchuan Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. 2019. https://doi.org/10.18653/v1/N19-1245 MathQA: Towards interpretable math word problem solving with operation-based formalisms. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:...

  6. [6]

    Yuvanesh Anand, Zach Nussbaum, Brandon Duderstadt, Benjamin Schmidt, and Andriy Mulyar. 2023. Gpt4all: Training an assistant-style chatbot with large scale data distillation from gpt-3.5-turbo. https://github.com/nomic-ai/gpt4all

  7. [7]

    Akari Asai, Mohammadreza Salehi, Matthew Peters, and Hannaneh Hajishirzi. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.446 ATTEMPT: Parameter-efficient multi-task tuning via attentional mixtures of soft prompts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6655--6672, Abu Dhabi, United Arab Emirat...

  8. [8]

    Axolotl maintainers and contributors. 2023. https://github.com/axolotl-ai-cloud/axolotl Axolotl: Open source LLM post-training

  9. [9]

    Roy Bar Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. 2006. The second PASCAL recognising textual entailment challenge

  10. [10]

    Robert Belanec, Branislav Pecher, Ivan Srba, and Maria Bielikova. 2025. https://arxiv.org/abs/2511.21285 Peft-bench: A parameter-efficient fine-tuning methods benchmark . arXiv preprint

  11. [11]

    Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. 2022. https://doi.org/10.18653/v1/2022.acl-short.1 BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1--9, Dublin, Ireland. Association...

  12. [12]

    Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge

  13. [13]

    Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, and 1 others. 2020. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 7432--7439

  14. [14]

    Arthur Cayley. 1846. Sur quelques propriétés des déterminants gauches. Journal für die reine und angewandte Mathematik

  15. [15]

    Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. https://doi.org/10.18653/v1/S17-2001 SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1--14, Vancouver, Canada. ACL

  16. [16]

    Sahil Chaudhary. 2023. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca

  17. [17]

    Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1300 BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language...

  18. [18]

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, and 1 others. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168

  19. [19]

    Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005. https://doi.org/10.1007/11736790_9 The PASCAL recognising textual entailment challenge. In Proceedings of the First International Conference on Machine Learning Challenges: Evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment, MLCW'05, page 177–190, Berlin...

  20. [20]

    Marie-Catherine De Marneffe, Mandy Simons, and Judith Tonhauser. 2019. The commitmentbank: Investigating projection in naturally occurring discourse. In proceedings of Sinn und Bedeutung, volume 23, pages 107--124

  21. [21]

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. Qlora: efficient finetuning of quantized llms. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23, Red Hook, NY, USA. Curran Associates Inc

  22. [22]

    Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, and Tong Zhang. 2023. Lmflow: An extensible toolkit for finetuning and inference of large foundation models. arXiv preprint arXiv:2306.12420

  23. [23]

    Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, and 1 others. 2023. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220--235

  24. [24]

    William B Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the International Workshop on Paraphrasing

  25. [25]

    Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, and 1 others. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783

  26. [26]

    Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, and Yu Qiao. 2023. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010

  27. [27]

    Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. 2007. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing, pages 1--9. Association for Computational Linguistics

  28. [28]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, and 1 others. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948

  29. [29]

    Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. 2024. https://openreview.net/forum?id=lIsCS8b6zj Parameter-efficient fine-tuning for large models: A comprehensive survey. Transactions on Machine Learning Research

  30. [30]

    Soufiane Hayou, Nikhil Ghosh, and Bin Yu. 2024. Lora+: efficient low rank adaptation of large models. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org

  31. [31]

    Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2022. https://openreview.net/forum?id=0RDcd5Axok Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations

  32. [32]

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. https://openreview.net/forum?id=d7KBjmI3GmQ Measuring massive multitask language understanding. In International Conference on Learning Representations

  33. [33]

    Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for nlp. In International conference on machine learning, pages 2790--2799. PMLR

  34. [34]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, and 1 others. 2022. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3

  35. [35]

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361

  36. [36]

    Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), page...

  37. [37]

    Tushar Khot, Ashish Sabharwal, and Peter Clark. 2019. https://doi.org/10.18653/v1/D19-1281 What's missing: A knowledge gap guided approach for multi-hop question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), p...

  38. [38]

    Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. https://doi.org/10.18653/v1/2021.emnlp-main.243 The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045--3059, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics

  39. [39]

    Hector J Levesque, Ernest Davis, and Leora Morgenstern. 2011. The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, volume 46, page 47

  40. [40]

    Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, and Yang You. 2023. https://doi.org/10.1145/3605573.3605613 Colossal-AI: A unified deep learning system for large-scale parallel training. In Proceedings of the 52nd International Conference on Parallel Processing, ICPP '23, page 766–775, New York, NY, USA. Ass...

  41. [41]

    Xiang Lisa Li and Percy Liang. 2021. https://doi.org/10.18653/v1/2021.acl-long.353 Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582--4597, Onl...

  42. [42]

    Vladislav Lialin, Vijeta Deshpande, and Anna Rumshisky. 2023. Scaling down to scale up: A guide to parameter-efficient fine-tuning. arXiv preprint arXiv:2303.15647

  43. [43]

    Chin-Yew Lin. 2004. https://aclanthology.org/W04-1013/ ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74--81, Barcelona, Spain. Association for Computational Linguistics

  44. [44]

    Vijay Lingam, Atula Tejaswi Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Eunsol Choi, Alex Dimakis, Aleksandar Bojchevski, and Sujay Sanghavi. 2024. https://openreview.net/forum?id=DOUskwCqg5 SVFT: Parameter-efficient fine-tuning with singular vectors. In 2nd Workshop on Advancing Neural Network Training: Computational Ef...

  45. [45]

    Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin A Raffel. 2022a. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems, 35:1950--1965

  46. [46]

    Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. 2024. Dora: weight-decomposed low-rank adaptation. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org

  47. [47]

    Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2022b. https://doi.org/10.18653/v1/2022.acl-short.8 P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 61--68, Dub...

  48. [48]

    Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2023. Gpt understands, too. AI Open

  49. [49]

    Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul, and Benjamin Bossan. 2022. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft

  50. [50]

    Fanxu Meng, Zhaohui Wang, and Muhan Zhang. 2024. Pissa: principal singular values and singular vectors adaptation of large language models. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS '24, Red Hook, NY, USA. Curran Associates Inc

  51. [51]

    Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. 2024. Large language models: A survey. arXiv preprint arXiv:2402.06196

  52. [52]

    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. https://doi.org/10.3115/1073083.1073135 Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311--318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics

  53. [53]

    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, and 2 others. 2019. Pytorch: an imperative style, high-performance deep l...

  54. [54]

    Arkil Patel, Satwik Bhattamishra, and Navin Goyal. 2021. https://doi.org/10.18653/v1/2021.naacl-main.168 Are NLP models really able to solve simple math word problems? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2080--2094, Online. Association for ...

  55. [55]

    Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, and Sebastian Ruder. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.617 MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7654--7673, Online. Association for Comput...

  56. [56]

    Mohammad Taher Pilehvar and Jose Camacho-Collados. 2019. https://doi.org/10.18653/v1/N19-1128 WiC: the word-in-context dataset for evaluating context-sensitive meaning representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and S...

  57. [57]

    Clifton Poth, Hannah Sterz, Indraneil Paul, Sukannya Purkayastha, Leon Engländer, Timo Imhof, Ivan Vulić, Sebastian Ruder, Iryna Gurevych, and Jonas Pfeiffer. 2023. https://aclanthology.org/2023.emnlp-demo.13 Adapters: A unified library for parameter-efficient and modular transfer learning. In Proceedings of the 2023 Conference on Empirical Metho...

  58. [58]

    Zeju Qiu, Weiyang Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, and Bernhard Schölkopf. 2023. Controlling text-to-image diffusion by orthogonal finetuning. Advances in Neural Information Processing Systems, 36:79320--79362

  59. [59]

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, and 1 others. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9

  60. [60]

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1--67

  61. [61]

    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of EMNLP, pages 2383--2392. Association for Computational Linguistics

  62. [62]

    Melissa Roemmele, Cosmin Adrian Bejan, and Andrew S Gordon. 2011. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In AAAI spring symposium: logical formalizations of commonsense reasoning, pages 90--95

  63. [63]

    Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2021. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM, 64(9):99--106

  64. [64]

    Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. 2019. https://doi.org/10.18653/v1/D19-1454 Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP),...

  65. [65]

    Zhengxiang Shi and Aldo Lipani. 2024. https://openreview.net/forum?id=KjegfPGRde DePT: Decomposed prompt tuning for parameter-efficient fine-tuning. In The Twelfth International Conference on Learning Representations

  66. [66]

    Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on EMNLP, pages 1631--1642

  67. [67]

    Pengwei Tang, Xiaolin Hu, and Yong Liu. 2025. https://openreview.net/forum?id=fswihJIYbd ADePT: Adaptive decomposed prompt tuning for parameter-efficient fine-tuning. In The Thirteenth International Conference on Learning Representations

  68. [68]

    A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems

  69. [69]

    Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang, Kashif Rasul, and Quentin Gallouédec. 2020. Trl: Transformer reinforcement learning. https://github.com/huggingface/trl

  70. [70]

    Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2019. Superglue: A stickier benchmark for general-purpose language understanding systems. Advances in neural information processing systems, 32

  71. [71]

    Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461

  72. [72]

    Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, and Hannaneh Hajishirzi. 2023a. https://arxiv.org/abs/2306.04751 How far can camels go? exploring the state of instruction tuning on open resources. Preprint, arXiv:2306.04751

  73. [73]

    Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, and Yoon Kim. 2023b. https://openreview.net/forum?id=Nk2pDtuhTq Multitask prompt tuning enables parameter-efficient transfer learning. In The Eleventh International Conference on Learning Representations

  74. [74]

    Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman. 2019. https://doi.org/10.1162/tacl_a_00290 Neural network acceptability judgments. Transactions of the ACL, 7:625--641

  75. [75]

    Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. https://doi.org/10.18653/v1/N18-1101 A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, Volume 1 (Long Papers), pages 1112--1122, New Orleans, Louisiana. ACL

  76. [76]

    Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, and 3 others. 2020. https://www.aclweb.org/anthology/2020.emnlp-demos.6 Transformers...

  77. [77]

    Lingling Xu, Haoran Xie, Si-Zhao Joe Qin, Xiaohui Tao, and Fu Lee Wang. 2023. Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. arXiv preprint arXiv:2312.12148

  78. [78]

    An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, and 25 others. 2024. https://api.semanticscholar.org/CorpusID:274859421 Qwen2.5 technical report. ArXiv, abs/2412.15115

  79. [79]

    Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, and Graham Neubig. 2018. Learning to mine aligned code and natural language pairs from stack overflow. In Proceedings of the 15th international conference on mining software repositories, pages 476--486

  80. [80]

    Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. https://doi.org/10.18653/v1/P19-1472 HellaSwag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4791--4800, Florence, Italy. Association for Computational Linguistics

Showing first 80 references.