Recognition: 2 theorem links · Lean Theorem
PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models
Pith reviewed 2026-05-17 02:33 UTC · model grok-4.3
The pith
PEFT-Factory supplies one controlled environment that bundles 19 PEFT methods with 27 datasets for reproducible LLM fine-tuning comparisons.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PEFT-Factory is introduced as a unified framework, derived from LLaMA-Factory, that natively implements a representative set of 19 PEFT methods, supplies 27 datasets spanning 12 tasks, and includes both standard and PEFT-specific evaluation metrics. Together, these pieces form a ready-to-use, controlled, and stable environment intended to improve the replicability and benchmarking of PEFT methods.
What carries the argument
PEFT-Factory's modular design, which supports extensibility while delivering native implementations of the 19 PEFT methods together with fixed datasets and metrics.
If this is right
- Newly proposed PEFT methods can be added and tested against the existing set without rebuilding the surrounding pipeline.
- Comparisons of method performance become possible under identical data splits, evaluation protocols, and hardware conditions.
- Researchers gain immediate access to both classification and text-generation benchmarks when evaluating a new technique.
- Custom PEFT variants can be inserted into the same controlled environment used for the built-in methods.
- Results reported from the framework carry consistent metrics that combine general and PEFT-specific measures.
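Several of the bullets above reduce to one idea: every method runs under an identical protocol, so score differences trace to the method alone. A minimal sketch of that control logic, with entirely hypothetical names and a deterministic stub in place of a real training run (this is not PEFT-Factory's API):

```python
# Hypothetical controlled-comparison loop: fixed split, seed, and metric for
# every PEFT method. run_finetune is a reproducible stub, not real training.
import random
import zlib

def run_finetune(method: str, train: list, seed: int) -> float:
    # Stand-in for fine-tuning + evaluation; returns a reproducible
    # pseudo-accuracy derived only from (method, seed).
    rng = random.Random(zlib.crc32(method.encode()) ^ seed)
    return round(0.7 + 0.2 * rng.random(), 4)

def compare_methods(methods, train, seed=42):
    # Identical data split, seed, and metric for every method.
    return {m: run_finetune(m, train, seed) for m in methods}

scores = compare_methods(["lora", "prefix_tuning", "bitfit"], train=["ex1", "ex2"])
```

Because the stub seeds from a stable checksum rather than Python's randomized `hash()`, repeated calls return identical numbers, which is the property a "controlled and stable environment" is claiming.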
Where Pith is reading between the lines
- The framework could serve as a shared reference point that later papers adopt to report results, reducing the current scatter of incompatible experimental setups.
- Extending the same modular structure to newer model families or additional task types would follow naturally from the design choices already made.
- If adoption grows, the collection of 27 datasets might evolve into a de-facto standard testbed for efficient fine-tuning research.
- Teams building production systems could use the same codebase to prototype and then deploy a chosen PEFT method with less translation effort.
Load-bearing premise
The native implementations of the 19 PEFT methods produce stable and comparable results across the 27 datasets without hidden code differences or extra tuning steps that would make fair comparison impossible.
What would settle it
Running the same PEFT method on the same dataset inside and outside PEFT-Factory and finding materially different performance numbers that trace to unaccounted implementation choices.
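That settling experiment could be operationalized as a simple tolerance check over paired results: scores from runs inside the framework versus external reference runs, flagging gaps beyond an agreed threshold. All method names, datasets, scores, and the tolerance below are invented for illustration:

```python
# Illustrative replication check: flag (method, dataset) pairs whose
# in-framework score diverges from an external reference by more than tol.
def replication_gaps(framework_scores, reference_scores, tol=0.01):
    """Return {(method, dataset): signed gap} for pairs diverging by more than tol."""
    gaps = {}
    for key, ref in reference_scores.items():
        got = framework_scores.get(key)
        if got is not None and abs(got - ref) > tol:
            gaps[key] = round(got - ref, 4)
    return gaps

inside = {("lora", "sst2"): 0.945, ("bitfit", "sst2"): 0.901}
outside = {("lora", "sst2"): 0.947, ("bitfit", "sst2"): 0.872}
print(replication_gaps(inside, outside))  # {('bitfit', 'sst2'): 0.029}
```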
read the original abstract
Parameter-Efficient Fine-Tuning (PEFT) methods address the increasing size of Large Language Models (LLMs). Currently, many newly introduced PEFT methods are challenging to replicate, deploy, or compare with one another. To address this, we introduce PEFT-Factory, a unified framework for efficient fine-tuning LLMs using both off-the-shelf and custom PEFT methods. While its modular design supports extensibility, it natively provides a representative set of 19 PEFT methods, 27 classification and text generation datasets addressing 12 tasks, and both standard and PEFT-specific evaluation metrics. As a result, PEFT-Factory provides a ready-to-use, controlled, and stable environment, improving replicability and benchmarking of PEFT methods. PEFT-Factory is a downstream framework that originates from the popular LLaMA-Factory, and is publicly available at https://github.com/kinit-sk/PEFT-Factory.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PEFT-Factory, a unified framework for parameter-efficient fine-tuning of autoregressive large language models. It natively implements 19 PEFT methods, supports 27 classification and text generation datasets across 12 tasks, and includes both standard and PEFT-specific evaluation metrics. The modular design enables extensibility for custom methods, and the framework is presented as a downstream extension of LLaMA-Factory that is publicly released to improve replicability and benchmarking.
Significance. If the native implementations faithfully reproduce published PEFT performance and the framework is actively maintained, it could provide a valuable standardized environment for the community, reducing the effort required for fair comparisons and replication studies in the fast-moving PEFT literature. The public GitHub release and stated support for both off-the-shelf and custom methods are concrete strengths that directly support the replicability goal.
Major comments (2)
- [§4] The central claim that PEFT-Factory supplies a 'controlled and stable environment' for benchmarking rests on the correctness of the 19 native implementations, yet the manuscript contains no reproduction experiments, ablation studies, or direct comparisons against original published results for any of the bundled methods. This verification is load-bearing for the replicability assertion.
- [§3.2] The description of the modular architecture does not address how the framework ensures that custom or off-the-shelf PEFT methods produce results free of hidden implementation discrepancies when run across the 27 datasets; without such safeguards or tests, the benchmarking utility remains unproven.
Minor comments (3)
- [Abstract] The abstract lists 'both standard and PEFT-specific evaluation metrics' but does not enumerate or define the PEFT-specific metrics; a short table or paragraph in §4 would improve clarity.
- A summary table listing the 19 PEFT methods, their core hyperparameters, and the tasks they support would help readers quickly assess coverage.
- [§2] The relationship between PEFT-Factory and the parent LLaMA-Factory codebase is mentioned but not detailed; explicit notes on which components were modified or extended would aid users who are already familiar with LLaMA-Factory.
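On the first minor comment, one PEFT-specific metric commonly reported alongside task scores is the trainable-parameter fraction. A minimal sketch over (name, count, trainable) records; the parameter listing below is invented for illustration and does not come from the paper:

```python
# Trainable-parameter fraction: a typical PEFT-specific metric, computed here
# over a hypothetical listing of frozen backbone weights plus LoRA factors.
def trainable_ratio(params):
    total = sum(count for _, count, _ in params)
    tuned = sum(count for _, count, trainable in params if trainable)
    return tuned / total

params = [
    ("base.attn.q_proj", 4_194_304, False),  # frozen backbone weight
    ("base.attn.v_proj", 4_194_304, False),
    ("lora.q_proj.A", 32_768, True),         # low-rank adapter factor
    ("lora.q_proj.B", 32_768, True),
]
print(f"{trainable_ratio(params):.4%}")  # 0.7752%
```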
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address each major comment below and describe the revisions we will make to strengthen the replicability aspects of the manuscript.
read point-by-point responses
-
Referee: [§4] The central claim that PEFT-Factory supplies a 'controlled and stable environment' for benchmarking rests on the correctness of the 19 native implementations, yet the manuscript contains no reproduction experiments, ablation studies, or direct comparisons against original published results for any of the bundled methods. This verification is load-bearing for the replicability assertion.
Authors: We agree that direct verification of the native implementations would better support the claim of a controlled environment. The manuscript emphasizes the unified framework and its extensibility rather than exhaustive benchmarking, which we viewed as outside the primary scope. In the revision we will add a dedicated subsection (or appendix) presenting reproduction results for a representative subset of the 19 methods on standard datasets, comparing against numbers reported in the original PEFT papers. This will provide concrete evidence of implementation fidelity. revision: yes
-
Referee: [§3.2] The description of the modular architecture does not address how the framework ensures that custom or off-the-shelf PEFT methods produce results free of hidden implementation discrepancies when run across the 27 datasets; without such safeguards or tests, the benchmarking utility remains unproven.
Authors: We acknowledge that §3.2 could more explicitly describe the consistency mechanisms. All methods share the same data loaders, training loop, and evaluation harness; custom methods are required to implement a narrow interface that returns only the adapted parameters or logits. The repository already contains unit tests and example configuration files that exercise this interface across multiple datasets. We will revise the architecture description to highlight these safeguards and note that the evaluation pipeline is deliberately method-agnostic. revision: yes
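The "narrow interface" idea in the response above can be sketched concretely: a custom method exposes only the parameters it adapts, while a shared, method-agnostic harness owns data and evaluation. Class and function names here are illustrative, not PEFT-Factory's actual interface:

```python
# Sketch of a narrow PEFT plug-in interface: the harness never sees method
# internals, only whatever adapted_parameters() returns.
from abc import ABC, abstractmethod

class PEFTMethod(ABC):
    @abstractmethod
    def adapted_parameters(self) -> dict:
        """Return only the parameters this method trains."""

class BiasOnly(PEFTMethod):
    # BitFit-style example: expose bias terms and nothing else.
    def __init__(self, model_params: dict):
        self._params = model_params

    def adapted_parameters(self) -> dict:
        return {k: v for k, v in self._params.items() if k.endswith(".bias")}

def adapted_names(method: PEFTMethod) -> list:
    # Method-agnostic harness code: it depends only on the narrow interface.
    return sorted(method.adapted_parameters())

model = {"layer0.weight": [0.1], "layer0.bias": [0.0], "layer1.bias": [0.0]}
print(adapted_names(BiasOnly(model)))  # ['layer0.bias', 'layer1.bias']
```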
Circularity Check
No significant circularity in this engineering framework paper.
full rationale
The manuscript introduces PEFT-Factory as a software framework providing native implementations of 19 PEFT methods and 27 datasets for improved replicability. No mathematical derivations, equations, predictions, or fitted parameters exist in the paper. The central claim rests on the public GitHub release and modular design, which are externally verifiable artifacts independent of any self-referential logic or self-citation chains. This is a standard engineering contribution whose correctness does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
we introduce PEFT-FACTORY, a unified framework for efficient fine-tuning LLMs using both off-the-shelf and custom PEFT methods... natively provides a representative set of 19 PEFT methods, 27 classification and text generation datasets
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
implements a standardized PEFT interface... dynamic loading mechanism for custom PEFT methods
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[3]
Abubakar Abid, Ali Abdalla, Ali Abid, Dawood Khan, Abdulrahman Alfozan, and James Zou. 2019. Gradio: Hassle-free sharing and testing of ml models in the wild. arXiv preprint arXiv:1906.02569
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[4]
Lightning AI. 2023. Litgpt. https://github.com/Lightning-AI/litgpt
work page 2023
-
[5]
Aida Amini, Saadia Gabriel, Shanchuan Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. 2019. https://doi.org/10.18653/v1/N19-1245 MathQA: Towards interpretable math word problem solving with operation-based formalisms. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:...
-
[6]
Yuvanesh Anand, Zach Nussbaum, Brandon Duderstadt, Benjamin Schmidt, and Andriy Mulyar. 2023. Gpt4all: Training an assistant-style chatbot with large scale data distillation from gpt-3.5-turbo. https://github.com/nomic-ai/gpt4all
work page 2023
-
[7]
Akari Asai, Mohammadreza Salehi, Matthew Peters, and Hannaneh Hajishirzi. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.446 ATTEMPT: Parameter-efficient multi-task tuning via attentional mixtures of soft prompts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6655--6672, Abu Dhabi, United Arab Emirat...
-
[8]
Axolotl maintainers and contributors. 2023. https://github.com/axolotl-ai-cloud/axolotl Axolotl: Open source LLM post-training
work page 2023
-
[9]
Roy Bar Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. 2006. The second PASCAL recognising textual entailment challenge
work page 2006
-
[10]
Robert Belanec, Branislav Pecher, Ivan Srba, and Maria Bielikova. 2025. https://arxiv.org/abs/2511.21285 PEFT-Bench: A parameter-efficient fine-tuning methods benchmark. arXiv preprint
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[11]
Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. 2022. https://doi.org/10.18653/v1/2022.acl-short.1 BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1--9, Dublin, Ireland. Association...
-
[12]
Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge
work page 2009
-
[13]
Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, and 1 others. 2020. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 7432--7439
work page 2020
-
[14]
Arthur Cayley. 1846. Sur quelques propriétés des déterminants gauches. Journal für die reine und angewandte Mathematik
-
[15]
Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. https://doi.org/10.18653/v1/S17-2001 SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1--14, Vancouver, Canada. ACL
-
[16]
Sahil Chaudhary. 2023. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca
work page 2023
-
[17]
Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1300 BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language...
-
[18]
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, and 1 others. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[19]
Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005. https://doi.org/10.1007/11736790_9 The pascal recognising textual entailment challenge . In Proceedings of the First International Conference on Machine Learning Challenges: Evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment, MLCW'05, page 177–190, Berlin...
-
[20]
Marie-Catherine De Marneffe, Mandy Simons, and Judith Tonhauser. 2019. The commitmentbank: Investigating projection in naturally occurring discourse. In proceedings of Sinn und Bedeutung, volume 23, pages 107--124
work page 2019
-
[21]
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. Qlora: efficient finetuning of quantized llms. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23, Red Hook, NY, USA. Curran Associates Inc
work page 2023
- [22]
-
[23]
Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, and 1 others. 2023. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220--235
work page 2023
-
[24]
William B Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the International Workshop on Paraphrasing
work page 2005
-
[25]
Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, and 1 others. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[26]
Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, and Yu Qiao. 2023. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[27]
Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. 2007. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing, pages 1--9. Association for Computational Linguistics
work page 2007
-
[28]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, and 1 others. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[29]
Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. 2024. https://openreview.net/forum?id=lIsCS8b6zj Parameter-efficient fine-tuning for large models: A comprehensive survey . Transactions on Machine Learning Research
work page 2024
-
[30]
Soufiane Hayou, Nikhil Ghosh, and Bin Yu. 2024. Lora+: efficient low rank adaptation of large models. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org
work page 2024
-
[31]
Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2022. https://openreview.net/forum?id=0RDcd5Axok Towards a unified view of parameter-efficient transfer learning . In International Conference on Learning Representations
work page 2022
-
[32]
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. https://openreview.net/forum?id=d7KBjmI3GmQ Measuring massive multitask language understanding . In International Conference on Learning Representations
work page 2021
-
[33]
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for nlp. In International conference on machine learning, pages 2790--2799. PMLR
work page 2019
-
[34]
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, and 1 others. 2022. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3
work page 2022
-
[35]
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[36]
Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), page...
work page 2018
-
[37]
Tushar Khot, Ashish Sabharwal, and Peter Clark. 2019. https://doi.org/10.18653/v1/D19-1281 What's missing: A knowledge gap guided approach for multi-hop question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), p...
-
[38]
Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. https://doi.org/10.18653/v1/2021.emnlp-main.243 The power of scale for parameter-efficient prompt tuning . In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045--3059, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics
-
[39]
Hector J Levesque, Ernest Davis, and Leora Morgenstern. 2011. The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, volume 46, page 47
work page 2011
-
[40]
Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, and Yang You. 2023. https://doi.org/10.1145/3605573.3605613 Colossal-ai: A unified deep learning system for large-scale parallel training . In Proceedings of the 52nd International Conference on Parallel Processing, ICPP '23, page 766–775, New York, NY, USA. Ass...
-
[41]
Xiang Lisa Li and Percy Liang. 2021. https://doi.org/10.18653/v1/2021.acl-long.353 Prefix-tuning: Optimizing continuous prompts for generation . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582--4597, Onl...
- [42]
-
[43]
Chin-Yew Lin. 2004. https://aclanthology.org/W04-1013/ ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74--81, Barcelona, Spain. Association for Computational Linguistics
work page 2004
-
[44]
Vijay Lingam, Atula Tejaswi Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Eunsol Choi, Alex Dimakis, Aleksandar Bojchevski, and Sujay Sanghavi. 2024. https://openreview.net/forum?id=DOUskwCqg5 SVFT: Parameter-efficient fine-tuning with singular vectors. In 2nd Workshop on Advancing Neural Network Training: Computational Ef...
work page 2024
-
[45]
Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin A Raffel. 2022a. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems, 35:1950--1965
work page 2022
-
[46]
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. 2024. Dora: weight-decomposed low-rank adaptation. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org
work page 2024
-
[47]
Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2022b. https://doi.org/10.18653/v1/2022.acl-short.8 P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 61--68, Dub...
-
[48]
Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2023. Gpt understands, too. AI Open
work page 2023
-
[49]
Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul, and Benjamin Bossan. 2022. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft
work page 2022
-
[50]
Fanxu Meng, Zhaohui Wang, and Muhan Zhang. 2024. Pissa: principal singular values and singular vectors adaptation of large language models. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS '24, Red Hook, NY, USA. Curran Associates Inc
work page 2024
-
[51]
Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. 2024. Large language models: A survey. arXiv preprint arXiv:2402.06196
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[52]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. https://doi.org/10.3115/1073083.1073135 Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311--318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics
-
[53]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, and 2 others. 2019. Pytorch: an imperative style, high-performance deep l...
work page 2019
-
[54]
Arkil Patel, Satwik Bhattamishra, and Navin Goyal. 2021. https://doi.org/10.18653/v1/2021.naacl-main.168 Are NLP models really able to solve simple math word problems? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2080--2094, Online. Association for ...
work page internal anchor Pith review doi:10.18653/v1/2021.naacl-main.168 2021
-
[55]
Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, and Sebastian Ruder. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.617 MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7654--7673, Online. Association for Comput...
-
[56]
Mohammad Taher Pilehvar and Jose Camacho-Collados. 2019. https://doi.org/10.18653/v1/N19-1128 WiC: the word-in-context dataset for evaluating context-sensitive meaning representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and S...
-
[57]
Clifton Poth, Hannah Sterz, Indraneil Paul, Sukannya Purkayastha, Leon Engländer, Timo Imhof, Ivan Vulić, Sebastian Ruder, Iryna Gurevych, and Jonas Pfeiffer. 2023. https://aclanthology.org/2023.emnlp-demo.13 Adapters: A unified library for parameter-efficient and modular transfer learning. In Proceedings of the 2023 Conference on Empirical Metho...
work page 2023
-
[58]
Zeju Qiu, Weiyang Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, and Bernhard Schölkopf. 2023. Controlling text-to-image diffusion by orthogonal finetuning. Advances in Neural Information Processing Systems, 36:79320--79362
work page 2023
-
[59]
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, and 1 others. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9
work page 2019
-
[60]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1--67
work page 2020
-
[61]
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of EMNLP, pages 2383--2392. Association for Computational Linguistics
work page 2016
-
[62]
Melissa Roemmele, Cosmin Adrian Bejan, and Andrew S Gordon. 2011. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In AAAI spring symposium: logical formalizations of commonsense reasoning, pages 90--95
work page 2011
-
[63]
Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2021. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM, 64(9):99--106
work page 2021
-
[64]
Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. 2019. https://doi.org/10.18653/v1/D19-1454 Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP),...
-
[65]
Zhengxiang Shi and Aldo Lipani. 2024. https://openreview.net/forum?id=KjegfPGRde DePT: Decomposed prompt tuning for parameter-efficient fine-tuning. In The Twelfth International Conference on Learning Representations
work page 2024
-
[66]
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on EMNLP, pages 1631--1642
work page 2013
-
[67]
Pengwei Tang, Xiaolin Hu, and Yong Liu. 2025. https://openreview.net/forum?id=fswihJIYbd ADePT: Adaptive decomposed prompt tuning for parameter-efficient fine-tuning. In The Thirteenth International Conference on Learning Representations
work page 2025
-
[68]
A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems
work page 2017
-
[69]
Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang, Kashif Rasul, and Quentin Gallouédec. 2020. Trl: Transformer reinforcement learning. https://github.com/huggingface/trl
work page 2020
-
[70]
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2019. Superglue: A stickier benchmark for general-purpose language understanding systems. Advances in neural information processing systems, 32
work page 2019
-
[71]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[72]
Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, and Hannaneh Hajishirzi. 2023a. https://arxiv.org/abs/2306.04751 How far can camels go? Exploring the state of instruction tuning on open resources. Preprint, arXiv:2306.04751
-
[73]
Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, and Yoon Kim. 2023b. https://openreview.net/forum?id=Nk2pDtuhTq Multitask prompt tuning enables parameter-efficient transfer learning. In The Eleventh International Conference on Learning Representations
work page 2023
-
[74]
Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman. 2019. https://doi.org/10.1162/tacl_a_00290 Neural network acceptability judgments . Transactions of the ACL, 7:625--641
-
[75]
Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. https://doi.org/10.18653/v1/N18-1101 A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, Volume 1 (Long Papers), pages 1112--1122, New Orleans, Louisiana. ACL
-
[76]
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, and 3 others. 2020. https://www.aclweb.org/anthology/2020.emnlp-demos.6 Transformers...
work page 2020
- [77]
-
[78]
An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, and 25 others. 2024. https://api.semanticscholar.org/CorpusID:274859421 Qwen2.5 technical report. ArXiv, abs/2412.15115
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[79]
Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, and Graham Neubig. 2018. Learning to mine aligned code and natural language pairs from stack overflow. In Proceedings of the 15th international conference on mining software repositories, pages 476--486
work page 2018
-
[80]
Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. https://doi.org/10.18653/v1/P19-1472 HellaSwag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4791--4800, Florence, Italy. Association for Computational Linguistics