LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Pith reviewed 2026-05-11 12:35 UTC · model grok-4.3
The pith
A unified framework lets users fine-tune over 100 language models efficiently through a web interface alone, with no coding required.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework integrates a range of efficient training methods to support the fine-tuning of more than one hundred language models in a flexible way. Customization happens entirely through the accompanying web user interface, removing any requirement for coding. Validation experiments on language modeling and text generation tasks establish both the efficiency and the effectiveness of this approach.
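In practice, a no-code workflow like this reduces to a declarative configuration that the UI assembles and a unified backend consumes. A hypothetical sketch of such a spec and a validation step (field names are modeled on common fine-tuning configs, not taken from the paper):

```python
# Hypothetical fine-tuning spec of the kind a no-code UI would emit.
# Field names are illustrative assumptions, not taken from the paper.
config = {
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",
    "stage": "sft",                 # supervised fine-tuning
    "finetuning_type": "lora",      # a parameter-efficient method
    "dataset": "alpaca_en",
    "learning_rate": 1e-4,
    "num_train_epochs": 3,
}

def validate(cfg):
    """Check the minimal fields a unified training backend would need."""
    required = {"model_name_or_path", "stage", "finetuning_type", "dataset"}
    missing = required - cfg.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

assert validate(config)
```

The point of the sketch is that once fine-tuning is expressed as data rather than code, the same backend can dispatch any supported model and method combination.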
What carries the argument
The unified framework that merges efficient training methods with a web-based interface to manage fine-tuning across many models.
Load-bearing premise
That the efficient methods integrate without conflicts or performance drops when applied uniformly to many different language models.
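The premise is plausible because several of these methods attach uniformly at the linear-layer level, independent of the surrounding architecture. A minimal NumPy sketch of the LoRA idea (a trainable low-rank update added to a frozen weight), as one representative method:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 16, 8, 2           # rank r << min(d_in, d_out)
W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight

# Trainable low-rank factors; B starts at zero so the adapted layer
# initially reproduces the frozen one exactly.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

def forward(x, scale=1.0):
    # y = (W + scale * B @ A) @ x -- only A and B would be trained
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 the adapter is a no-op, so outputs match the base model.
assert np.allclose(forward(x), W @ x)
```

Because the adapter only wraps matrix multiplications, it can in principle be applied to any model built from linear layers, which is what makes a uniform integration across 100+ architectures credible.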
What would settle it
A head-to-head comparison against direct per-model implementations, showing whether the framework's fine-tuning performance and speed match them or fall below.
read the original abstract
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks. However, it requires non-trivial efforts to implement these methods on different models. We present LlamaFactory, a unified framework that integrates a suite of cutting-edge efficient training methods. It provides a solution for flexibly customizing the fine-tuning of 100+ LLMs without the need for coding through the built-in web UI LlamaBoard. We empirically validate the efficiency and effectiveness of our framework on language modeling and text generation tasks. It has been released at https://github.com/hiyouga/LLaMA-Factory and received over 25,000 stars and 3,000 forks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents LlamaFactory, a unified open-source framework integrating a suite of efficient fine-tuning methods for over 100 language models. It features a web-based UI (LlamaBoard) enabling no-code customization of fine-tuning workflows. The authors state that they empirically validate the framework's efficiency and effectiveness on language modeling and text generation tasks, and report its public release on GitHub with over 25,000 stars and 3,000 forks.
Significance. If the integration claims hold, the work provides a practical, accessible tool that reduces implementation barriers for efficient LLM adaptation across many architectures. The high GitHub adoption offers evidence of real-world utility and community value. The open release of the artifact itself constitutes a reproducible contribution that can support further research in NLP fine-tuning.
major comments (1)
- Abstract: the claim of empirical validation on language modeling and text generation tasks is not accompanied by any metrics, baselines, or quantitative results. This detail is load-bearing for the effectiveness and efficiency assertions and should be expanded with concrete numbers and comparisons.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive recommendation. We address the single major comment below.
read point-by-point responses
-
Referee: Abstract: the claim of empirical validation on language modeling and text generation tasks is not accompanied by any metrics, baselines, or quantitative results. This detail is load-bearing for the effectiveness and efficiency assertions and should be expanded with concrete numbers and comparisons.
Authors: We agree that the abstract would benefit from greater specificity to support the stated claims. The full manuscript (Section 4) contains the detailed experiments, including quantitative results on language modeling (e.g., perplexity) and text generation tasks with comparisons to baselines. We will revise the abstract to incorporate a concise summary of key metrics and efficiency gains, making the validation claims more concrete while preserving the abstract's brevity.
Revision: yes
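For reference, the language-modeling metric mentioned here, perplexity, is just the exponentiated mean negative log-likelihood per token; a minimal sketch:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log p(token)); lower is better."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity 4.
probs = [0.25] * 10
assert abs(perplexity([math.log(p) for p in probs]) - 4.0) < 1e-9
```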
Circularity Check
No circularity: framework release with external verifiability
full rationale
The paper presents LlamaFactory as an open-source software artifact integrating existing efficient fine-tuning methods (LoRA, QLoRA, etc.) for 100+ LLMs, with a no-code web UI. No mathematical derivations, fitted parameters, predictions, or uniqueness theorems are claimed. The central contribution is the released codebase (GitHub link provided, with reported stars/forks as external evidence of adoption). Empirical validation is described at a high level on standard tasks but does not involve any internal reduction to self-defined inputs or self-citations that bear the load of a derivation. The work is self-contained as an engineering deliverable whose functionality is directly testable outside the paper.
Axiom & Free-Parameter Ledger
None claimed: the paper presents an engineering artifact, with no axioms, derivations, or fitted parameters to ledger.
Forward citations
Cited by 51 Pith papers
-
SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology
SARL rewards reasoning topology to improve label-free RL, outperforming baselines with gains up to 44.7% on math and 34.6% on open-ended tasks while maintaining more stable training.
-
GGBound: A Genome-Grounded Agent for Microbial Life-Boundary Prediction
A genome-conditioned 4B LLM agent predicts microbial life boundaries and matches larger frontier models via token fusion, tool use, and a counterfactual gene-grounding reward.
-
Teaching Language Models to Think in Code
ThinC trains small models to reason primarily in code rather than natural language, outperforming tool-integrated baselines and even larger models on competition math benchmarks.
-
DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams
DiagramNet supplies a new multimodal dataset and progressive training pipeline with decoupled multi-agent workflow, allowing a 3B model to outperform GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x on system-lev...
-
World2Minecraft: Occupancy-Driven Simulated Scenes Construction
World2Minecraft turns real scenes into Minecraft worlds via occupancy prediction and releases a large indoor occupancy dataset to improve such models.
-
BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering
BERAG applies Bayesian ensemble weighting of individual documents via token-by-token posterior updates in retrieval-augmented generation, yielding gains on knowledge-based visual QA tasks.
-
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training
EmbodiedMidtrain mid-trains VLMs on curated VLA-aligned data subsets to improve downstream performance on robot manipulation benchmarks.
-
S-GRPO: Unified Post-Training for Large Vision-Language Models
S-GRPO unifies SFT and RL for LVLMs via conditional ground-truth injection that supplies a maximal-reward anchor when group exploration fails completely.
-
C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment
C-Mining automatically mines high-fidelity Culture Points from raw multilingual text by treating cross-lingual geometric isolation in embeddings as a quantifiable signal for cultural specificity, then uses them to syn...
-
RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience
RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at 62% success rate, far exceeding baselines trained on up to 10,000 samples.
-
DeonticBench: A Benchmark for Reasoning over Rules
DEONTICBENCH is a new benchmark of 6,232 deontic reasoning tasks from U.S. legal domains where frontier LLMs reach only ~45% accuracy and symbolic Prolog assistance plus RL training still fail to solve tasks reliably.
-
ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs
ChatSVA achieves 96.12% functional pass rate and 82.5% coverage in SVA generation on 24 RTL designs, delivering 33 percentage point gains and 11x better coverage than prior state-of-the-art.
-
PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models
PR-CAD unifies text-to-CAD generation and editing via progressive refinement with LLMs, a new interaction dataset, and RL-enhanced reasoning to achieve better controllability and faithfulness.
-
Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling
Asynchronous I/O and Speculative Tool Calling cut latency in tool-calling LLM agents by 1.3-2.2x with only minor accuracy loss on cloud and edge models.
-
Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning
For a fixed data budget in LLM supervised fine-tuning, optimal data difficulty shifts toward harder examples as the budget grows because of the tradeoff between in-distribution generalization gap and extrapolation gap.
-
Teaching Language Models to Think in Code
ThinC trains smaller language models to reason entirely in code after minimal NL planning, outperforming tool-integrated baselines and even much larger models on competition math benchmarks.
-
Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less
Full finetuning with the pretraining optimizer reduces forgetting compared to other optimizers or LoRA while achieving comparable new-task performance.
-
Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio
MoLS scales Adam updates using module-level SNR estimates to correct gradient noise imbalance and improve LLM training convergence and generalization.
-
Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.
-
ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting
ScrapMem introduces optical forgetting to compress multimodal memories for LLM agents on edge devices, cutting storage by up to 93% while reaching 51.0% Joint@10 and 70.3% Recall@10 on ATM-Bench.
-
Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM
A hypernetwork generates meta-gating parameters for SwiGLU blocks to let LLMs adapt their nonlinearity to arbitrary textual conditions, outperforming finetuning and meta-learning baselines with reasonable generalizati...
-
When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation
EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% ove...
-
SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring
SIEVES improves selective prediction coverage up to 3x on OOD VQA benchmarks by training a selector on visual localization quality, generalizing across datasets and proprietary reasoners without specific adaptation.
-
Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA
Rabtriever distills a generative reranker into an efficient independent encoder using JEPA and auxiliary reverse KL loss to achieve linear complexity and strong performance on rationale-based retrieval tasks.
-
Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA
Rabtriever distills a generative reranker into an efficient bi-encoder using on-policy JEPA to achieve near-reranker accuracy with linear complexity on rationale-based retrieval.
-
CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation
CoDA aligns cross-domain latent reasoning representations in LLMs via CoT distillation and MMD to enable effective knowledge transfer without in-domain demonstrations.
-
CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora
CodePivot uses Python as a pivot language plus an Aggressive-Partial-Functional RL reward to train a 7B model that outperforms much larger LLMs on multilingual code transpilation without parallel corpora.
-
Characterizing Model-Native Skills
Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming...
-
Chain-of-Glimpse: Search-Guided Progressive Object-Grounded Reasoning for Video Understanding
Chain-of-Glimpse is a reinforcement learning framework that builds progressive, spatially grounded reasoning traces around task-relevant objects in videos to enable more accurate and interpretable multi-step decisions.
-
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
On-policy distillation works when student and teacher models share thinking patterns and the teacher adds new capabilities, with success tied to alignment on a small set of high-probability tokens.
-
Pioneer Agent: Continual Improvement of Small Language Models in Production
Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on ...
-
Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data
Fundus-R1 is a fundus-reading MLLM trained exclusively on public data via RAG-generated reasoning traces and process-reward RLVR, outperforming its base model and a version trained without the traces.
-
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning
SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.
-
Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward
Saliency-R1 uses a novel saliency map technique and GRPO with human bounding-box overlap as reward to improve VLM reasoning faithfulness and interpretability.
-
EgoMind: Activating Spatial Cognition through Linguistic Reasoning in MLLMs
EgoMind activates spatial cognition in MLLMs via linguistic Role-Play Caption and Progressive Spatial Analysis, reaching competitive results on VSI-Bench, SPAR-Bench, SITE-Bench and SPBench with only 5K SFT and 20K RL...
-
GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum
GraphWalker achieves state-of-the-art results on CWQ and WebQSP by training KGQA agents via synthetic random-walk trajectories in stage-wise SFT plus RL, with improved out-of-distribution generalization.
-
UserGPT Technical Report
UserGPT introduces a generative LLM framework with a behavior simulation engine, semantization module, and DF-GRPO post-training that scores 0.7325 on tag prediction and 0.7528 on summary generation on HPR-Bench while...
-
LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
LensVLM trains VLMs to scan compressed rendered text images and selectively expand task-relevant regions, achieving 4.3x compression with near full-text accuracy and outperforming baselines up to 10.1x on text QA benchmarks.
-
HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory
HyperLens reveals that deeper transformer layers magnify small confidence changes into fine-grained trajectories, allowing quantification of cognitive effort where complex tasks demand more and standard SFT can reduce it.
-
SAM-NER: Semantic Archetype Mediation for Zero-Shot Named Entity Recognition
SAM-NER improves cross-domain zero-shot NER by discovering entities, projecting them into domain-invariant semantic archetypes, and then calibrating those archetypes to target labels with a frozen LLM.
-
Perceptual Flow Network for Visually Grounded Reasoning
PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).
-
From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents
AdaPlan-H enables LLM agents to generate self-adaptive hierarchical plans that adjust detail level to task difficulty, improving success rates in multi-step tasks.
-
Environmental Understanding Vision-Language Model for Embodied Agent
EUEA fine-tunes VLMs on object perception, task planning, action understanding and goal recognition, with recovery and GRPO, to raise ALFRED success rates by 11.89% over behavior cloning.
-
ProUIE: A Macro-to-Micro Progressive Learning Method for LLM-based Universal Information Extraction
ProUIE uses macro-level complete modeling, meso-level streamlined alignment, and micro-level deep exploration with GRPO and stepwise rewards to improve LLM universal information extraction on 36 datasets without added...
-
Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs
Guardian-as-an-Advisor prepends risk labels and explanations from a guardian model to queries, improving LLM safety compliance and reducing over-refusal while adding minimal compute overhead.
-
An End-to-End Framework for Building Large Language Models for Software Operations
OpsLLM outperforms general LLMs on software operations QA and RCA tasks through human-in-the-loop data curation, supervised fine-tuning, and domain-specific reinforcement learning.
-
RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation
RCoT-Seg uses GRPO-reinforced keyframe selection from a CoT-start corpus followed by SAM2 mask propagation to improve video object segmentation under implicit temporal instructions over prior MLLM sampling methods.
-
DAT: Dual-Aware Adaptive Transmission for Efficient Multimodal LLM Inference in Edge-Cloud Systems
DAT combines a small-large model cascade with fine-tuning and bandwidth-aware multi-stream transmission to deliver high-accuracy event recognition and low-latency alerts for video streams in edge-cloud systems.
-
An End-to-End Framework for Building Large Language Models for Software Operations
OpsLLM is a domain-specific LLM for software ops QA and RCA built with human-curated data, SFT, and RL using a domain process reward model, showing accuracy gains of 0.2-5.7% on QA and 2.7-70.3% on RCA over general LLMs.
-
Revisiting Change VQA in Remote Sensing with Structured and Native Multimodal Qwen Models
Native multimodal Qwen models outperform structured vision-language pipelines on the CDVQA benchmark for change VQA in remote sensing, with performance not scaling monotonically with model size.
-
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains
Flowr is an agentic AI framework that decomposes retail supply chain workflows into coordinated LLM-based agents with human-in-the-loop oversight to automate operations in large supermarket chains.
Reference graph
Works this paper leans on
- [1] Language Models are Homer Simpson! Safety Re-Alignment of Fine-Tuned Language Models through Task Arithmetic. arXiv:2402.11746, 2024.
- [2] Extreme Compression of Large Language Models via Additive Quantization. arXiv:2401.06118, 2024.
- [3] Measuring Massive Multitask Language Understanding. In International Conference on Learning Representations.
- [4] PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models. arXiv:2404.02948, 2024.
- [5] Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807.
- [6] Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
- [7] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
- [8] LLaMA: Open and Efficient Foundation Language Models.
- [9] TeleChat Technical Report. arXiv:2401.03804, 2024.
- [10] Self-Rewarding Language Models. arXiv:2401.10020, 2024.
- [11] ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv:2406.12793, 2024.
- [12] A Survey of Large Language Models.