Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

Chuanjian Liu; Hanting Chen; Jian Li; Jie Hu; Siqi Liu; Yuanyuan Xi; Yunhe Wang; Zhijun Tu

arxiv: 2508.06974 · v2 · pith:42K3IEQFnew · submitted 2025-08-09 · 💻 cs.CL

Zhijun Tu, Jian Li, Yuanyuan Xi, Siqi Liu, Chuanjian Liu, Hanting Chen, Jie Hu, Yunhe Wang

Pith reviewed 2026-05-21 23:40 UTC · model grok-4.3

classification 💻 cs.CL

keywords 1-bit quantization · large language models · progressive training · model compression · binary weights · pre-trained models · quantization-aware training · efficient inference

The pith

Pre-trained large language models can be adapted into high-performance 1-bit versions using progressive training instead of training from scratch.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that effective 1-bit quantized LLMs can be obtained by adapting existing pre-trained full-precision models rather than building binarized models from scratch. It addresses the accuracy loss and high costs of prior methods by using consistent progressive training to gradually close the gap between full-precision and 1-bit weights during both forward and backward passes. Supporting steps include binary-aware initialization and dual-scaling compensation to ease the conversion. If the approach holds, it would let practitioners turn current pre-trained models into storage-efficient and compute-light versions without repeating expensive initial training. Results across model sizes indicate better performance than existing 1-bit techniques.

Core claim

We identify that the large gap between full precision and 1-bit representations makes naive adaptation difficult. In this paper, we introduce a consistent progressive training for both forward and backward, smoothly converting the full-precision weights into the binarized ones. Additionally, we incorporate binary-aware initialization and dual-scaling compensation to reduce the difficulty of progressive training and improve the performance. Experimental results on LLMs of various sizes demonstrate that our method outperforms existing approaches. Our results show that high-performance 1-bit LLMs can be achieved using pre-trained models, eliminating the need for expensive training from scratch.

What carries the argument

Consistent progressive training applied to both forward and backward passes, together with binary-aware initialization and dual-scaling compensation, that gradually converts full-precision weights to binary form.
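
The paper passage quoted further down this page gives the progressive function as F(x, t) = tanh(tx)/tanh(t). A minimal sketch of how that function behaves is shown here, using only the Python standard library. The function names, the sample values of t, and the reading of "consistent" as "the backward pass uses the derivative of the same F as the forward pass" are assumptions made for illustration, not details confirmed by the source.

```python
import math


def progressive_sign(x: float, t: float) -> float:
    """F(x, t) = tanh(t * x) / tanh(t), defined for t > 0.

    For small t this is close to the identity on [-1, 1];
    for large t it approaches sign(x).
    """
    return math.tanh(t * x) / math.tanh(t)


def progressive_sign_grad(x: float, t: float) -> float:
    """Analytic derivative dF/dx = t * (1 - tanh(t * x)^2) / tanh(t).

    Assumption: a "consistent" backward pass would use this derivative
    of the same F, rather than a separate surrogate gradient.
    """
    return t * (1.0 - math.tanh(t * x) ** 2) / math.tanh(t)


if __name__ == "__main__":
    # Hypothetical values of t, chosen only to show the transition.
    for t in (0.01, 1.0, 5.0, 35.0):
        row = [round(progressive_sign(x, t), 3) for x in (-1.0, -0.3, 0.3, 1.0)]
        print(f"t={t:>5}: {row}")
```

As t grows, the outputs for interior inputs such as 0.3 move from roughly the input value toward ±1, which is the sense in which the full-precision weights are converted smoothly into binarized ones.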

If this is right

High-performance 1-bit LLMs become reachable directly from pre-trained checkpoints.
Training costs drop because full from-scratch binarized training is no longer required.
Accuracy degradation typical in 1-bit quantization is reduced across models of different sizes.
Storage and compute savings are realized while retaining competitive task performance.
Existing pre-trained assets can be reused for efficient quantized deployment.
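
The storage saving can be made concrete with back-of-the-envelope arithmetic: a weight stored in FP16 takes 16 bits, a binarized weight takes 1 bit. The sketch below counts weight storage only. The helper name and the 1-billion-parameter model size are hypothetical, and the estimate ignores scaling factors, embeddings, activations, and any layers kept in higher precision, so real savings would be smaller.

```python
def weight_storage_gib(num_params: int, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    params = 1_000_000_000  # hypothetical 1B-parameter model
    fp16 = weight_storage_gib(params, 16)
    one_bit = weight_storage_gib(params, 1)
    print(f"FP16 : {fp16:.3f} GiB")
    print(f"1-bit: {one_bit:.3f} GiB")
    print(f"ratio: {fp16 / one_bit:.1f}x")
```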

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gradual-adaptation idea could be tested on other low-bit or mixed-precision schemes beyond strict 1-bit.
Practitioners might combine this conversion step with existing fine-tuning pipelines to produce task-specific 1-bit models more quickly.
If the progressive schedule proves robust, it may encourage re-examination of other large representation-gap problems in model compression.
Deployment on edge hardware could become more routine once pre-trained models are routinely turned into 1-bit versions.

Load-bearing premise

The large gap between full-precision and 1-bit representations can be bridged smoothly by consistent progressive training in forward and backward passes without irrecoverable accuracy loss.
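
To give a feel for the size of that gap, the sketch below measures the error of naive one-shot binarization: each weight is replaced by its sign times a single scale alpha = mean(|w|), which is the least-squares optimal scalar for sign binarization. This is a generic baseline, not the paper's binary-aware initialization or dual-scaling compensation, whose details are not given in the text here. The Gaussian weights, seed, and standard deviation are hypothetical.

```python
import random


def binarize(weights):
    """Return (alpha, signs) with alpha = mean(|w|) and signs in {-1.0, +1.0}."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return alpha, signs


def relative_error(weights, alpha, signs):
    """Squared reconstruction error ||w - alpha * s||^2 divided by ||w||^2."""
    num = sum((w - alpha * s) ** 2 for w, s in zip(weights, signs))
    den = sum(w * w for w in weights)
    return num / den


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical weights: zero-mean Gaussian, as a stand-in for a trained layer.
    w = [random.gauss(0.0, 0.02) for _ in range(10_000)]
    alpha, signs = binarize(w)
    print(f"alpha = {alpha:.5f}")
    print(f"relative error = {relative_error(w, alpha, signs):.3f}")
```

For zero-mean Gaussian weights the relative error tends to 1 - 2/π, so roughly a third of the weight energy is lost in a single step. That is the kind of jump a gradual schedule is meant to avoid.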

What would settle it

If side-by-side tests on standard LLM benchmarks show that the resulting 1-bit models still lose substantial accuracy relative to full-precision versions or to from-scratch 1-bit baselines, the claim of successful smooth conversion would not hold.

Figures

Figures reproduced from arXiv: 2508.06974 by Chuanjian Liu, Hanting Chen, Jian Li, Jie Hu, Siqi Liu, Yuanyuan Xi, Yunhe Wang, Zhijun Tu.

**Figure 2.** Quantization error and initial loss.

**Figure 3.** Comparison of different progressive training in binary neural networks.

**Figure 4.** Latency with different threads.

**Figure 5.** Comparison of initial perplexity and loss.

**Figure 6.** Different progressive schedulers on t. The top and bottom of each column represent the function of t and the corresponding progressive approximation curves. We also list the perplexity and zero-shot accuracy of the final 1-bit models.

**Figure 7.** Weight distributions of different training chunks.
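
Figure 6 compares different schedulers for the progressive parameter t across training chunks. The sketch below shows two illustrative schedule shapes, a uniform ramp and a logarithmic ramp. The exact functional forms, the number of chunks, and the range of t are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def uniform_schedule(num_chunks: int, t_min: float, t_max: float) -> list:
    """Linear ramp of t from t_min to t_max over num_chunks (>= 2) chunks."""
    step = (t_max - t_min) / (num_chunks - 1)
    return [t_min + step * k for k in range(num_chunks)]


def log_schedule(num_chunks: int, t_min: float, t_max: float) -> list:
    """Logarithmic ramp: t rises quickly in early chunks, then flattens."""
    denom = math.log(num_chunks)
    return [
        t_min + (t_max - t_min) * math.log(1 + k) / denom
        for k in range(num_chunks)
    ]


if __name__ == "__main__":
    # Hypothetical settings: 15 chunks, t ramping from 0.5 to 35.
    for name, sched in (
        ("uniform", uniform_schedule(15, 0.5, 35.0)),
        ("log", log_schedule(15, 0.5, 35.0)),
    ):
        print(name, [round(t, 1) for t in sched])
```

Both ramps start and end at the same values of t. They differ in how early the weights are pushed toward their binary form, which is the kind of design choice a scheduler comparison would probe.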

Original abstract

1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch, failing to fully leverage pre-trained models. This results in high training costs and notable accuracy degradation. We identify that the large gap between full precision and 1-bit representations makes naive adaptation difficult. In this paper, we introduce a consistent progressive training for both forward and backward, smoothly converting the full-precision weights into the binarized ones. Additionally, we incorporate binary-aware initialization and dual-scaling compensation to reduce the difficulty of progressive training and improve the performance. Experimental results on LLMs of various sizes demonstrate that our method outperforms existing approaches. Our results show that high-performance 1-bit LLMs can be achieved using pre-trained models, eliminating the need for expensive training from scratch.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a workable recipe for turning pre-trained LLMs into 1-bit versions via progressive training plus init and scaling tweaks, but the evidence tying gains specifically to the progressive schedule is still thin.

The main point is that high-performance 1-bit LLMs can be reached from existing pre-trained weights rather than training binarized models from scratch. The authors combine consistent progressive training across forward and backward passes with binary-aware initialization and dual-scaling compensation to handle the precision gap that usually causes big drops in naive adaptation. Experiments on LLMs of various sizes reportedly beat prior 1-bit methods, which is the concrete advance here. That combination of techniques looks like a new practical package even if it builds on earlier quantization work. If the numbers hold, it directly lowers the training cost barrier for deploying these models.

The central claim rests on those empirical results showing the progressive schedule bridges the gap without irrecoverable loss. The soft spot is that the provided text does not include ablations or training curves that isolate the progressive component from the pre-trained starting point or the other two tricks. Without those controls it is hard to know how much credit belongs to the new schedule versus simply beginning from strong full-precision weights. Minor gaps like missing error bars or full hyperparameter details can be fixed in revision.

This is aimed at people working on LLM quantization and efficient inference. A practitioner looking for a training procedure that starts from checkpoints will get usable ideas. The work shows clear engagement with the practical problem and the literature on quantization, so it merits a serious referee to verify the full experimental robustness and check whether the progressive training really delivers the claimed smoothness.

Referee Report

2 major / 1 minor

Summary. The paper claims that existing 1-bit LLM quantization methods require training from scratch and suffer from accuracy degradation due to the large gap between full-precision and binarized representations. It proposes consistent progressive training applied to both forward and backward passes, combined with binary-aware initialization and dual-scaling compensation, to smoothly convert pre-trained full-precision weights into 1-bit models. Experiments on LLMs of various sizes are reported to show outperformance over prior approaches, supporting the conclusion that high-performance 1-bit LLMs can be obtained from pre-trained checkpoints without expensive from-scratch training.

Significance. If the empirical results hold with proper controls, the work would be significant for reducing the training cost of 1-bit LLMs by reusing pre-trained models rather than starting from random initialization. This could lower barriers to deploying efficient quantized models. The approach is presented as an empirical training procedure rather than a parameter-free derivation or machine-checked proof.

major comments (2)

[Abstract and method description] The central claim that consistent progressive training (forward and backward) plus binary-aware init and dual-scaling bridges the representation gap without irrecoverable loss is load-bearing, yet the manuscript provides no ablation studies or intermediate training dynamics that isolate the progressive schedule's contribution. Without these, performance gains could be attributable to the pre-trained starting point alone rather than the proposed techniques.
[Abstract and experimental results] The abstract states outperformance on LLMs of various sizes, but the provided text contains no error bars, statistical significance tests, or full experimental details (e.g., training hyperparameters, dataset splits, or exact baselines). This undermines verification of the claim that the method eliminates the need for training from scratch.

minor comments (1)

[Method] Notation for the dual-scaling compensation and binary-aware initialization should be defined more explicitly with equations to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We have carefully reviewed the major comments and provide detailed point-by-point responses below. We agree that certain aspects of the presentation can be strengthened and outline the revisions we will make to address them.

Referee: [Abstract and method description] The central claim that consistent progressive training (forward and backward) plus binary-aware init and dual-scaling bridges the representation gap without irrecoverable loss is load-bearing, yet the manuscript provides no ablation studies or intermediate training dynamics that isolate the progressive schedule's contribution. Without these, performance gains could be attributable to the pre-trained starting point alone rather than the proposed techniques.

Authors: We appreciate the referee's emphasis on the need for ablations to rigorously isolate the contribution of the consistent progressive training schedule. The manuscript presents the progressive training applied to both forward and backward passes, together with binary-aware initialization and dual-scaling compensation, as the key mechanisms for smoothly bridging the full-precision to 1-bit representation gap. While the current version focuses on the overall empirical outcomes, we acknowledge that dedicated ablation studies and intermediate training dynamics (such as loss or accuracy curves across training steps) are not explicitly included. In the revised manuscript, we will add these elements, including comparisons of the full proposed method against ablated variants that omit the progressive schedule, as well as plots illustrating training dynamics. This will help demonstrate that the performance improvements stem from the proposed techniques rather than the pre-trained initialization alone. revision: yes
Referee: [Abstract and experimental results] The abstract states outperformance on LLMs of various sizes, but the provided text contains no error bars, statistical significance tests, or full experimental details (e.g., training hyperparameters, dataset splits, or exact baselines). This undermines verification of the claim that the method eliminates the need for training from scratch.

Authors: We thank the referee for highlighting the importance of detailed experimental reporting to support the claims. The manuscript reports that our method outperforms existing approaches on LLMs of various sizes and concludes that high-performance 1-bit models can be obtained from pre-trained checkpoints without from-scratch training. However, we agree that the current text does not include error bars, statistical significance tests, or exhaustive details on training hyperparameters, dataset splits, and exact baseline configurations. In the revision, we will expand the experimental section to incorporate these: reporting means and standard deviations over multiple runs, including statistical significance tests where appropriate, and providing complete specifications of hyperparameters, dataset splits, and baseline implementations. These additions will improve the reproducibility and strengthen the evidence for our central claim. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical training procedure validated on external benchmarks

full rationale

The paper presents an empirical method consisting of consistent progressive training in forward and backward passes, binary-aware initialization, and dual-scaling compensation to adapt pre-trained full-precision LLMs to 1-bit representations. No equations, derivations, or parameter-fitting steps are described that reduce by construction to the method's own inputs or outputs. Performance claims rest on experimental results across LLMs of various sizes, which constitute external validation rather than internal tautologies or self-referential definitions. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that gradual conversion can overcome the representation gap; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: The large gap between full-precision and 1-bit representations makes naive adaptation difficult and can be addressed by consistent progressive training. Stated directly in the abstract as the identified core difficulty.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "we introduce a consistent progressive training for both forward and backward, smoothly converting the full-precision weights into the binarized ones... F(x, t) = tanh(tx)/tanh(t)"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "binary-aware initialization... dual-scaling compensation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 14 internal anchors

[1]

Smollm - blazingly fast and remarkably powerful, 2024

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Leandro von Werra, and Thomas Wolf. Smollm - blazingly fast and remarkably powerful, 2024. 12 1 3 5 7 9 11 13 15 chunk 0 5 10 15 20 25 30 35The value of t Uniform Progressive 1 0 1 1.5 1.0 0.5 0.0 0.5 1.0 1.5 PPL=36.9, Zero-shot Acc.=40.5 1 3 5 7 9 11 13 15 chunk 0 5 10 15 20 25 30 35The value of t Logarithm ...

work page 2024
[2]

Pythia: A suite for analyzing large language models across training and scaling

Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pages 2397–2430. PMLR, 2023

work page 2023
[3]

Piqa: Reasoning about physical common- sense in natural language

Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al. Piqa: Reasoning about physical common- sense in natural language. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 7432–7439, 2020

work page 2020
[4]

Db-llm: Accurate dual-binarization for efficient llms

Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, et al. Db-llm: Accurate dual-binarization for efficient llms. arXiv preprint arXiv:2402.11960, 2024

work page arXiv 2024
[5]

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. Boolq: Exploring the surprising difficulty of natural yes/no questions. ArXiv, abs/1905.10044, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1905
[6]

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge. ArXiv, abs/1803.05457, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[7]

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[8]

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms. arXiv preprint arXiv:2305.14314, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[9]

Cbq: Cross-block quantization for large language models

Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, et al. Cbq: Cross-block quantization for large language models. arXiv preprint arXiv:2312.07950, 2023

work page arXiv 2023
[10]

The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[11]

lm-evaluation-harness, 2021

EleutherAI. lm-evaluation-harness, 2021. Accessed: 2025-01-14

work page 2021
[12]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323, 2022. 13 Chunk 1 Chunk 2 Chunk 3 Chunk 4 Chunk 5 Chunk 6 Chunk 7 Chunk 8 Chunk 9 Chunk 10 Chunk 11 Chunk 12 Chunk 13 Chunk 14 Chunk 15 Chunk 16 Chunk 17 Chunk 18 Chunk 19 Chun...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[13]

Pt-bitnet: 1-bit large language model with post-training quantization.Available at SSRN 4987078

Yufei Guo, Zecheng Hao, Jiahang Shao, Jie Zhou, Xiaode Liu, Xin Tong, Yuhan Zhang, Yuanpei Chen, Weihang Peng, and Zhe Ma. Pt-bitnet: 1-bit large language model with post-training quantization.Available at SSRN 4987078

work page
[14]

Billm: Pushing the limit of post-training quantization for llms

Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, and Xi- aojuan Qi. Billm: Pushing the limit of post-training quantization for llms. arXiv preprint arXiv:2402.04291, 2024

work page arXiv 2024
[15]

Loftq: Lora-fine-tuning-aware quantization for large language models

Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, and Tuo Zhao. Loftq: Lora-fine-tuning-aware quantization for large language models. arXiv preprint arXiv:2310.08659, 2023

work page arXiv 2023
[16]

Arb-llm: Alternating refined binarizations for large language models

Zhiteng Li, Xianglong Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Linghe Kong, Yulun Zhang, Xiaokang Yang, et al. Arb-llm: Alternating refined binarizations for large language models. arXiv preprint arXiv:2410.03129, 2024

work page arXiv 2024
[17]

Awq: Activation-aware weight quantization for on-device llm compression and acceleration

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and acceleration. Proceedings of Machine Learning and Systems, 6:87–100, 2024

work page 2024
[18]

Rotated binary neural network

Mingbao Lin, Rongrong Ji, Zihan Xu, Baochang Zhang, Yan Wang, Yongjian Wu, Feiyue Huang, and Chia-Wen Lin. Rotated binary neural network. Advances in neural information processing systems , 33:7474–7485, 2020

work page 2020
[19]

LLM-QAT: Data-free quantization aware training for large language models.arXiv preprint arXiv:2305.17888, 2023

Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, and Vikas Chandra. Llm-qat: Data-free quantization aware training for large language models. arXiv preprint arXiv:2305.17888, 2023. 14

work page arXiv 2023
[20]

Reactnet: Towards precise binary neural network with generalized activation functions

Zechun Liu, Zhiqiang Shen, Marios Savvides, and Kwang-Ting Cheng. Reactnet: Towards precise binary neural network with generalized activation functions. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 143–159. Springer, 2020

work page 2020
[21]

Fbi-llm: Scaling up fully binarized llms from scratch via autoregressive distillation

Liqun Ma, Mingjie Sun, and Zhiqiang Shen. Fbi-llm: Scaling up fully binarized llms from scratch via autoregressive distillation. arXiv preprint arXiv:2407.07093, 2024

work page arXiv 2024
[22]

Bitnet b1

Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, and Furu Wei. Bitnet b1. 58 2b4t technical report. arXiv preprint arXiv:2504.12285, 2025

work page arXiv 2025
[23]

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, and Furu Wei. The era of 1-bit llms: All large language models are in 1.58 bits. arXiv preprint arXiv:2402.17764, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[24]

On the State of the Art of Evaluation in Neural Language Models

Gábor Melis, Chris Dyer, and Phil Blunsom. On the state of the art of evaluation in neural language models. arXiv preprint arXiv:1707.05589, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[25]

Pointer Sentinel Mixture Models

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[26]

Can a suit of armor conduct electricity? a new dataset for open book question answering

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? a new dataset for open book question answering. InConference on Empirical Methods in Natural Language Processing, 2018

work page 2018
[27]

Fine-tuning llms to 1.58bit: extreme quantization made easy, 2024

Leandro von Werra Pedro Cuenca Omar Sanseviero Mohamed Mekkouri, Marc Sun and Thomas Wolf. Fine-tuning llms to 1.58bit: extreme quantization made easy, 2024

work page 2024
[28]

Low-bit quantization favors undertrained llms: Scaling laws for quantized llms with 100t training tokens

Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, and Dong Yu. Low-bit quantization favors undertrained llms: Scaling laws for quantized llms with 100t training tokens. arXiv preprint arXiv:2411.17691, 2024

work page arXiv 2024
[29]

Forward and backward information retention for accurate binary neural networks

Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, and Jingkuan Song. Forward and backward information retention for accurate binary neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2250–2259, 2020

work page 2020
[30]

Exploring the limits of transfer learning with a unified text-to-text transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1–67, 2020

work page 2020
[31]

Xnor-net: Imagenet classifi- cation using binary convolutional neural networks

Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classifi- cation using binary convolutional neural networks. In European conference on computer vision, pages 525–542. Springer, 2016

work page 2016
[32]

Winogrande: An adversarial winograd schema challenge at scale

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM, 64(9):99–106, 2021

work page 2021
[33]

Pb-llm: Partially binarized large language models

Yuzhang Shang, Zhihang Yuan, Qiang Wu, and Zhen Dong. Pb-llm: Partially binarized large language models. arXiv preprint arXiv:2310.00034, 2023

work page arXiv 2023
[34]

Omniquant: Omnidirectionally calibrated quantization for large language models

Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, and Ping Luo. Omniquant: Omnidirectionally calibrated quantization for large language models. arXiv preprint arXiv:2308.13137, 2023

work page arXiv 2023
[35]

A survey on transformer compression

Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, and Dacheng Tao. A survey on transformer compression. arXiv preprint arXiv:2402.05964, 2024

work page arXiv 2024
[36]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[37]

Adabin: Improving binary neural networks with adaptive binary sets

Zhijun Tu, Xinghao Chen, Pengju Ren, and Yunhe Wang. Adabin: Improving binary neural networks with adaptive binary sets. In European conference on computer vision, pages 379–395. Springer, 2022

work page 2022
[38]

BitNet: Scaling 1-bit Transformers for Large Language Models

Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, and Furu Wei. Bitnet: Scaling 1-bit transformers for large language models. arXiv preprint arXiv:2310.11453, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[39]

Redpajama: an open dataset for training large language models

Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, et al. Redpajama: an open dataset for training large language models. arXiv preprint arXiv:2411.12372, 2024. 15

work page arXiv 2024
[40]

T-mac: Cpu renaissance via table lookup for low-bit llm deployment on edge, 2024

Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, and Mao Yang. T-mac: Cpu renaissance via table lookup for low-bit llm deployment on edge, 2024

work page 2024
[41]

Smoothquant: Accurate and efficient post-training quantization for large language models

Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, and Song Han. Smoothquant: Accurate and efficient post-training quantization for large language models. In International Conference on Machine Learning, pages 38087–38099. PMLR, 2023

work page 2023
[42]

Onebit: Towards extremely low-bit large language models

Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, and Wanxiang Che. Onebit: Towards extremely low-bit large language models. arXiv preprint arXiv:2402.11295, 2024

work page arXiv 2024
[43]

Qwen3 technical report, 2025

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...

work page 2025
[44]

Hellaswag: Can a machine really finish your sentence? In Annual Meeting of the Association for Computational Linguistics, 2019

Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. Hellaswag: Can a machine really finish your sentence? In Annual Meeting of the Association for Computational Linguistics, 2019

work page 2019
[45]

OPT: Open Pre-trained Transformer Language Models

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068, 2022

[46]

A Survey of Large Language Models

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023

[47]

An empirical study of qwen3 quantization

Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, and Xianglong Liu. An empirical study of qwen3 quantization. arXiv preprint arXiv:2505.02214, 2025

[48]

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. Incremental network quantization: Towards lossless cnns with low-precision weights. arXiv preprint arXiv:1702.03044, 2017.

[1]

Smollm - blazingly fast and remarkably powerful, 2024

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Leandro von Werra, and Thomas Wolf. Smollm - blazingly fast and remarkably powerful, 2024.

[2]

Pythia: A suite for analyzing large language models across training and scaling

Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pages 2397–2430. PMLR, 2023

[3]

Piqa: Reasoning about physical commonsense in natural language

Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 7432–7439, 2020

[4]

Db-llm: Accurate dual-binarization for efficient llms

Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, et al. Db-llm: Accurate dual-binarization for efficient llms. arXiv preprint arXiv:2402.11960, 2024

[5]

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. Boolq: Exploring the surprising difficulty of natural yes/no questions. ArXiv, abs/1905.10044, 2019

[6]

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge. ArXiv, abs/1803.05457, 2018

[7]

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016

[8]

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms. arXiv preprint arXiv:2305.14314, 2023

[9]

Cbq: Cross-block quantization for large language models

Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, et al. Cbq: Cross-block quantization for large language models. arXiv preprint arXiv:2312.07950, 2023

[10]

The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

[11]

lm-evaluation-harness, 2021

EleutherAI. lm-evaluation-harness, 2021. Accessed: 2025-01-14

[12]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323, 2022.

[13]

Pt-bitnet: 1-bit large language model with post-training quantization

Yufei Guo, Zecheng Hao, Jiahang Shao, Jie Zhou, Xiaode Liu, Xin Tong, Yuhan Zhang, Yuanpei Chen, Weihang Peng, and Zhe Ma. Pt-bitnet: 1-bit large language model with post-training quantization. Available at SSRN 4987078

[14]

Billm: Pushing the limit of post-training quantization for llms

Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, and Xiaojuan Qi. Billm: Pushing the limit of post-training quantization for llms. arXiv preprint arXiv:2402.04291, 2024

[15]

Loftq: Lora-fine-tuning-aware quantization for large language models

Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, and Tuo Zhao. Loftq: Lora-fine-tuning-aware quantization for large language models. arXiv preprint arXiv:2310.08659, 2023

[16]

Arb-llm: Alternating refined binarizations for large language models

Zhiteng Li, Xianglong Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Linghe Kong, Yulun Zhang, Xiaokang Yang, et al. Arb-llm: Alternating refined binarizations for large language models. arXiv preprint arXiv:2410.03129, 2024

[17]

Awq: Activation-aware weight quantization for on-device llm compression and acceleration

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and acceleration. Proceedings of Machine Learning and Systems, 6:87–100, 2024

[18]

Rotated binary neural network

Mingbao Lin, Rongrong Ji, Zihan Xu, Baochang Zhang, Yan Wang, Yongjian Wu, Feiyue Huang, and Chia-Wen Lin. Rotated binary neural network. Advances in neural information processing systems, 33:7474–7485, 2020

[19]

LLM-QAT: Data-free quantization aware training for large language models

Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, and Vikas Chandra. Llm-qat: Data-free quantization aware training for large language models. arXiv preprint arXiv:2305.17888, 2023.

[20]

Reactnet: Towards precise binary neural network with generalized activation functions

Zechun Liu, Zhiqiang Shen, Marios Savvides, and Kwang-Ting Cheng. Reactnet: Towards precise binary neural network with generalized activation functions. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 143–159. Springer, 2020

[21]

Fbi-llm: Scaling up fully binarized llms from scratch via autoregressive distillation

Liqun Ma, Mingjie Sun, and Zhiqiang Shen. Fbi-llm: Scaling up fully binarized llms from scratch via autoregressive distillation. arXiv preprint arXiv:2407.07093, 2024

[22]

Bitnet b1.58 2b4t technical report

Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, and Furu Wei. Bitnet b1.58 2b4t technical report. arXiv preprint arXiv:2504.12285, 2025

[23]

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, and Furu Wei. The era of 1-bit llms: All large language models are in 1.58 bits. arXiv preprint arXiv:2402.17764, 2024

[24]

On the State of the Art of Evaluation in Neural Language Models

Gábor Melis, Chris Dyer, and Phil Blunsom. On the state of the art of evaluation in neural language models. arXiv preprint arXiv:1707.05589, 2017

[25]

Pointer Sentinel Mixture Models

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016

[26]

Can a suit of armor conduct electricity? a new dataset for open book question answering

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? a new dataset for open book question answering. In Conference on Empirical Methods in Natural Language Processing, 2018

[27]

Fine-tuning llms to 1.58bit: extreme quantization made easy, 2024

Mohamed Mekkouri, Marc Sun, Leandro von Werra, Pedro Cuenca, Omar Sanseviero, and Thomas Wolf. Fine-tuning llms to 1.58bit: extreme quantization made easy, 2024

[28]

Low-bit quantization favors undertrained llms: Scaling laws for quantized llms with 100t training tokens

Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, and Dong Yu. Low-bit quantization favors undertrained llms: Scaling laws for quantized llms with 100t training tokens. arXiv preprint arXiv:2411.17691, 2024

[29]

Forward and backward information retention for accurate binary neural networks

Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, and Jingkuan Song. Forward and backward information retention for accurate binary neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2250–2259, 2020

[30]

Exploring the limits of transfer learning with a unified text-to-text transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1–67, 2020

[31]

Xnor-net: Imagenet classification using binary convolutional neural networks

Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European conference on computer vision, pages 525–542. Springer, 2016

[32]

Winogrande: An adversarial winograd schema challenge at scale

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM, 64(9):99–106, 2021

[33]

Pb-llm: Partially binarized large language models

Yuzhang Shang, Zhihang Yuan, Qiang Wu, and Zhen Dong. Pb-llm: Partially binarized large language models. arXiv preprint arXiv:2310.00034, 2023

[34]

Omniquant: Omnidirectionally calibrated quantization for large language models

Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, and Ping Luo. Omniquant: Omnidirectionally calibrated quantization for large language models. arXiv preprint arXiv:2308.13137, 2023

[35]

A survey on transformer compression

Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, and Dacheng Tao. A survey on transformer compression. arXiv preprint arXiv:2402.05964, 2024

[36]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

[37]

Adabin: Improving binary neural networks with adaptive binary sets

Zhijun Tu, Xinghao Chen, Pengju Ren, and Yunhe Wang. Adabin: Improving binary neural networks with adaptive binary sets. In European conference on computer vision, pages 379–395. Springer, 2022

[38]

BitNet: Scaling 1-bit Transformers for Large Language Models

Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, and Furu Wei. Bitnet: Scaling 1-bit transformers for large language models. arXiv preprint arXiv:2310.11453, 2023
