PB-LLM: Partially binarized large language models

URL https://arxiv · 2023 · arXiv 2310.00034

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

cs.LG · 2026-06-11 · unverdicted · novelty 6.0

TWLA is a PTQ method using E2M-ATQ, KOTMS, and ILA-AMP to enable W1.58A4 quantization for LLMs with maintained accuracy.

SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

SAFE-SVD introduces a sensitivity-aware fidelity-enforcing SVD framework for compressing physics foundation models that maintains higher accuracy than standard methods at greater compression ratios.

LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

LBLLM achieves better accuracy than prior binarization methods for LLMs by decoupling weight and activation quantization through initialization, layer-wise distillation, and learnable activation scaling.

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

cs.CL · 2026-04-20 · unverdicted · novelty 6.0 · 2 refs

GSQ uses Gumbel-Softmax to optimize scalar quantization grids for LLMs, closing most of the accuracy gap to vector methods like QTIP at 2-3 bits per parameter while using symmetric scalar grids compatible with existing kernels.

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

cs.CL · 2025-08-09 · conditional · novelty 6.0

A progressive training scheme with binary-aware initialization and dual-scaling allows pre-trained LLMs to be converted to high-performance 1-bit models without training from scratch.

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

cs.AI · 2026-06-03 · unverdicted · novelty 5.0

SAGE-PTQ is a graph-guided ultra-low-bit PTQ framework that achieves 1.03 average weight bits and 0.004 scaling bits per matrix on LLMs while reporting lower perplexity and memory use than BiLLM and PB-LLM.

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

GAMMA is a post-training framework that learns stable module sensitivity rankings for mixed-precision LLM quantization and projects them to exact bit budgets via integer programming, enabling reuse across arbitrary memory targets.

BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design

cs.LG · 2026-04-05 · unverdicted · novelty 5.0

BWTA achieves near full-precision accuracy on BERT and LLMs using binary weights and ternary activations, with 16-24x kernel speedups via specialized CUDA kernels.

citing papers explorer

Showing 5 of 5 citing papers after filters.

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

cs.LG · 2026-06-11 · unverdicted · none · ref 19

TWLA is a PTQ method using E2M-ATQ, KOTMS, and ILA-AMP to enable W1.58A4 quantization for LLMs with maintained accuracy.

SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models

cs.LG · 2026-05-18 · unverdicted · none · ref 20

SAFE-SVD introduces a sensitivity-aware fidelity-enforcing SVD framework for compressing physics foundation models that maintains higher accuracy than standard methods at greater compression ratios.

LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation

cs.LG · 2026-04-21 · unverdicted · none · ref 47

LBLLM achieves better accuracy than prior binarization methods for LLMs by decoupling weight and activation quantization through initialization, layer-wise distillation, and learnable activation scaling.

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

cs.LG · 2026-05-18 · unverdicted · none · ref 54

GAMMA is a post-training framework that learns stable module sensitivity rankings for mixed-precision LLM quantization and projects them to exact bit budgets via integer programming, enabling reuse across arbitrary memory targets.

BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design

cs.LG · 2026-04-05 · unverdicted · none · ref 38

BWTA achieves near full-precision accuracy on BERT and LLMs using binary weights and ternary activations, with 16-24x kernel speedups via specialized CUDA kernels.
