An Empirical Study of OpenPangu Quantization on Ascend NPUs

Aishan Liu; Hui Xie; Jiacheng Wang; Jinyang Guo; Tong Shi; Xianglong Liu; Ying Li

arxiv: 2606.21257 · v2 · pith:X6NLZEALnew · submitted 2026-06-19 · 💻 cs.LG · cs.AI

An Empirical Study of OpenPangu Quantization on Ascend NPUs

Tong Shi , Jiacheng Wang , Hui Xie , Ying Li , Aishan Liu , Jinyang Guo , Xianglong Liu This is my paper

Pith reviewed 2026-06-29 04:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords post-training quantizationOpenPanguAscend NPUlarge language modelsmodel compressionweight quantizationempirical evaluation

0 comments

The pith

8-bit weight-only quantization preserves OpenPangu model performance on Ascend NPUs across 18 tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how well OpenPangu 1B and 7B language models hold up when quantized for efficient running on Ascend NPUs. It applies several standard quantization techniques under the same conditions and measures results on a broad set of 18 tasks. The key finding is that reducing weights to 8 bits causes no noticeable drop in accuracy for either model size. Four-bit reduction works reasonably for the larger model but causes clear drops in the smaller one on harder tasks like math and reasoning. These results give practical guidance on which precision levels to choose when deploying these models on the target hardware.

Core claim

Under a controlled protocol on Ascend 910B1 NPUs, 8-bit weight-only quantization proves effectively lossless for both the 1B and 7B OpenPangu models across 18 tasks, whereas 4-bit quantization stays practical for the 7B model yet visibly degrades the 1B model on reasoning, math, and code tasks; most 2-bit and binary configurations collapse toward random performance and certain 4-bit activation settings yield non-finite perplexity.

What carries the argument

A unified calibration and evaluation protocol comparing weight-only methods (RTN, GPTQ, AWQ, BiLLM, SliM-LLM) and weight-activation methods (SmoothQuant, GPTAQ) on the Ascend NPU hardware.

If this is right

8-bit weight-only quantization can be deployed for both models without accuracy loss on the tested tasks.
4-bit weight quantization offers a viable option for the 7B model but demands careful task evaluation for the 1B model.
Extreme low-bit settings such as 2-bit and binary generally fail to maintain usable performance.
W4A4 configurations with SmoothQuant can produce invalid perplexity scores in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These patterns may generalize to other similar-sized models running on Ascend hardware, suggesting hardware-specific quantization tuning.
Task selection in future studies could include more diverse real-world applications to strengthen the map of usable precisions.
Combining these quantization findings with NPU-specific kernel optimizations might further close any remaining gaps at 4 bits.

Load-bearing premise

The 18 evaluation tasks together with the single unified calibration protocol represent typical real-world deployment conditions for these models.

What would settle it

A new test showing that the same 4-bit quantized 1B model achieves near-original scores on a different but comparable set of reasoning and math benchmarks would contradict the harm claim.

Figures

Figures reproduced from arXiv: 2606.21257 by Aishan Liu, Hui Xie, Jiacheng Wang, Jinyang Guo, Tong Shi, Xianglong Liu, Ying Li.

**Figure 2.** Figure 2: Zero-shot commonsense reasoning accuracy averaged over PIQA, ARC-Easy, ARC [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Weight reconstruction SQNR comparison between per-channel and per-group symmetric [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

OpenPangu models are attractive targets for private and domestic large-language-model deployment, yet their robustness under aggressive post-training quantization on Ascend NPUs has not been systematically characterized. This paper conducts a controlled empirical study of OpenPangu 1B and 7B models on Huawei Ascend 910B1 NPUs. We evaluate representative weight-only and weight-activation post-training quantization methods, including RTN, GPTQ, AWQ, SmoothQuant, GPTAQ, BiLLM, and SliM-LLM, under a unified calibration and evaluation protocol. Across 18 evaluation tasks, we find that 8-bit weight-only quantization is effectively lossless for both models, while 4-bit quantization remains practical for the 7B model but is visibly more harmful for the 1B model on reasoning, math, and code tasks. Ultra-low precision remains challenging: most 2-bit and binary settings collapse to near-random behavior, and W4A4 SmoothQuant produces non-finite perplexity in our evaluation. These results provide an NPU-oriented accuracy map for selecting OpenPangu quantization settings and highlight the persistent difficulty of extreme low-bit compression.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a narrow empirical report applying known quantization methods to OpenPangu 1B/7B on Ascend NPUs, with no algorithmic or theoretical advance.

read the letter

The main takeaway is that this paper measures how standard post-training quantization behaves on two specific OpenPangu models when run on Huawei Ascend 910B1 NPUs. It applies RTN, GPTQ, AWQ, SmoothQuant, GPTAQ, BiLLM, and SliM-LLM under one calibration protocol and reports results across 18 tasks.

What it does reasonably is give practitioners a direct comparison on this hardware. The central observation is that 8-bit weight-only stays close to full precision for both model sizes, while 4-bit stays usable on the 7B model but hurts the 1B model more on reasoning, math, and code. The 2-bit and binary cases mostly collapse, which matches what people have seen elsewhere. Having all methods tested the same way on the same NPU is the useful part if you actually need to pick settings for OpenPangu deployments.

The limitations are straightforward and fairly large relative to the contribution. Nothing in the methods is new; these are the same algorithms already tested on other models and GPUs. The headline claims about 8-bit being effectively lossless and 4-bit being practical rest on the 18 tasks being representative and the calibration protocol not favoring any particular method. The abstract supplies no information on task selection criteria, calibration set size, or overlap with evaluation data, so it is impossible to judge whether those thresholds would hold under different benchmarks. No error bars or significance tests are mentioned either, which leaves the size of the gaps between methods unclear.

This work is mainly for engineers who are already committed to OpenPangu and Ascend hardware and want a quick accuracy map before choosing bit widths. It has little to offer readers interested in general quantization progress or new hardware platforms. I would not send it for peer review; the scope is too limited and the advance too incremental to justify referee effort at a standard venue.

Referee Report

2 major / 0 minor

Summary. The paper conducts a controlled empirical study of post-training quantization (RTN, GPTQ, AWQ, SmoothQuant, GPTAQ, BiLLM, SliM-LLM) on OpenPangu 1B and 7B models using Huawei Ascend 910B1 NPUs under a single unified calibration and evaluation protocol. Across 18 tasks it reports that 8-bit weight-only quantization is effectively lossless for both models, 4-bit remains practical for the 7B model but visibly harms the 1B model on reasoning/math/code tasks, while 2-bit/binary settings collapse and W4A4 SmoothQuant yields non-finite perplexity. The work supplies an NPU-oriented accuracy map for selecting quantization settings.

Significance. If the empirical map holds under the stated protocol, the study supplies immediately usable guidance for domestic Ascend-NPU deployment of OpenPangu models and fills a hardware-specific gap in the quantization literature. The controlled, multi-method comparison is a clear strength.

major comments (2)

[Abstract] Abstract / evaluation protocol description: the central claims that 8-bit weight-only quantization is 'effectively lossless' and 4-bit is 'practical' for the 7B model rest entirely on the representativeness of the 18 tasks and the unbiasedness of the single unified calibration protocol. No information is supplied on task-selection criteria, calibration-set size or composition, overlap with evaluation data, or statistical significance of observed differences; this assumption is load-bearing for every ranking and threshold reported.
[Results] Results section (task-wise tables): without reported standard deviations across multiple random seeds, calibration-set ablations, or explicit checks for calibration/evaluation leakage, the distinction between 'practical' performance on the 7B model and 'visibly more harmful' performance on the 1B model for reasoning/math/code tasks cannot be assessed for robustness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater transparency in our evaluation protocol and robustness analysis. We will revise the manuscript to address these points directly while preserving the core empirical findings on OpenPangu quantization under the unified Ascend NPU protocol.

read point-by-point responses

Referee: [Abstract] Abstract / evaluation protocol description: the central claims that 8-bit weight-only quantization is 'effectively lossless' and 4-bit is 'practical' for the 7B model rest entirely on the representativeness of the 18 tasks and the unbiasedness of the single unified calibration protocol. No information is supplied on task-selection criteria, calibration-set size or composition, overlap with evaluation data, or statistical significance of observed differences; this assumption is load-bearing for every ranking and threshold reported.

Authors: We agree that explicit documentation of these elements is necessary to support the claims. In the revised manuscript we will add a dedicated 'Evaluation Protocol' subsection detailing: task-selection criteria (standard Chinese/English LLM benchmarks spanning understanding, reasoning, math, and code to reflect deployment use cases); calibration-set size, composition, and source (128 samples drawn from a public corpus with no overlap to any evaluation split); and a statement that all observed differences are consistent in direction across the 18 tasks. We will also note that formal statistical significance testing was not performed owing to the computational cost of repeated NPU runs, but the patterns hold uniformly across task categories. revision: yes
Referee: [Results] Results section (task-wise tables): without reported standard deviations across multiple random seeds, calibration-set ablations, or explicit checks for calibration/evaluation leakage, the distinction between 'practical' performance on the 7B model and 'visibly more harmful' performance on the 1B model for reasoning/math/code tasks cannot be assessed for robustness.

Authors: The referee is correct that variability measures would improve interpretability. Because each configuration required substantial Ascend 910B1 runtime, the original study used single runs per setting. In revision we will (1) add an explicit statement confirming no calibration/evaluation leakage, (2) include a limitations paragraph acknowledging the absence of multi-seed standard deviations, and (3) report a small-scale ablation on two representative tasks using three different calibration seeds to illustrate stability of the reported trends. The 1B vs. 7B distinction remains supported by the consistent category-level pattern rather than isolated metrics. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical measurements with no derivations or fitted predictions

full rationale

The paper reports controlled empirical evaluations of post-training quantization methods (RTN, GPTQ, AWQ, etc.) on OpenPangu 1B/7B models across 18 tasks under a unified protocol. No equations, derivations, parameter fits, or predictions are claimed; results are direct accuracy/perplexity measurements. No self-citation chains or ansatzes are load-bearing. The central claims rest on external benchmarks and task outcomes, which are falsifiable outside the paper's own fitted values. This matches the default expectation of a non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmarking paper; central claims rest on measurement protocol and task selection rather than mathematical axioms or new postulated entities.

pith-pipeline@v0.9.1-grok · 5752 in / 973 out tokens · 31873 ms · 2026-06-29T04:38:17.517895+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 4 canonical work pages · 2 internal anchors

[1]

Advances in Neural Information Processing Systems 30(2017)

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is All You Need. Advances in Neural Information Processing Systems 30(2017)

2017
[2]

arXiv preprint arXiv:2409.16694 (2024)

Gong, R., Ding, Y ., Wang, Z., Lv, C., Zheng, X., Du, J., Qin, H., Guo, J., Magno, M., Liu, X.: A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms. arXiv preprint arXiv:2409.16694 (2024)

work page arXiv 2024
[3]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Frantar, E., Ashkboos, S., Hoefler, T., Alistarh, D.: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. arXiv preprint arXiv:2210.17323 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[4]

Proceedings of Machine Learning and Systems6(2024)

Lin, J., Tang, J., Tang, H., Yang, S., Chen, W.M., Wang, W.C., Xiao, G., Dang, X., Gan, C., Han, S.: AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration. Proceedings of Machine Learning and Systems6(2024)

2024
[5]

In: Proceedings of the 40th International Conference on Machine Learning

Xiao, G., Lin, J., Seznec, M., Wu, H., Demouth, J., Han, S.: SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. In: Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 38087–38099. PMLR (2023)

2023
[6]

In: Proceedings of the 41st International Conference on Machine Learning

Huang, W., Liu, Y ., Qin, H., Li, Y ., Zhang, S., Liu, X., Magno, M., Qi, X.: BiLLM: Pushing the Limit of Post-Training Quantization for LLMs. In: Proceedings of the 41st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 235, pp. 20023–20042. PMLR (2024)

2024
[7]

In: Proceedings of the 42nd International Conference on Machine Learning

Huang, W., Qin, H., Liu, Y ., Li, Y ., Liu, Q., Liu, X., Benini, L., Magno, M., Zhang, S., Qi, X.: SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models. In: Proceedings of the 42nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 267, pp. 25672–25692. PMLR (2025)

2025
[8]

In: Proceedings of the 42nd International Conference on Machine Learning

Li, Y ., Yin, R., Lee, D., Xiao, S., Panda, P.: GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration. In: Proceedings of the 42nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 267, pp. 36690–36706. PMLR (2025)

2025
[9]

In: Interna- tional Conference on Learning Representations (2017)

Merity, S., Xiong, C., Bradbury, J., Socher, R.: Pointer Sentinel Mixture Models. In: Interna- tional Conference on Learning Representations (2017)

2017
[10]

Journal of Machine Learning Research21(140), 1–67 (2020)

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y ., Li, W., Liu, P.J.: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research21(140), 1–67 (2020)

2020
[11]

Proceedings of the AAAI Conference on Artificial Intelligence 34(05), 7432–7439 (2020)

Bisk, Y ., Zellers, R., Le Bras, R., Gao, J., Choi, Y .: PIQA: Reasoning about Physical Com- monsense in Natural Language. Proceedings of the AAAI Conference on Artificial Intelligence 34(05), 7432–7439 (2020)

2020
[12]

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[13]

Zellers, R., Holtzman, A., Bisk, Y ., Farhadi, A., Choi, Y .: HellaSwag: Can a Machine Really Finish Your Sentence? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 4791–4800. Association for Computational Linguistics (2019)

2019
[14]

Communications of the ACM64(9), 99–106 (2021)

Sakaguchi, K., Le Bras, R., Bhagavatula, C., Choi, Y .: WinoGrande: An Adversarial Winograd Schema Challenge at Scale. Communications of the ACM64(9), 99–106 (2021)

2021
[15]

In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Clark, C., Lee, K., Chang, M.W., Kwiatkowski, T., Collins, M., Toutanova, K.: BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 2924–2936. Association for Computational Linguistics (2019) 12

2019
[16]

In: International Conference on Learning Representations (2021)

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., Steinhardt, J.: Mea- suring Massive Multitask Language Understanding. In: International Conference on Learning Representations (2021)

2021
[17]

arXiv preprint arXiv:2304.01089 (2023)

Yuan, Z., Niu, L., Liu, J., Liu, W., Wang, X., Shang, Y ., Sun, G., Wu, Q., Wu, J., Wu, B.: RPTQ: Reorder-based Post-training Quantization for Large Language Models. arXiv preprint arXiv:2304.01089 (2023)

work page arXiv 2023
[18]

In: International Conference on Learning Representations (2025) 13

Liu, Z., Zhao, C., Fedorov, I., Soran, B., Choudhary, D., Krishnamoorthi, R., Chandra, V ., Tian, Y ., Blankevoort, T.: SpinQuant: LLM Quantization with Learned Rotations. In: International Conference on Learning Representations (2025) 13

2025

[1] [1]

Advances in Neural Information Processing Systems 30(2017)

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is All You Need. Advances in Neural Information Processing Systems 30(2017)

2017

[2] [2]

arXiv preprint arXiv:2409.16694 (2024)

Gong, R., Ding, Y ., Wang, Z., Lv, C., Zheng, X., Du, J., Qin, H., Guo, J., Magno, M., Liu, X.: A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms. arXiv preprint arXiv:2409.16694 (2024)

work page arXiv 2024

[3] [3]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Frantar, E., Ashkboos, S., Hoefler, T., Alistarh, D.: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. arXiv preprint arXiv:2210.17323 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[4] [4]

Proceedings of Machine Learning and Systems6(2024)

Lin, J., Tang, J., Tang, H., Yang, S., Chen, W.M., Wang, W.C., Xiao, G., Dang, X., Gan, C., Han, S.: AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration. Proceedings of Machine Learning and Systems6(2024)

2024

[5] [5]

In: Proceedings of the 40th International Conference on Machine Learning

Xiao, G., Lin, J., Seznec, M., Wu, H., Demouth, J., Han, S.: SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. In: Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 38087–38099. PMLR (2023)

2023

[6] [6]

In: Proceedings of the 41st International Conference on Machine Learning

Huang, W., Liu, Y ., Qin, H., Li, Y ., Zhang, S., Liu, X., Magno, M., Qi, X.: BiLLM: Pushing the Limit of Post-Training Quantization for LLMs. In: Proceedings of the 41st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 235, pp. 20023–20042. PMLR (2024)

2024

[7] [7]

In: Proceedings of the 42nd International Conference on Machine Learning

Huang, W., Qin, H., Liu, Y ., Li, Y ., Liu, Q., Liu, X., Benini, L., Magno, M., Zhang, S., Qi, X.: SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models. In: Proceedings of the 42nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 267, pp. 25672–25692. PMLR (2025)

2025

[8] [8]

In: Proceedings of the 42nd International Conference on Machine Learning

Li, Y ., Yin, R., Lee, D., Xiao, S., Panda, P.: GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration. In: Proceedings of the 42nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 267, pp. 36690–36706. PMLR (2025)

2025

[9] [9]

In: Interna- tional Conference on Learning Representations (2017)

Merity, S., Xiong, C., Bradbury, J., Socher, R.: Pointer Sentinel Mixture Models. In: Interna- tional Conference on Learning Representations (2017)

2017

[10] [10]

Journal of Machine Learning Research21(140), 1–67 (2020)

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y ., Li, W., Liu, P.J.: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research21(140), 1–67 (2020)

2020

[11] [11]

Proceedings of the AAAI Conference on Artificial Intelligence 34(05), 7432–7439 (2020)

Bisk, Y ., Zellers, R., Le Bras, R., Gao, J., Choi, Y .: PIQA: Reasoning about Physical Com- monsense in Natural Language. Proceedings of the AAAI Conference on Artificial Intelligence 34(05), 7432–7439 (2020)

2020

[12] [12]

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Zellers, R., Holtzman, A., Bisk, Y ., Farhadi, A., Choi, Y .: HellaSwag: Can a Machine Really Finish Your Sentence? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 4791–4800. Association for Computational Linguistics (2019)

2019

[14] [14]

Communications of the ACM64(9), 99–106 (2021)

Sakaguchi, K., Le Bras, R., Bhagavatula, C., Choi, Y .: WinoGrande: An Adversarial Winograd Schema Challenge at Scale. Communications of the ACM64(9), 99–106 (2021)

2021

[15] [15]

In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Clark, C., Lee, K., Chang, M.W., Kwiatkowski, T., Collins, M., Toutanova, K.: BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 2924–2936. Association for Computational Linguistics (2019) 12

2019

[16] [16]

In: International Conference on Learning Representations (2021)

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., Steinhardt, J.: Mea- suring Massive Multitask Language Understanding. In: International Conference on Learning Representations (2021)

2021

[17] [17]

arXiv preprint arXiv:2304.01089 (2023)

Yuan, Z., Niu, L., Liu, J., Liu, W., Wang, X., Shang, Y ., Sun, G., Wu, Q., Wu, J., Wu, B.: RPTQ: Reorder-based Post-training Quantization for Large Language Models. arXiv preprint arXiv:2304.01089 (2023)

work page arXiv 2023

[18] [18]

In: International Conference on Learning Representations (2025) 13

Liu, Z., Zhao, C., Fedorov, I., Soran, B., Choudhary, D., Krishnamoorthi, R., Chandra, V ., Tian, Y ., Blankevoort, T.: SpinQuant: LLM Quantization with Learned Rotations. In: International Conference on Learning Representations (2025) 13

2025