Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

Long Peng; Shuai Guo; Xin Di; Yang Cao; Zhanfeng Feng; Zhengjun Zha

arxiv: 2605.26628 · v1 · pith:KGJXFIWInew · submitted 2026-05-26 · 💻 cs.AI

Pith reviewed 2026-06-29 18:18 UTC · model grok-4.3

keywords post-training quantization · HiFloat4 · Wan2.2 · W4A4 quantization · text-to-video generation · outlier calibration · ViDiT-Q · activation tail awareness

The pith

Tail-aware percentile calibration reduces the effect of rare outliers in W4A4 HiFloat4 quantization of Wan2.2 while leaving the runtime pipeline unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts the ViDiT-Q post-training quantization pipeline to the Wan2.2 text-to-video model under the HiFloat4 format. Main linear layers receive W4A4 fake quantization while boundary modules stay in high precision. An activation-tail-aware percentile calibration module is introduced for channel-mask construction, paired with compact PTQ-state restoration. The goal is to limit outlier influence from calibration data without modifying the HiFloat4 arithmetic or sampling steps at inference time. This configuration is offered as an entry to a low-bit text-to-video quantization challenge.

Core claim

By adding an activation-tail-aware percentile calibration module for channel-mask construction together with compact PTQ-state restoration, the design reduces the influence of rare calibration outliers on W4A4 HiFloat4 quantization of Wan2.2 transformer modules while keeping the runtime HiFloat4 arithmetic and sampling pipeline unchanged.

What carries the argument

The activation-tail-aware percentile calibration module for channel-mask construction.
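
A minimal sketch of why a percentile statistic can change which channels a mask selects, assuming the mask flags channels whose magnitude statistic sits far above the typical channel. The summary above does not state the percentile, the mask rule, or the threshold, so `channel_stats`, `channel_mask`, `q=99.0`, and `ratio=4.0` are hypothetical choices made only for illustration.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 100]."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return s[lo] * (1.0 - frac) + s[hi] * frac

def channel_stats(acts, q=None):
    """acts: list of token rows, each a list of per-channel activations.
    Returns one magnitude per channel: abs-max if q is None,
    otherwise the q-th percentile of |x| (hypothetical tail-aware statistic)."""
    n_channels = len(acts[0])
    stats = []
    for c in range(n_channels):
        mags = [abs(row[c]) for row in acts]
        stats.append(max(mags) if q is None else percentile(mags, q))
    return stats

def channel_mask(stats, ratio=4.0):
    """Flag channels whose statistic exceeds `ratio` times the median statistic.
    The rule and the ratio are assumptions, not taken from the paper."""
    median = percentile(stats, 50.0)
    return [s > ratio * median for s in stats]

# Toy calibration set: 200 tokens, 4 channels.
# Channel 1 is consistently large; channel 3 has a single rare spike.
acts = [[0.5, 8.0, 0.4, 0.6] for _ in range(200)]
acts[17][3] = 50.0

mask_max = channel_mask(channel_stats(acts))            # abs-max statistic
mask_tail = channel_mask(channel_stats(acts, q=99.0))   # percentile statistic
print(mask_max)   # [False, False, False, True]
print(mask_tail)  # [False, True, False, False]
```

In this toy case the abs-max statistic lets one spike select channel 3 and hide the consistently large channel 1, while the percentile statistic ignores the spike. Whether the paper's module behaves this way on Wan2.2 activations is not something the summary establishes.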

If this is right

Main linear layers in Wan2.2 can be quantized to W4A4 HiFloat4 without runtime changes.
Rare calibration outliers exert less influence on the final quantized model.
Boundary modules kept in high precision preserve overall numerical stability.
The full sampling pipeline remains identical to the original HiFloat4 implementation.
The method fits directly into an existing PTQ workflow for text-to-video models.
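
The split between quantized main layers and high-precision boundary modules can be pictured as a name-based plan over the model's linear layers. This is a sketch under assumptions: the module names and the patterns in `KEEP_HIGH_PRECISION` are hypothetical and are not taken from Wan2.2 or from the paper, which does not list its boundary modules here.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for boundary modules that stay in high precision.
KEEP_HIGH_PRECISION = ["*patch_embedding*", "*time_embedding*", "*head*"]

def plan_quantization(linear_layer_names, keep_patterns=KEEP_HIGH_PRECISION):
    """Map each linear layer name to 'high_precision' or 'w4a4'."""
    plan = {}
    for name in linear_layer_names:
        is_boundary = any(fnmatchcase(name, p) for p in keep_patterns)
        plan[name] = "high_precision" if is_boundary else "w4a4"
    return plan

# Illustrative layer names only.
names = [
    "patch_embedding.proj",
    "blocks.0.self_attn.q",
    "blocks.0.ffn.0",
    "head.head",
]
plan = plan_quantization(names)
for name, mode in plan.items():
    print(f"{name}: {mode}")
```

Keeping the plan as plain data makes it easy to audit which layers were left unquantized before any calibration is run.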

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same calibration step could be tested on other diffusion-based video generators that use similar linear-layer structures.
If the percentile thresholds prove stable across different calibration datasets, the approach might reduce the need for dataset-specific outlier handling.
Keeping only boundary modules in high precision suggests a broader pattern where a small number of layers can anchor low-bit inference in larger models.

Load-bearing premise

An activation-tail-aware percentile calibration module can be inserted into the ViDiT-Q pipeline without forcing any change to runtime HiFloat4 arithmetic or sampling, and high-precision boundary modules alone suffice for numerical stability.

What would settle it

Running the quantized Wan2.2 model with the new calibration module and observing either a required modification to the HiFloat4 runtime code or a measurable drop in generated video quality relative to the unquantized baseline.

Figures

Figures reproduced from arXiv: 2605.26628 by Long Peng, Shuai Guo, Xin Di, Yang Cao, Zhanfeng Feng, Zhengjun Zha.

**Figure 1.** Pipeline of the proposed Tail-Aware HiFloat4 W4A4 PTQ system for Wan2.2. Calibration collects activation statistics from the BF16 model, PTQ …

**Figure 2.** Qualitative W4A4 examples. Each row shows the prompt and sampled …

Original abstract

This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantization pipeline to Wan2.2 under the HiFloat4 numerical format. We quantize the main linear layers in both Wan2.2 transformer modules with W4A4 HiFloat4 fake quantization, keep numerically sensitive boundary modules in high precision, and introduce an activation-tail-aware percentile calibration module for channel-mask construction. Together with compact PTQ-state restoration, this design reduces the influence of rare calibration outliers while keeping the runtime HiFloat4 arithmetic and sampling pipeline unchanged.
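
Fake quantization, as named in the abstract, means rounding values onto a low-bit grid and mapping them straight back to floating point, so the rest of the pipeline still runs in high precision. The sketch below uses a plain symmetric 4-bit integer grid as a stand-in. It is not the HiFloat4 format, whose encoding is not described on this page, and `fake_quant` and its `clip` parameter are hypothetical names.

```python
def fake_quant(values, clip, n_bits=4):
    """Quantize-dequantize onto a symmetric signed integer grid.
    Stand-in grid for illustration only; this is NOT HiFloat4."""
    qmax = 2 ** (n_bits - 1) - 1      # 7 levels per side for 4 bits
    scale = clip / qmax
    out = []
    for v in values:
        q = round(v / scale)          # Python rounds exact ties to the even integer
        q = max(-qmax, min(qmax, q))  # saturate values beyond the clip range
        out.append(q * scale)
    return out

x = [0.4, -1.2, 1.6, 14.0]            # three typical values and one outlier
wide = fake_quant(x, clip=14.0)       # clip set by the outlier
tight = fake_quant(x, clip=3.5)       # clip set below the outlier
print(wide)   # [0.0, -2.0, 2.0, 14.0]
print(tight)  # [0.5, -1.0, 1.5, 3.5]
```

The wide clip preserves the outlier but represents the typical values coarsely; the tight clip saturates the outlier and resolves the typical values more finely. This trade-off is the general motivation for outlier-aware calibration, independent of the specific number format.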

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a narrow engineering adaptation of ViDiT-Q to Wan2.2 that adds tail-aware calibration offline while leaving the HiFloat4 runtime untouched.

The paper takes the public ViDiT-Q post-training quantization pipeline and applies it to the Wan2.2 text-to-video model under the HiFloat4 format. They quantize the main linear layers to W4A4, leave boundary modules in higher precision, and insert an activation-tail-aware percentile step for building channel masks. The stated benefit is that rare outliers in calibration get less weight without any change to the deployed arithmetic or sampling code.

What stands out is the clean separation between the offline calibration stage and the runtime path. PTQ-state restoration and mask construction happen before deployment, so the claim that nothing at inference time needs to change is internally consistent. That design choice fits the practical goal of keeping edge deployment simple.
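
One way to picture the offline/runtime split is to treat the calibration output as a small serialized state that the runtime only reads. The sketch below assumes the state is a set of per-layer boolean channel masks stored as flagged indices. The schema, `pack_state`, and `restore_state` are hypothetical; the page does not describe what the paper's compact PTQ state actually contains.

```python
import json

def pack_state(masks):
    """masks: {layer_name: [bool, ...]} -> JSON string.
    Stores only the mask length and the flagged channel indices."""
    compact = {
        name: {"n": len(mask), "on": [i for i, flag in enumerate(mask) if flag]}
        for name, mask in masks.items()
    }
    return json.dumps(compact, sort_keys=True)

def restore_state(blob):
    """Inverse of pack_state: rebuild the full boolean masks."""
    compact = json.loads(blob)
    masks = {}
    for name, entry in compact.items():
        flagged = set(entry["on"])
        masks[name] = [i in flagged for i in range(entry["n"])]
    return masks

# Illustrative layer name and mask only.
masks = {"blocks.0.ffn.0": [False, True, False, False]}
blob = pack_state(masks)
print(blob)
print(restore_state(blob) == masks)
```

Because the runtime only consumes the restored state, changing how the masks are computed offline does not by itself require touching inference code, which is the separation the letter describes.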

The tail-aware calibration is the only explicit addition over the base pipeline. On the evidence given, it is presented as a straightforward percentile tweak rather than a new theoretical device. No equations, derivations, or parameter counts appear that would suggest deeper novelty.

The main limitation is the absence of any reported results. The description gives no accuracy figures, no ablation of the tail-aware module, no comparison against plain ViDiT-Q on the same model, and no error bars. Without those, it is impossible to judge whether the added calibration step delivers a measurable improvement or is mostly cosmetic. The soundness of the claim therefore rests entirely on the design description.

This work is aimed at engineers who already run quantization pipelines on diffusion or transformer video models and need a drop-in recipe for Wan2.2. A reader looking for a general advance in low-bit methods or new theoretical insight will not find it here.

I would send it to peer review. The challenge context makes the implementation details worth checking, and the runtime-invariance claim is clear enough to evaluate once the experiments are in front of referees.

Referee Report

2 major / 0 minor

Summary. The manuscript describes Tail-Aware HiFloat4, a submission to the low-bit text-to-video generation quantization challenge. It adapts the public ViDiT-Q post-training quantization pipeline to the Wan2.2 model under the HiFloat4 numerical format. The approach quantizes main linear layers with W4A4 HiFloat4 fake quantization, keeps numerically sensitive boundary modules in high precision, and introduces an activation-tail-aware percentile calibration module for channel-mask construction, along with compact PTQ-state restoration. This is claimed to reduce the influence of rare calibration outliers while leaving the runtime HiFloat4 arithmetic and sampling pipeline unchanged.

Significance. If validated with results, the method could offer a practical way to quantize large diffusion transformer models for text-to-video generation to 4-bit weights and activations without modifying the inference pipeline. The emphasis on handling activation tails in calibration and preserving boundary modules in higher precision addresses a common challenge in PTQ for generative models. However, without any reported results, the practical significance remains undetermined.

major comments (2)

[Abstract] The abstract provides a high-level method description but contains no experimental results, ablation studies, error bars, or quantitative comparisons to baselines such as the original ViDiT-Q or other quantization methods. Therefore, the central claim that the tail-aware calibration reduces outlier influence cannot be evaluated against data.

[Full Text] No tables, figures, or sections presenting quantitative metrics (e.g., FID, CLIP score, or quantization error) are present in the manuscript, making it impossible to verify the effectiveness of the activation-tail-aware percentile calibration or the PTQ-state restoration.
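
Of the metrics the referee lists, quantization error is the simplest to compute directly from layer outputs. A common choice is the signal-to-quantization-noise ratio (SQNR) in decibels; the sketch below is a generic definition offered as an assumption about what such a measurement could look like, not a metric the paper reports. The function name `sqnr_db` is hypothetical.

```python
import math

def sqnr_db(reference, quantized):
    """SQNR in dB: 10 * log10(sum(ref^2) / sum((ref - quant)^2))."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0.0:
        return math.inf       # outputs are identical
    if signal == 0.0:
        return -math.inf      # all-zero reference with non-zero error
    return 10.0 * math.log10(signal / noise)

# Toy values: signal energy 25, noise energy 0.25, ratio 100.
print(sqnr_db([3.0, 4.0], [3.0, 3.5]))
```

Comparing this value with and without the tail-aware step, on the same layers and calibration data, would be one direct way to address the referee's request for evidence.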

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We acknowledge that the submitted manuscript is a concise method description prepared for the low-bit text-to-video quantization challenge and currently contains no experimental results, tables, or figures. We will revise the manuscript to incorporate quantitative evaluations as detailed below.

Referee: [Abstract] The abstract provides a high-level method description but contains no experimental results, ablation studies, error bars, or quantitative comparisons to baselines such as the original ViDiT-Q or other quantization methods. Therefore, the central claim that the tail-aware calibration reduces outlier influence cannot be evaluated against data.

Authors: We agree that the abstract and the manuscript as submitted do not contain experimental results or comparisons. The manuscript was prepared as a short method description for the challenge, where evaluation occurs through the challenge infrastructure. To address this point, we will expand the abstract and add a dedicated results section with quantitative metrics (FID, CLIP score), ablations on the tail-aware calibration, and direct comparisons to ViDiT-Q in the revised version. Revision: yes.

Referee: [Full Text] No tables, figures, or sections presenting quantitative metrics (e.g., FID, CLIP score, or quantization error) are present in the manuscript, making it impossible to verify the effectiveness of the activation-tail-aware percentile calibration or the PTQ-state restoration.

Authors: The referee correctly observes that the full text contains no tables, figures, or quantitative results. This was an intentional choice for the initial challenge submission format, but we recognize it limits verifiability. In the revision we will add a results section including tables and figures reporting FID, CLIP scores, quantization error, and ablations on the tail-aware percentile calibration and PTQ-state restoration, with comparisons to the baseline ViDiT-Q pipeline. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity identified

The paper is a concise engineering description of adapting the ViDiT-Q PTQ pipeline to Wan2.2 under HiFloat4, with offline calibration modules and high-precision boundary handling. No equations, derivations, fitted parameters presented as predictions, or self-citations appear in the provided text. The central design claim (offline tail-aware percentile calibration plus PTQ-state restoration) is implemented before deployment and does not reduce to any input by construction or self-reference; the runtime path remains unchanged by explicit separation of phases. This is a self-contained implementation report with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No equations, parameters, or new entities are described in the abstract, so the ledger cannot be populated beyond the high-level method outline.

Reference graph

Works this paper leans on

27 extracted references · 5 canonical work pages · 3 internal anchors

[1] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 6840–6851.
[2] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
[3] W. Peebles and S. Xie, “Scalable diffusion models with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4195–4205.
[4] U. Singer, A. Polyak, T. Hayes, X. Yin, J. An, S. Zhang, Q. Hu, H. Yang, O. Ashual, O. Gafni, D. Parikh, S. Gupta, and Y. Taigman, “Make-a-video: Text-to-video generation without text-video data,” in International Conference on Learning Representations, 2023.
[5] W. Hong, M. Ding, W. Zheng, X. Liu, and J. Tang, “Cogvideo: Large-scale pretraining for text-to-video generation via transformers,” in International Conference on Learning Representations, 2023.
[6] J. Ho, W. Chan, C. Saharia, J. Whang, R. Gao, A. Gritsenko, D. P. Kingma, B. Poole, M. Norouzi, D. J. Fleet, and T. Salimans, “Imagen video: High definition video generation with diffusion models,” arXiv preprint arXiv:2210.02303, 2022.
[7] W. Kong, Q. Tian, Z. Zhang, R. Min, Z. Dai, J. Zhou, J. Xiong, X. Li et al., “Hunyuanvideo: A systematic framework for large video generative models,” arXiv preprint arXiv:2412.03603, 2024.
[8] Wan Team, A. Wang, B. Ai, B. Wen, C. Mao, C.-W. Xie, D. Chen, F. Yu et al., “Wan: Open and advanced large-scale video generative models,” arXiv preprint arXiv:2503.20314, 2025.
[9] T. Zhao, T. Fang, H. Huang, R. Wan, W. Soedarmadji, E. Liu, S. Li, Z. Lin, G. Dai, S. Yan, H. Yang, X. Ning, and Y. Wang, “Vidit-q: Efficient and accurate quantization of diffusion transformers for image and video generation,” in International Conference on Learning Representations, 2025.
[10] Wan-AI, “Wan2.2-I2V-A14B,” Hugging Face model repository, 2025. [Online]. Available: https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B. Accessed: May 19, 2026.
[11] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704–2713.
[12] M. Nagel, R. A. Amjad, M. van Baalen, C. Louizos, and T. Blankevoort, “Up or down? adaptive rounding for post-training quantization,” in Proceedings of the 37th International Conference on Machine Learning, 2020, pp. 7197–7206.
[13] E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh, “Gptq: Accurate post-training quantization for generative pre-trained transformers,” in International Conference on Learning Representations, 2023.
[14] T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, “GPT3.int8(): 8-bit matrix multiplication for transformers at scale,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 30318–30332.
[15] Z. Yao, R. Y. Aminabadi, M. Zhang, X. Wu, C. Li, and Y. He, “Zeroquant: Efficient and affordable post-training quantization for large-scale transformers,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 27168–27183.
[16] G. Xiao, J. Lin, M. Seznec, H. Wu, J. Demouth, and S. Han, “Smoothquant: Accurate and efficient post-training quantization for large language models,” in Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 38087–38099.
[17] J. Lin, J. Tang, H. Tang, S. Yang, W.-M. Chen, W.-C. Wang, G. Xiao, X. Dang, C. Gan, and S. Han, “Awq: Activation-aware weight quantization for llm compression and acceleration,” in Proceedings of Machine Learning and Systems, vol. 6, 2024, pp. 87–100.
[18] S. Ashkboos, M. L. Croci, M. G. do Nascimento, T. Hoefler, and J. Hensman, “Quarot: Outlier-free 4-bit inference in rotated llms,” in Advances in Neural Information Processing Systems, vol. 37, 2024.
[19] Y. Shang, Z. Yuan, B. Xie, B. Wu, and Y. Yan, “Post-training quantization on diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1972–1981.
[20] X. Li, Y. Liu, L. Lian, H. Yang, Z. Dong, D. Kang, S. Zhang, and K. Keutzer, “Q-diffusion: Quantizing diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17535–17545.
[21] Y. He, J. Liu, W. Wu, H. Zhou, and B. Zhuang, “Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models,” in International Conference on Learning Representations, 2024.
[22] J. Wu, H. Wang, Y. Shang, M. Shah, and Y. Yan, “Ptq4dit: Post-training quantization for diffusion transformers,” in Advances in Neural Information Processing Systems, vol. 37, 2024.
[23] L. Chen, Y. Meng, C. Tang, X. Ma, J. Jiang, X. Wang, Z. Wang, and W. Zhu, “Q-dit: Accurate post-training quantization for diffusion transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, pp. 28306–28315.
[24] J. Deng, S. Li, Z. Wang, H. Gu, K. Xu, and K. Huang, “Vq4dit: Efficient post-training vector quantization for diffusion transformers,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 15, pp. 16226–16234, 2025.
[25] Z. Huang, Y. He, J. Yu, F. Zhang, C. Si, Y. Jiang, Y. Zhang, T. Wu, Q. Jin, N. Chanpaisit, Y. Wang, X. Chen, L. Wang, D. Lin, Y. Qiao, and Z. Liu, “Vbench: Comprehensive benchmark suite for video generative models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21807–21818.
[26] Y. Luo, J. Huang, Y. Cheng, Z. Yu, K. Tang, X. Ma, X. Wang, A. Tong, G. Hu, Y. Xu, M. Taghian, P. Wu, G. Li, Y. Peng, T. Hu, M. Chen, M. B. Mi, H. Liu, X. Zhou, J. Wang, Q. Lin, and H. Liao, “Hifloat4 format for language model inference,” arXiv preprint arXiv:2602.11287, 2026.
[27] S. Yuan, X. He, Y. Deng, Y. Ye, J. Huang, B. Lin, J. Luo, and L. Yuan, “Opens2v-nexus: A detailed benchmark and million-scale dataset for subject-to-video generation,” arXiv preprint arXiv:2505.20292, 2025.
