pith. sign in

arxiv: 2606.13054 · v1 · pith:YSYCUIVAnew · submitted 2026-06-11 · 💻 cs.LG · cs.AI

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

Pith reviewed 2026-06-27 07:36 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords post-training quantizationternary weightslow-bit activationslarge language modelsmodel compressioninference acceleration
0
0 comments X

The pith

TWLA is a post-training method that achieves 1.58-bit ternary weights and 4-bit activations for large language models while preserving accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models face deployment barriers from high memory and compute demands. The paper presents TWLA as a post-training quantization approach that reaches 1.58-bit weights and 4-bit activations. It does so through three linked components that handle ternarization error, reshape distributions via rotation, and allocate activation bits across layers with interaction costs. A sympathetic reader would care because this combination promises both smaller model footprints and end-to-end inference speedups that prior ternary methods could not deliver without keeping activations at higher precision.

Core claim

The paper claims that its TWLA framework, built from Euclidean-to-Manifold Asymmetric Ternary Quantizer for layer-output error minimization, Kronecker Orthogonal Tri-Modal Shaping for tri-modal weight reshaping and outlier suppression, and Inter-Layer Aware Activation Mixed Precision for joint bit allocation, enables W1.58A4 quantization of LLMs that maintains high accuracy on language tasks and yields inference acceleration.

What carries the argument

The Kronecker Orthogonal Tri-Modal Shaping component, which applies a Kronecker-structured orthogonal rotation to weights to produce ternary-friendly tri-modal distributions while statistically suppressing activation outliers.

If this is right

  • Model memory footprint shrinks substantially from 1.58-bit weight storage.
  • Inference compute accelerates because both weights and activations use low-bit arithmetic.
  • End-to-end low-precision execution becomes feasible without retaining high-precision activations.
  • Accuracy remains competitive with full-precision baselines on language modeling and downstream tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The outlier-suppression effect of the shared rotation might extend to other low-bit activation schemes beyond 4 bits.
  • The inter-layer cost modeling could be adapted to mixed-precision schemes that also vary weight bits.
  • Hardware implementations could exploit the fixed rotation matrix for further speed gains on specific accelerators.

Load-bearing premise

The assumption that the shared orthogonal rotation sufficiently suppresses activation outliers to support reliable 4-bit quantization without major accuracy loss.

What would settle it

A clear accuracy or perplexity drop on a standard LLM benchmark when the full TWLA pipeline is applied with 4-bit activations would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.13054 by Dawei Yang, Xing Hu, Zhe Jiang, Zhixiong Zhao, Zhixuan Chen, Zukang Xu.

Figure 1
Figure 1. Figure 1: LLaMA-family performance on seven zero-shot tasks. TWLA remains robust under both weight-only and weight–activation quantization (at equal memory cost), while other methods degrade substantially with 4-bit activation quantization. 1. Introduction In recent years, Large Language Models (LLMs) have achieved outstanding results across a wide range of diverse domains (Zhu et al., 2025). However, this strong ca… view at source ↗
Figure 2
Figure 2. Figure 2: (a) Before applying TWLA, activation outliers hinder low-bit quantization and unimodal weights are misaligned with ternary codebooks. (b) After applying TWLA, activations are smoothed and weights are transformed into symmetric tri-modal forms. additions and simple branching to lower inference costs, while offering stronger representational power and a su￾perior accuracy-efficiency trade-off. Recent studies… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of TWLA (initialization follows the same design as PT2 -LLM (Yan et al., 2025)). Euclidean-to-Manifold Asymmetric Ternary Quantizer (E2M-ATQ): minimizes layer-output error under weight ternarization via a two-stage optimization. Kronecker Orthogonal Tri-Modal Shaping (KOTMS): reshapes weight into ternary-friendly tri-modal distribution. Inter-Layer Aware Activation Mixed Precision (ILA-AMP): preve… view at source ↗
Figure 4
Figure 4. Figure 4: (a) Ablation of the Overhead–Performance trade-off for KOTMS. (b) Comparison of throughput and memory consumption across FP16, QuaRot, and TWLA. (c) Comparison of quantization time and perplexity across QuaRot, SliM-LLM, and TWLA [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Perplexity of LLaMA3-8B (W1.58A4) using calibration data sampled with different number or seeds from WikiText2. Summary. Overall, the expanded tables consistently show that TWLA offers superior robustness and scalability as activation precision decreases from 16 to 6 and further to 4 bits. While existing PTQ methods rapidly lose accuracy (and often collapse) under low-bit activations, TWLA maintains strong… view at source ↗
Figure 6
Figure 6. Figure 6: Calibration cost versus quantization quality under different interaction orders. Bars report the wall-clock time (minutes) spent collecting validation NLL statistics for 1st-, 2nd-, and 3rd-order interaction modeling, and dots report WikiText2 perplexity (lower is better). variant (ILA-AMP) additionally models adjacent-layer pairwise couplings to capture error propagation across neighboring layers. The 3rd… view at source ↗
Figure 7
Figure 7. Figure 7: Layer-wise activation bit allocation produced by ILA-AMP under an average activation bit budget of 4 on the WikiText2 calibration set for LLaMA2-7B and LLaMA3-8B. Numbers above bars denote the assigned per-layer activation bit-width. and output embeddings: larger input–output changes indicate higher importance, and the metric requires calibration data. • ZD (Z-score Distribution) quantifies importance by t… view at source ↗
Figure 8
Figure 8. Figure 8: The weight distribution of the 12th layer in Qwen3-8B before and after TWLA. 31 [PITH_FULL_IMAGE:figures/full_fig_p031_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: The activation distribution of the 12th layer in Qwen3-8B before and after TWLA. 33 [PITH_FULL_IMAGE:figures/full_fig_p033_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Cross-layer heterogeneity of activation outlier suppression after KOTMS on QWEN3-8B. Quantile plots (Min/Max, 1/99%, and 25/75%) of the Q-projection input activations are shown for Layers 4, 12, 24, and 36, comparing the original activations (left) with those after KOTMS projection (right). 34 [PITH_FULL_IMAGE:figures/full_fig_p034_10.png] view at source ↗
read the original abstract

Large language models (LLMs) exhibit exceptional general language processing capabilities, but their memory and compute costs hinder deployment. Ternarization has emerged as a promising compression technique, offering significant reductions in model size and inference complexity. However, existing methods struggle with heavy-tailed activation distributions and therefore keep activations in high precision, fundamentally limiting end-to-end inference acceleration. To overcome this limitation, we propose TWLA, a post-training quantization (PTQ) framework that achieves 1.58-bit weight compression and 4-bit activation quantization while maintaining high accuracy. TWLA comprises three components: (1) Euclidean-to-Manifold Asymmetric Ternary Quantizer (E2M-ATQ) minimizes layer-output error under weight ternarization via a two-stage optimization from Euclidean initialization to manifold relocation; (2) Kronecker Orthogonal Tri-Modal Shaping (KOTMS) applies a Kronecker-structured orthogonal rotation to reshape weights into ternary-friendly tri-modal distributions, while the shared rotation statistically suppresses activation outliers; and (3) Inter-Layer Aware Activation Mixed Precision (ILA-AMP) explicitly introduces adjacent-layer second-order interaction costs in bit allocation and jointly optimizes for the layer-wise disparity of activation quantization gains induced by the shared orthogonal transform, preventing cascades triggered by a few weak layers. Extensive experiments demonstrate that TWLA maintains high accuracy under W1.58A4, while delivering significant inference acceleration. The code is available at <https://github.com/Kishon-zzx/TWLA>.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces TWLA, a post-training quantization framework for LLMs that achieves 1.58-bit ternary weights and 4-bit activations while preserving high accuracy. It consists of three components: E2M-ATQ (a two-stage Euclidean-to-manifold optimization for layer-output error minimization under ternarization), KOTMS (Kronecker-structured orthogonal rotation to produce tri-modal weight distributions that also statistically suppresses activation outliers), and ILA-AMP (inter-layer aware mixed-precision activation bit allocation that accounts for second-order interactions and shared-transform effects). Experiments are claimed to demonstrate maintained accuracy under W1.58A4 with inference acceleration; code is released.

Significance. If the accuracy claims under W1.58A4 hold with rigorous verification, the work would advance practical end-to-end low-bit LLM inference by addressing the activation outlier barrier that has limited prior ternarization methods. Code availability strengthens reproducibility.

major comments (2)
  1. [KOTMS] KOTMS section: the claim that the shared Kronecker orthogonal rotation 'statistically suppresses activation outliers' sufficiently for reliable 4-bit quantization lacks any bound on the dynamic-range reduction factor, any analysis of interaction between the tri-modal weight constraint and activation moments, and any ablation isolating this effect from ILA-AMP. This is load-bearing for the central W1.58A4 accuracy-maintenance claim.
  2. [Experiments] Experiments section: without reported ablations or quantitative measurements (e.g., 99.9th-percentile reduction factor) showing that outlier suppression reaches the ~4-8x level typically required to avoid clipping loss in 4-bit quantization, the accuracy results cannot be attributed to KOTMS rather than the other components or post-hoc tuning.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the KOTMS component and its experimental validation. We address the two major comments below and commit to revisions that strengthen the manuscript with the requested analyses and measurements.

read point-by-point responses
  1. Referee: [KOTMS] KOTMS section: the claim that the shared Kronecker orthogonal rotation 'statistically suppresses activation outliers' sufficiently for reliable 4-bit quantization lacks any bound on the dynamic-range reduction factor, any analysis of interaction between the tri-modal weight constraint and activation moments, and any ablation isolating this effect from ILA-AMP. This is load-bearing for the central W1.58A4 accuracy-maintenance claim.

    Authors: We agree that additional analysis would strengthen the presentation. In the revision we will derive a bound on the dynamic-range reduction factor induced by the Kronecker-structured orthogonal rotation, provide a short analysis of the interaction between the resulting tri-modal weight distribution and the first two moments of the activations, and add an ablation that isolates KOTMS from ILA-AMP by comparing W1.58A4 accuracy with and without the shared rotation while keeping all other components fixed. revision: yes

  2. Referee: [Experiments] Experiments section: without reported ablations or quantitative measurements (e.g., 99.9th-percentile reduction factor) showing that outlier suppression reaches the ~4-8x level typically required to avoid clipping loss in 4-bit quantization, the accuracy results cannot be attributed to KOTMS rather than the other components or post-hoc tuning.

    Authors: We will augment the experiments section with the requested quantitative measurements, specifically reporting the 99.9th-percentile activation-range reduction factor achieved by KOTMS across the evaluated models. We will also include the ablation results described above, which directly quantify the outlier-suppression contribution of KOTMS and confirm that the observed reduction is sufficient to support reliable 4-bit activation quantization. revision: yes

Circularity Check

0 steps flagged

No circularity: framework claims rest on novel components with external code

full rationale

The paper presents TWLA as a new three-component PTQ framework (E2M-ATQ, KOTMS, ILA-AMP) whose central claims about W1.58A4 accuracy are tied to the proposed algorithms and their statistical effects rather than any self-definition, fitted-input prediction, or load-bearing self-citation chain. The abstract explicitly positions the outlier suppression as a consequence of the shared Kronecker rotation (a design choice, not a tautology) and makes the implementation available at an external GitHub link, allowing independent verification. No equations or prior-author citations are invoked in a way that reduces the derivation to its inputs by construction. This is the normal case of a self-contained empirical method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the contributions are algorithmic components rather than new physical or mathematical entities.

pith-pipeline@v0.9.1-grok · 5821 in / 1131 out tokens · 33728 ms · 2026-06-27T07:36:33.148346+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

28 extracted references · 3 canonical work pages

  1. [1]

    Ashkboos, S., Mohtashami, A., Croci, M

    doi: 10.1109/IJCNN.2017.7966166. Ashkboos, S., Mohtashami, A., Croci, M. L., Li, B., Cameron, P., Jaggi, M., Alistarh, D., Hoefler, T., and Hensman, J. Quarot: Outlier-free 4-bit inference in ro- tated llms.Advances in Neural Information Processing Systems, 37:100213–100240,

  2. [2]

    Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H

    URL https://arxiv.org/abs/ 1911.11641. Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., Edwards, H., Burda, Y ., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavar- ian, M., Winter, C., Tillet, P., S...

  3. [3]

    Chen, T., Li, Z., Xu, W., Zhu, Z., Li, D., Tian, L., Barsoum, E., Wang, P., and Cheng, J

    URL https://arxiv.org/abs/ 2107.03374. Chen, T., Li, Z., Xu, W., Zhu, Z., Li, D., Tian, L., Barsoum, E., Wang, P., and Cheng, J. Ternaryllm: Ternarized large language model,

  4. [4]

    Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O

    URL https://arxiv.org/ abs/2406.07177. Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. Think you have solved question answering? try arc, the ai2 reasoning challenge,

  5. [5]

    Cobbe, K., Kosaraju, V ., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., and Schulman, J

    URL https://arxiv.org/abs/ 1803.05457. Cobbe, K., Kosaraju, V ., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., and Schulman, J. Training verifiers to solve math word problems,

  6. [6]

    org/abs/2110.14168

    URL https://arxiv. org/abs/2110.14168. Dettmers, T., Svirschevski, R., Egiazarian, V ., Kuznedelev, D., Frantar, E., Ashkboos, S., Borzunov, A., Hoefler, T., and Alistarh, D. Spqr: A sparse-quantized representation for near-lossless llm weight compression,

  7. [7]

    Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D

    URL https://arxiv.org/abs/2306.03078. Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. Gptq: Accurate post-training quantization for generative pre- trained transformers,

  8. [8]

    org/abs/2210.17323

    URL https://arxiv. org/abs/2210.17323. Gu, H., Hu, L., Yu, L., Li, H., and Liu, F. Lopro: Enhanc- ing low-rank quantization via permuted block-wise rota- tion,

  9. [9]

    doi: 10.1145/3649329. 3658498. URL http://dx.doi.org/10.1145/ 3649329.3658498. Gul, H., Hu, L., Niu, S., and Liu, F. Flrq: Faster llm quan- tization with flexible low-rank matrix sketching,

  10. [10]

    Guo, D., Yang, D., Zhang, H., Song, J., Wang, P., Zhu, Q., Xu, R., Zhang, R., Ma, S., Bi, X., Zhang, X., Yu, X., Wu, Y ., Wu, Z

    URLhttps://arxiv.org/abs/2601.05684. Guo, D., Yang, D., Zhang, H., Song, J., Wang, P., Zhu, Q., Xu, R., Zhang, R., Ma, S., Bi, X., Zhang, X., Yu, X., Wu, Y ., Wu, Z. F., Gou, Z., Shao, Z., Li, Z., Gao, Z., Liu, A., Xue, B., Wang, B., Wu, B., Feng, B., Lu, C., Zhao, C., Deng, C., Ruan, C., Dai, D., Chen, D., Ji, D., Li, E., Lin, F., Dai, F., Luo, F., Hao, ...

  11. [11]

    Nature645,633–638

    ISSN 1476-4687. doi: 10.1038/s41586-025-09422-z. URL http://dx. doi.org/10.1038/s41586-025-09422-z. Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. Measuring massive multitask language understanding,

  12. [12]

    Huang, W., Qin, H., Liu, Y ., Li, Y ., Liu, Q., Liu, X., Benini, L., Magno, M., Zhang, S., and Qi, X

    URL https: //arxiv.org/abs/2009.03300. Huang, W., Qin, H., Liu, Y ., Li, Y ., Liu, Q., Liu, X., Benini, L., Magno, M., Zhang, S., and Qi, X. Slim-llm: Salience- driven mixed-precision quantization for large language models.arXiv preprint arXiv:2405.14917,

  13. [13]

    Li, M., Zhang, D., He, T., Xie, X., Li, Y .-F., and Qin, K

    URL https://arxiv.org/ abs/1605.04711. Li, M., Zhang, D., He, T., Xie, X., Li, Y .-F., and Qin, K. Towards effective data-free knowledge distillation via diverse diffusion augmentation. InProceedings of the 32nd ACM International Conference on Multimedia, pp. 4416–4425, 2024a. Li, M., Zhang, D., Dong, Q., Xie, X., and Qin, K. Adaptive dataset quantization...

  14. [14]

    Arb-llm: Alternating refined binarizations for large language models.arXiv preprint arXiv:2410.03129, 2024b

    Li, Z., Yan, X., Zhang, T., Qin, H., Xie, D., Tian, J., Kong, L., Zhang, Y ., Yang, X., et al. Arb-llm: Alternating refined binarizations for large language models.arXiv preprint arXiv:2410.03129, 2024b. Liu, K., Yang, K., Chen, Z., Li, Z., Guo, Y ., Li, W., Kong, L., and Zhang, Y . Bimacosr: Binary one-step diffusion model leveraging flexible matrix comp...

  15. [15]

    Merity, S., Xiong, C., Bradbury, J., and Socher, R

    URL https://arxiv.org/ abs/2502.00333. Merity, S., Xiong, C., Bradbury, J., and Socher, R. Pointer sentinel mixture models.arXiv preprint arXiv:1609.07843,

  16. [16]

    Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y ., Li, W., and Liu, P

    URL https: //arxiv.org/abs/1606.06031. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y ., Li, W., and Liu, P. J. Explor- ing the limits of transfer learning with a unified text-to- text transformer,

  17. [17]

    Sakaguchi, K., Bras, R

    URL https://arxiv.org/ abs/1910.10683. Sakaguchi, K., Bras, R. L., Bhagavatula, C., and Choi, Y . Winogrande: An adversarial winograd schema challenge at scale,

  18. [18]

    Saxena, U., Sharify, S., Roy, K., and Wang, X

    URL https://arxiv.org/abs/ 1907.10641. Saxena, U., Sharify, S., Roy, K., and Wang, X. Resq: Mixed- precision quantization of large language models with low- rank residuals.arXiv preprint arXiv:2412.14363,

  19. [19]

    URL https: //arxiv.org/abs/2310.00034. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozi `ere, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., and Lample, G. Llama: Open and efficient foundation lan- guage models,

  20. [20]

    Wang, H., Ma, S., and Wei, F

    URL https://arxiv.org/ abs/2302.13971. Wang, H., Ma, S., and Wei, F. Bitnet v2: Native 4-bit activa- tions with hadamard transformation for 1-bit llms,

  21. [21]

    Wang, P., Hu, Q., Zhang, Y ., Zhang, C., Liu, Y ., and Cheng, J

    URLhttps://arxiv.org/abs/2504.18415. Wang, P., Hu, Q., Zhang, Y ., Zhang, C., Liu, Y ., and Cheng, J. Two-step quantization for low-bit neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June

  22. [22]

    11 TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization Xu, Z., Zhao, Z., Hu, X., Chen, Z., and Yang, D

    URLhttps://arxiv.org/abs/2511.22316. 11 TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization Xu, Z., Zhao, Z., Hu, X., Chen, Z., and Yang, D. Kbvq- moe: Klt-guided svd with bias-corrected vector quanti- zation for moe large language models.arXiv preprint arXiv:2602.11184,

  23. [23]

    URL https://arxiv.org/abs/2510.03267. Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., Zheng, C., Liu, D., Zhou, F., Huang, F., Hu, F., Ge, H., Wei, H., Lin, H., Tang, J., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Zhou, J., Lin, J., Dang, K., Bao, K., Yang, K., Yu, L., Deng, L., Li, M., Xue, M...

  24. [24]

    Zhang, W., Hou, L., Yin, Y ., Shang, L., Chen, X., Jiang, X., and Liu, Q

    URL https://arxiv.org/abs/ 1905.07830. Zhang, W., Hou, L., Yin, Y ., Shang, L., Chen, X., Jiang, X., and Liu, Q. Ternarybert: Distillation-aware ultra-low bit bert,

  25. [25]

    URL https://arxiv.org/abs/2009. 12812. Zhao, Z., Li, H., Liu, F., Lu, Y ., Wang, Z., Yang, T., Jiang, L., and Guan, H. Quark: Quantization-enabled circuit sharing for transformer acceleration by exploiting com- mon patterns in nonlinear operations. In2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD), pp. 1–9. IEEE,

  26. [26]

    Specquant: Spectral decomposition and adaptive truncation for ultra-low-bit llms quantization

    Zhao, Z., Liu, F., Wang, J., Guan, C., Wang, Z., Jiang, L., and Guan, H. Specquant: Spectral decomposition and adaptive truncation for ultra-low-bit llms quantization. In Proceedings of the AAAI Conference on Artificial Intelli- gence, volume 40, pp. 28786–28794, 2026a. Zhao, Z., Xu, Z., and Yang, D. Bwla: Breaking the barrier of w1ax post-training quanti...

  27. [27]

    org/abs/1612.01064

    URL https://arxiv. org/abs/1612.01064. Zhu, C., Lin, Y ., Shao, J., Lin, J., and Wang, Y . Pathology- aware prototype evolution via llm-driven semantic disam- biguation for multicenter diabetic retinopathy diagnosis. InProceedings of the 33rd ACM International Confer- ence on Multimedia, pp. 9196–9205,

  28. [28]

    The storage reduction factor is therefore m2 n2 1 +n 2 2 = 16,777,216 8192 = 2048.(57) Thus, the Kronecker parameterization is2048×smaller in storage than a dense4096×4096transform

    yields only n2 1 +n 2 2 = 642 + 642 = 8192(56) entries. The storage reduction factor is therefore m2 n2 1 +n 2 2 = 16,777,216 8192 = 2048.(57) Thus, the Kronecker parameterization is2048×smaller in storage than a dense4096×4096transform. B.3.3. FAST APPLICATION VIA RESHAPE–MULTIPLY–VECTORIZE Let v∈R 1×m. Reshape it into a matrix V∈R n1×n2 such that vec(V)...