pith. sign in

arxiv: 2503.16080 · v2 · submitted 2025-03-20 · 💻 cs.CR

Fast Homomorphic Linear Algebra with BLAS

Pith reviewed 2026-05-22 23:21 UTC · model grok-4.3

classification 💻 cs.CR
keywords homomorphic encryptionCKKSlinear algebraBLASmatrix multiplicationprivacy-preserving computationencrypted AI
0
0 comments X

The pith

CKKS homomorphic matrix operations reduce directly to plaintext BLAS routines with a 4-12x slowdown for square matrix multiplication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to map matrix-vector products, vector-vector products, and matrix multiplications in the CKKS homomorphic encryption scheme onto equivalent plaintext floating-point operations. These reductions let the computation reuse highly optimized BLAS libraries that already exploit hardware parallelism and cache behavior. A reader would care because many privacy-preserving applications in AI and scientific computing rely on exactly these linear algebra kernels, and the claimed overhead is small enough to make encrypted versions practical for moderate to large matrices.

Core claim

We provide reductions for matrix-vector products, vector-vector products for moderate-sized to large matrices to their plaintext equivalents. Combined with BLAS, we demonstrate that the efficiency loss between CKKS-based encrypted square matrix multiplication and double-precision floating-point square matrix multiplication is a mere 4-12 factor, depending on the precise situation.

What carries the argument

The explicit reductions that translate each homomorphic matrix operation into one or more calls to standard plaintext linear-algebra routines while preserving correctness.

If this is right

  • Encrypted matrix multiplication becomes feasible at scales where plaintext BLAS already runs efficiently.
  • Existing high-performance linear-algebra code bases can be reused for homomorphic workloads with only a small constant-factor penalty.
  • Applications that need many matrix-vector or dot-product steps can switch to the encrypted setting without rewriting their core loops.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the reductions compose cleanly, the same technique might apply to higher-level algorithms such as iterative solvers or neural-network layers that are built from these primitives.
  • Hardware vendors optimizing BLAS could indirectly accelerate encrypted computation by the same factor without any changes to the encryption library.
  • The approach separates the cryptographic noise management from the arithmetic scheduling, which may simplify porting to new hardware accelerators.

Load-bearing premise

The reductions from homomorphic CKKS operations to plaintext linear algebra preserve both correctness and the claimed efficiency without hidden overheads from encoding, noise management, or memory access patterns that are not captured by the high-level mapping.

What would settle it

A side-by-side timing measurement, on the same hardware, of a direct CKKS matrix-multiplication implementation versus the same computation after applying the paper's reductions and calling BLAS, for matrix sizes between 256 and 4096.

Figures

Figures reproduced from arXiv: 2503.16080 by Damien Stehl\'e, Guillaume Hanrot, Jai Hyun Park, Jung Hee Cheon, Youngjin Bae.

Figure 1
Figure 1. Figure 1: Visualization of a C-MT application to re-formatting client-wise ciphertexts into feature-wise ciphertexts. for 0 ≤ j < N, where m(X) = P 0≤i<N miXi is an element in Rq, and Gal(R/Z) is the group of automor￾phisms of Q[X]/(XN + 1) that fix Q. By abuse, we let the latter act on Rq; if m = P 0≤i<N miXi ∈ Rq, we define σ(m) = P 0≤i<N miσ(X) i ∈ Rq. Given N plaintexts {mi = P j Mi,jXj}0≤i<N in Rq that each enc… view at source ↗
read the original abstract

Homomorphic encryption is a cryptographic paradigm allowing to compute on encrypted data, opening a wide range of applications in privacy-preserving data manipulation, notably in AI. Many of those applications require significant linear algebra computations (matrix-vector products, and matrix-matrix products). This central role of linear algebra computations goes far beyond homomorphic algebra and applies to most areas of scientific computing. This high versatility led, over time, to the development of a set of highly optimized routines, specified in 1979 under the name BLAS (basic linear algebra subroutines). Motivated both by the applicative importance of homomorphic linear algebra and the access to highly efficient implementations of cleartext linear algebra able to draw the most out of available hardware, we explore the connections between CKKS-based homomorphic linear algebra and floating-point plaintext linear algebra. The CKKS homomorphic encryption system is the most natural choice in this setting, as it natively handles real numbers and offers a large SIMD parallelism. We provide reductions for matrix-vector products, vector-vector products for moderate-sized to large matrices to their plaintext equivalents. Combined with BLAS, we demonstrate that the efficiency loss between CKKS-based encrypted square matrix multiplication and double-precision floating-point square matrix multiplication is a mere 4-12 factor, depending on the precise situation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to establish reductions mapping CKKS homomorphic matrix-vector and vector-vector products (for moderate-to-large matrices) onto equivalent plaintext linear-algebra operations. When these reductions are combined with existing BLAS libraries, the authors report that the slowdown of CKKS-based encrypted square matrix multiplication relative to double-precision plaintext multiplication is only a factor of 4-12, depending on the precise setting.

Significance. If the reductions are shown to preserve both correctness and the claimed performance without unaccounted overheads from rotations, key-switching or encoding, the result would be significant: it would demonstrate that homomorphic linear algebra can directly exploit decades of hardware-specific optimization in plaintext BLAS, substantially lowering the barrier to practical privacy-preserving linear-algebra workloads in machine learning and scientific computing.

major comments (2)
  1. [Abstract] Abstract: the central performance claim (4-12× slowdown) rests on the assertion that the listed homomorphic operations reduce to plaintext BLAS calls. The manuscript provides no explicit accounting of the cost of ciphertext rotations, key-switching, or noise-management steps that are required by CKKS but are not obviously expressible as BLAS primitives; if these steps dominate the runtime, the reported factor understates the true overhead.
  2. [Experimental evaluation] The experimental section does not report matrix dimensions, number of repetitions, or whether timings include encoding/decoding and rotation-key application. Without these details it is impossible to verify that the measured 4-12× ratio isolates the linear-algebra kernel as claimed.
minor comments (2)
  1. Notation for ciphertext packing and SIMD slot alignment is introduced without a dedicated preliminary section, making the reduction arguments harder to follow for readers outside the immediate subfield.
  2. [Abstract] The abstract refers to “moderate-sized to large matrices” without quantitative bounds; a short table or sentence giving the range of n for which the reductions apply would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address the major comments below and will revise the manuscript accordingly to improve clarity on performance accounting and experimental details.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claim (4-12× slowdown) rests on the assertion that the listed homomorphic operations reduce to plaintext BLAS calls. The manuscript provides no explicit accounting of the cost of ciphertext rotations, key-switching, or noise-management steps that are required by CKKS but are not obviously expressible as BLAS primitives; if these steps dominate the runtime, the reported factor understates the true overhead.

    Authors: Our reductions are constructed to express the dominant costs of CKKS matrix-vector and matrix-matrix multiplication as sequences of plaintext BLAS operations. The auxiliary costs of rotations, key-switching, and noise management are either integrated into the reduction (via batched plaintext equivalents) or captured in the end-to-end timings that produce the reported 4-12× factor. To make this accounting explicit, we will add a dedicated subsection with a cost breakdown of each CKKS primitive relative to the BLAS calls. revision: yes

  2. Referee: [Experimental evaluation] The experimental section does not report matrix dimensions, number of repetitions, or whether timings include encoding/decoding and rotation-key application. Without these details it is impossible to verify that the measured 4-12× ratio isolates the linear-algebra kernel as claimed.

    Authors: We agree that these experimental parameters must be stated explicitly for reproducibility. The revised manuscript will report the exact matrix dimensions evaluated, the number of repetitions used to obtain stable timings, and confirm that all measurements include encoding, decoding, and rotation-key application steps. This will demonstrate that the 4-12× ratio reflects the complete homomorphic linear-algebra workload. revision: yes

Circularity Check

0 steps flagged

No circularity; reductions map to independent external BLAS implementations

full rationale

The paper's derivation consists of explicit algorithmic reductions that map CKKS matrix-vector and vector-vector products onto plaintext equivalents, then invokes independently developed and benchmarked BLAS libraries for the plaintext side. The reported 4-12x slowdown factors are obtained by direct timing comparisons against double-precision floating-point matrix multiplication rather than by any quantity defined in terms of the homomorphic result itself. No self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, or ansatzes smuggled via prior work appear in the abstract or described chain; the central efficiency claim therefore remains self-contained against external plaintext benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard algebraic properties of the CKKS scheme and the existence of optimized BLAS implementations; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)
  • domain assumption CKKS natively supports real-number arithmetic and SIMD-style packing of multiple values into ciphertexts
    Invoked to justify why CKKS is the natural choice for linear algebra over encrypted data.
  • standard math BLAS libraries deliver near-optimal performance for plaintext floating-point matrix operations on available hardware
    Used as the performance baseline for the claimed overhead factor.

pith-pipeline@v0.9.0 · 5767 in / 1380 out tokens · 91113 ms · 2026-05-22T23:21:37.876738+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    OpenBLAS: An optimized BLAS library – version 0.3.26, available at https://www.openblas.net/

  2. [2]

    754-2019 - ieee standard for floating-point arithmetic (2019)

  3. [3]

    Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. (2015), software available at https://github.com/malb/lattice-estimator

  4. [4]

    In: USENIX (2021)

    Ali, A., Lepoint, T., Patel, S., Raykova, M., Schoppmann, P., Seth, K., Yeo, K.: Communication-computation trade-offs in PIR. In: USENIX (2021)

  5. [5]

    In: SODA (2025)

    Alman, J., Duan, R., Williams, V.V., Xu, Y., Xu, Z., Zhou, R.: More asymmetry yields faster matrix multiplica- tion. In: SODA (2025)

  6. [6]

    In: IEEE S&P (2018)

    Angel, S., Chen, H., Laine, K., Setty, S.T.V.: PIR with compressed queries and amortized query processing. In: IEEE S&P (2018)

  7. [7]

    In: CRYPTO (2024)

    Bae, Y., Cheon, J.H., Hanrot, G., Park, J.H., Stehl´ e, D.: Plaintext-ciphertext matrix multiplication and FHE bootstrapping: Fast and fused. In: CRYPTO (2024)

  8. [8]

    In: CRYPTO (2023)

    Bae, Y., Cheon, J.H., Kim, J., Park, J.H., Stehl´ e, D.: HERMES: efficient ring packing using MLWE ciphertexts and application to transciphering. In: CRYPTO (2023)

  9. [9]

    In: SAC (2016)

    Bajard, J.C., Eynard, J., Hasan, M.A., Zucca, V.: A full RNS variant of FV like somewhat homomorphic encryp- tion schemes. In: SAC (2016)

  10. [10]

    In: TCC (2005)

    Boneh, D., Goh, E.J., Nissim, K.: Evaluating 2-DNF formulas on ciphertexts. In: TCC (2005)

  11. [11]

    Boura, C., Gama, N., Georgieva, M., Jetchev, D.: CHIMERA: combining Ring-LWE-based fully homomorphic encryption schemes. J. Math. Cryptol. (2020)

  12. [12]

    In: CRYPTO (2012)

    Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical gapSVP. In: CRYPTO (2012)

  13. [13]

    ACM Trans

    Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory (2014)

  14. [14]

    In: STOC (2013)

    Brakerski, Z., Langlois, A., Peikert, C., Regev, O., Stehl´ e, D.: Classical hardness of learning with errors. In: STOC (2013)

  15. [15]

    In: CRYPTO (2016)

    Brakerski, Z., Perlman, R.: Lattice-based fully dynamic multi-key FHE with short ciphertexts. In: CRYPTO (2016)

  16. [16]

    In: ACNS (2021)

    Chen, H., Dai, W., Kim, M., Song, Y.: Homomorphic conversion between (ring) LWE ciphertexts. In: ACNS (2021)

  17. [17]

    IEEE Trans

    Chen, J., Yang, L., Wu, W., Liu, Y., Feng, Y.: Homomorphic matrix operations under bicyclic encoding. IEEE Trans. Inf. Forensics Secur. (2024)

  18. [18]

    In: EUROCRYPT (2018)

    Cheon, J.H., Han, K., Kim, A., Kim, M., Song, Y.: Bootstrapping for approximate homomorphic encryption. In: EUROCRYPT (2018)

  19. [19]

    In: ASIACRYPT (2017)

    Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: ASIACRYPT (2017)

  20. [20]

    In: ASIACRYPT (2017)

    Chillotti, I., Gama, N., Georgieva, M., Izabach` ene, M.: Faster packed homomorphic operations and efficient circuit bootstrapping for TFHE. In: ASIACRYPT (2017)

  21. [21]

    CryptoLab: HEaaN library (2022), available at https://www.cryptolab.co.kr/en/products-en/heaan-he/ 42

  22. [22]

    In: NAACL-HLT (2019)

    Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (2019)

  23. [23]

    Ding, Y., Guo, H., Guan, Y., Liu, W., Huo, J., Guan, Z., Zhang, X.: East: Efficient and accurate secure transformer framework for inference (2023), available at https://arxiv.org/abs/2308.09923

  24. [24]

    ACM Trans

    Dumas, J.G., Giorgi, P., Pernet, C.: Dense linear algebra over word-size prime fields: the FFLAS and FFPACK packages. ACM Trans. Math. Softw. (2008)

  25. [25]

    Available at http://eprint.iacr

    Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Available at http://eprint.iacr. org/2012/144 (2012)

  26. [26]

    (2014), http://linalg.org/projects/fflas-ffpack

    FFLAS14, T.F.F.G.: FFLAS-FFPACK: Finite Field Linear Algebra Subroutines / Package, v2.0.0 edn. (2014), http://linalg.org/projects/fflas-ffpack

  27. [27]

    In: IEEE S&P (2023)

    Froelicher, D., Cho, H., Edupalli, M., Sousa, J.S., Bossuat, J.P., Pyrgelis, A., Troncoso-Pastoriza, J.R., Berger, B., Hubaux, J.P.: Scalable and privacy-preserving federated principal component analysis. In: IEEE S&P (2023)

  28. [28]

    The Journal of Supercomputing (2024)

    Gao, Y., Quan, G., Homsi, S., Wen, W., Wang, L.: Secure and efficient general matrix multiplication on cloud using homomorphic encryption. The Journal of Supercomputing (2024)

  29. [29]

    In: CRYPTO (2012)

    Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the aes circuit. In: CRYPTO (2012)

  30. [30]

    In: CRYPTO (2013)

    Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In: CRYPTO (2013)

  31. [31]

    In: EUROCRYPT (2010)

    Gentry, C., Halevi, S., Vaikuntanathan, V.: A simple BGN-type cryptosystem from LWE. In: EUROCRYPT (2010)

  32. [32]

    In: ICS (2010)

    Goldwasser, S., Kalai, Y.T., Peikert, C., Vaikuntanathan, V.: Robustness of the learning with errors assumption. In: ICS (2010)

  33. [33]

    arXiv preprint arXiv:2312.14250 (2023)

    G¨ unther, M., Sch¨ utze, L., Becher, K., Strufe, T., Castrillon, J.: Helium: A language and compiler for fully homomorphic encryption with support for proxy re-encryption. arXiv preprint arXiv:2312.14250 (2023)

  34. [34]

    In: CRYPTO (2014)

    Halevi, S., Shoup, V.: Algorithms in HElib. In: CRYPTO (2014)

  35. [35]

    In: CRYPTO (2018)

    Halevi, S., Shoup, V.: Faster homomorphic linear transformations in HElib. In: CRYPTO (2018)

  36. [36]

    In: NIPS (2022)

    Hao, M., Li, H., Chen, H., Xing, P., Xu, G., Zhang, T.: Iron: Private inference on transformers. In: NIPS (2022)

  37. [37]

    In: CCS (2024)

    He, J., Yang, K., Tang, G., Huang, Z., Lin, L., Wei, C., Yan, Y., Wang, W.: Rhombus: Fast homomorphic matrix-vector multiplication for secure two-party inference. In: CCS (2024)

  38. [38]

    In: SOSP (2023)

    Henzinger, A., Dauterman, E., Corrigan-Gibbs, H., Zeldovich, N.: Private web search with tiptoe. In: SOSP (2023)

  39. [39]

    Hong, S., Choi, Y.A., Joo, D.S., G¨ ursoy, G.: Privacy-preserving model evaluation for logistic and linear regression using homomorphically encrypted genotype data. J. Biomed. Informs. (2024)

  40. [40]

    IEEE Trans

    Huang, Z., Hong, C., Weng, C., Lu, W., Qu, H.: More efficient secure matrix multiplication for unbalanced recommender systems. IEEE Trans. Dependable Secure Comput. (2021)

  41. [41]

    In: CCS (2018)

    Jiang, X., Kim, M., Lauter, K., Song, Y.: Secure outsourced matrix computation and application to neural networks. In: CCS (2018)

  42. [42]

    Langlois, A., Stehl´ e, D.: Worst-case to average-case reductions for module lattices. Des. Codes Cryptogr. (2015)

  43. [43]

    In: ASIACRYPT (2023)

    Lee, J.W., Lee, E., Kim, Y.S., No, J.S.: Rotation key reduction for client-server systems of deep neural network on fully homomorphic encryption. In: ASIACRYPT (2023)

  44. [44]

    IEEE Trans

    Liu, J., Zhang, L.F.: Privacy-preserving and publicly verifiable matrix multiplication. IEEE Trans. Services Comput. (2022)

  45. [45]

    In: EUROCRYPT (2010)

    Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: EUROCRYPT (2010)

  46. [46]

    Ma, X., Ma, C., Jiang, Y., Ge, C.: Improved privacy-preserving PCA using optimized homomorphic matrix multiplication. Comput. Secur. (2024)

  47. [47]

    In: PETS (2021)

    Mouchet, C., Troncoso-Pastoriza, J., Bossuat, J.P., Hubaux, J.P.: Multiparty homomorphic encryption from ring-learning-with-errors. In: PETS (2021)

  48. [48]

    In: IEEE S&P (2024)

    Pang, Q., Zhu, J., M¨ ollering, H., Zheng, W., Schneider, T.: BOLT: Privacy-preserving, accurate and efficient inference for transformers. In: IEEE S&P (2024)

  49. [49]

    In: EUROCRYPT (2025)

    Park, J.H.: Ciphertext-ciphertext matrix multiplication: Fast for large matrices. In: EUROCRYPT (2025)

  50. [50]

    In: STOC (2008)

    Peikert, C., Waters, B.: Lossy trapdoor functions and their applications. In: STOC (2008)

  51. [51]

    Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre- training (2018), available at https://openai.com/research/language-unsupervised

  52. [52]

    In: DAC (2023) 43

    Ren, X., Chen, Z., Gu, Z., Lu, Y., Zhong, R., Lu, W., Zhang, J., Zhang, Y., Wu, H., Zheng, X., Liu, H., Chu, T., Hong, C., Wei, C., Niu, D., Xie, Y.: CHAM: A customized homomorphic encryption accelerator for fast matrix-vector product. In: DAC (2023) 43

  53. [53]

    In: ASIACRYPT (2009)

    Stehl´ e, D., Steinfeld, R., Tanaka, K., Xagawa, K.: Efficient public key encryption based on ideal lattices. In: ASIACRYPT (2009)

  54. [54]

    The FLINT team: FLINT: Fast Library for Number Theory (2023), version 3.0.0, https://flintlib.org

  55. [55]

    Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozi` ere, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., Lample, G.: LLaMA: Open and efficient foundation language models (2023), available at https://arxiv.org/abs/2302.13971

  56. [56]

    Zhang, J., Liu, J., Yang, X., Wang, Y., Chen, K., Hou, X., Ren, K., Yang, X.: Secure transformer inference made non-interactive (2023), available at https://eprint.iacr.org/2024/136

  57. [57]

    Zheng, X., Li, H., Wang, D.: A new framework for fast homomorphic matrix multiplication (2023), available at https://eprint.iacr.org/2023/1649

  58. [58]

    In: MSN (2023) 44

    Zhu, W., Li, X.: Secure mutual learning with low interactions for deep model training. In: MSN (2023) 44