Fast Homomorphic Linear Algebra with BLAS
Pith reviewed 2026-05-22 23:21 UTC · model grok-4.3
The pith
CKKS homomorphic matrix operations reduce directly to plaintext BLAS routines with a 4-12x slowdown for square matrix multiplication.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We provide reductions for matrix-vector products, vector-vector products for moderate-sized to large matrices to their plaintext equivalents. Combined with BLAS, we demonstrate that the efficiency loss between CKKS-based encrypted square matrix multiplication and double-precision floating-point square matrix multiplication is a mere 4-12 factor, depending on the precise situation.
What carries the argument
The explicit reductions that translate each homomorphic matrix operation into one or more calls to standard plaintext linear-algebra routines while preserving correctness.
If this is right
- Encrypted matrix multiplication becomes feasible at scales where plaintext BLAS already runs efficiently.
- Existing high-performance linear-algebra code bases can be reused for homomorphic workloads with only a small constant-factor penalty.
- Applications that need many matrix-vector or dot-product steps can switch to the encrypted setting without rewriting their core loops.
Where Pith is reading between the lines
- If the reductions compose cleanly, the same technique might apply to higher-level algorithms such as iterative solvers or neural-network layers that are built from these primitives.
- Hardware vendors optimizing BLAS could indirectly accelerate encrypted computation by the same factor without any changes to the encryption library.
- The approach separates the cryptographic noise management from the arithmetic scheduling, which may simplify porting to new hardware accelerators.
Load-bearing premise
The reductions from homomorphic CKKS operations to plaintext linear algebra preserve both correctness and the claimed efficiency without hidden overheads from encoding, noise management, or memory access patterns that are not captured by the high-level mapping.
What would settle it
A side-by-side timing measurement, on the same hardware, of a direct CKKS matrix-multiplication implementation versus the same computation after applying the paper's reductions and calling BLAS, for matrix sizes between 256 and 4096.
Figures
read the original abstract
Homomorphic encryption is a cryptographic paradigm allowing to compute on encrypted data, opening a wide range of applications in privacy-preserving data manipulation, notably in AI. Many of those applications require significant linear algebra computations (matrix-vector products, and matrix-matrix products). This central role of linear algebra computations goes far beyond homomorphic algebra and applies to most areas of scientific computing. This high versatility led, over time, to the development of a set of highly optimized routines, specified in 1979 under the name BLAS (basic linear algebra subroutines). Motivated both by the applicative importance of homomorphic linear algebra and the access to highly efficient implementations of cleartext linear algebra able to draw the most out of available hardware, we explore the connections between CKKS-based homomorphic linear algebra and floating-point plaintext linear algebra. The CKKS homomorphic encryption system is the most natural choice in this setting, as it natively handles real numbers and offers a large SIMD parallelism. We provide reductions for matrix-vector products, vector-vector products for moderate-sized to large matrices to their plaintext equivalents. Combined with BLAS, we demonstrate that the efficiency loss between CKKS-based encrypted square matrix multiplication and double-precision floating-point square matrix multiplication is a mere 4-12 factor, depending on the precise situation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to establish reductions mapping CKKS homomorphic matrix-vector and vector-vector products (for moderate-to-large matrices) onto equivalent plaintext linear-algebra operations. When these reductions are combined with existing BLAS libraries, the authors report that the slowdown of CKKS-based encrypted square matrix multiplication relative to double-precision plaintext multiplication is only a factor of 4-12, depending on the precise setting.
Significance. If the reductions are shown to preserve both correctness and the claimed performance without unaccounted overheads from rotations, key-switching or encoding, the result would be significant: it would demonstrate that homomorphic linear algebra can directly exploit decades of hardware-specific optimization in plaintext BLAS, substantially lowering the barrier to practical privacy-preserving linear-algebra workloads in machine learning and scientific computing.
major comments (2)
- [Abstract] Abstract: the central performance claim (4-12× slowdown) rests on the assertion that the listed homomorphic operations reduce to plaintext BLAS calls. The manuscript provides no explicit accounting of the cost of ciphertext rotations, key-switching, or noise-management steps that are required by CKKS but are not obviously expressible as BLAS primitives; if these steps dominate the runtime, the reported factor understates the true overhead.
- [Experimental evaluation] The experimental section does not report matrix dimensions, number of repetitions, or whether timings include encoding/decoding and rotation-key application. Without these details it is impossible to verify that the measured 4-12× ratio isolates the linear-algebra kernel as claimed.
minor comments (2)
- Notation for ciphertext packing and SIMD slot alignment is introduced without a dedicated preliminary section, making the reduction arguments harder to follow for readers outside the immediate subfield.
- [Abstract] The abstract refers to “moderate-sized to large matrices” without quantitative bounds; a short table or sentence giving the range of n for which the reductions apply would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments. We address the major comments below and will revise the manuscript accordingly to improve clarity on performance accounting and experimental details.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claim (4-12× slowdown) rests on the assertion that the listed homomorphic operations reduce to plaintext BLAS calls. The manuscript provides no explicit accounting of the cost of ciphertext rotations, key-switching, or noise-management steps that are required by CKKS but are not obviously expressible as BLAS primitives; if these steps dominate the runtime, the reported factor understates the true overhead.
Authors: Our reductions are constructed to express the dominant costs of CKKS matrix-vector and matrix-matrix multiplication as sequences of plaintext BLAS operations. The auxiliary costs of rotations, key-switching, and noise management are either integrated into the reduction (via batched plaintext equivalents) or captured in the end-to-end timings that produce the reported 4-12× factor. To make this accounting explicit, we will add a dedicated subsection with a cost breakdown of each CKKS primitive relative to the BLAS calls. revision: yes
-
Referee: [Experimental evaluation] The experimental section does not report matrix dimensions, number of repetitions, or whether timings include encoding/decoding and rotation-key application. Without these details it is impossible to verify that the measured 4-12× ratio isolates the linear-algebra kernel as claimed.
Authors: We agree that these experimental parameters must be stated explicitly for reproducibility. The revised manuscript will report the exact matrix dimensions evaluated, the number of repetitions used to obtain stable timings, and confirm that all measurements include encoding, decoding, and rotation-key application steps. This will demonstrate that the 4-12× ratio reflects the complete homomorphic linear-algebra workload. revision: yes
Circularity Check
No circularity; reductions map to independent external BLAS implementations
full rationale
The paper's derivation consists of explicit algorithmic reductions that map CKKS matrix-vector and vector-vector products onto plaintext equivalents, then invokes independently developed and benchmarked BLAS libraries for the plaintext side. The reported 4-12x slowdown factors are obtained by direct timing comparisons against double-precision floating-point matrix multiplication rather than by any quantity defined in terms of the homomorphic result itself. No self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, or ansatzes smuggled via prior work appear in the abstract or described chain; the central efficiency claim therefore remains self-contained against external plaintext benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption CKKS natively supports real-number arithmetic and SIMD-style packing of multiple values into ciphertexts
- standard math BLAS libraries deliver near-optimal performance for plaintext floating-point matrix operations on available hardware
Reference graph
Works this paper leans on
-
[1]
OpenBLAS: An optimized BLAS library – version 0.3.26, available at https://www.openblas.net/
-
[2]
754-2019 - ieee standard for floating-point arithmetic (2019)
work page 2019
-
[3]
Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. (2015), software available at https://github.com/malb/lattice-estimator
work page 2015
-
[4]
Ali, A., Lepoint, T., Patel, S., Raykova, M., Schoppmann, P., Seth, K., Yeo, K.: Communication-computation trade-offs in PIR. In: USENIX (2021)
work page 2021
-
[5]
Alman, J., Duan, R., Williams, V.V., Xu, Y., Xu, Z., Zhou, R.: More asymmetry yields faster matrix multiplica- tion. In: SODA (2025)
work page 2025
-
[6]
Angel, S., Chen, H., Laine, K., Setty, S.T.V.: PIR with compressed queries and amortized query processing. In: IEEE S&P (2018)
work page 2018
-
[7]
Bae, Y., Cheon, J.H., Hanrot, G., Park, J.H., Stehl´ e, D.: Plaintext-ciphertext matrix multiplication and FHE bootstrapping: Fast and fused. In: CRYPTO (2024)
work page 2024
-
[8]
Bae, Y., Cheon, J.H., Kim, J., Park, J.H., Stehl´ e, D.: HERMES: efficient ring packing using MLWE ciphertexts and application to transciphering. In: CRYPTO (2023)
work page 2023
-
[9]
Bajard, J.C., Eynard, J., Hasan, M.A., Zucca, V.: A full RNS variant of FV like somewhat homomorphic encryp- tion schemes. In: SAC (2016)
work page 2016
-
[10]
Boneh, D., Goh, E.J., Nissim, K.: Evaluating 2-DNF formulas on ciphertexts. In: TCC (2005)
work page 2005
-
[11]
Boura, C., Gama, N., Georgieva, M., Jetchev, D.: CHIMERA: combining Ring-LWE-based fully homomorphic encryption schemes. J. Math. Cryptol. (2020)
work page 2020
-
[12]
Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical gapSVP. In: CRYPTO (2012)
work page 2012
- [13]
-
[14]
Brakerski, Z., Langlois, A., Peikert, C., Regev, O., Stehl´ e, D.: Classical hardness of learning with errors. In: STOC (2013)
work page 2013
-
[15]
Brakerski, Z., Perlman, R.: Lattice-based fully dynamic multi-key FHE with short ciphertexts. In: CRYPTO (2016)
work page 2016
-
[16]
Chen, H., Dai, W., Kim, M., Song, Y.: Homomorphic conversion between (ring) LWE ciphertexts. In: ACNS (2021)
work page 2021
-
[17]
Chen, J., Yang, L., Wu, W., Liu, Y., Feng, Y.: Homomorphic matrix operations under bicyclic encoding. IEEE Trans. Inf. Forensics Secur. (2024)
work page 2024
-
[18]
Cheon, J.H., Han, K., Kim, A., Kim, M., Song, Y.: Bootstrapping for approximate homomorphic encryption. In: EUROCRYPT (2018)
work page 2018
-
[19]
Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: ASIACRYPT (2017)
work page 2017
-
[20]
Chillotti, I., Gama, N., Georgieva, M., Izabach` ene, M.: Faster packed homomorphic operations and efficient circuit bootstrapping for TFHE. In: ASIACRYPT (2017)
work page 2017
-
[21]
CryptoLab: HEaaN library (2022), available at https://www.cryptolab.co.kr/en/products-en/heaan-he/ 42
work page 2022
-
[22]
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (2019)
work page 2019
- [23]
- [24]
-
[25]
Available at http://eprint.iacr
Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Available at http://eprint.iacr. org/2012/144 (2012)
work page 2012
-
[26]
(2014), http://linalg.org/projects/fflas-ffpack
FFLAS14, T.F.F.G.: FFLAS-FFPACK: Finite Field Linear Algebra Subroutines / Package, v2.0.0 edn. (2014), http://linalg.org/projects/fflas-ffpack
work page 2014
-
[27]
Froelicher, D., Cho, H., Edupalli, M., Sousa, J.S., Bossuat, J.P., Pyrgelis, A., Troncoso-Pastoriza, J.R., Berger, B., Hubaux, J.P.: Scalable and privacy-preserving federated principal component analysis. In: IEEE S&P (2023)
work page 2023
-
[28]
The Journal of Supercomputing (2024)
Gao, Y., Quan, G., Homsi, S., Wen, W., Wang, L.: Secure and efficient general matrix multiplication on cloud using homomorphic encryption. The Journal of Supercomputing (2024)
work page 2024
-
[29]
Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the aes circuit. In: CRYPTO (2012)
work page 2012
-
[30]
Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In: CRYPTO (2013)
work page 2013
-
[31]
Gentry, C., Halevi, S., Vaikuntanathan, V.: A simple BGN-type cryptosystem from LWE. In: EUROCRYPT (2010)
work page 2010
-
[32]
Goldwasser, S., Kalai, Y.T., Peikert, C., Vaikuntanathan, V.: Robustness of the learning with errors assumption. In: ICS (2010)
work page 2010
-
[33]
arXiv preprint arXiv:2312.14250 (2023)
G¨ unther, M., Sch¨ utze, L., Becher, K., Strufe, T., Castrillon, J.: Helium: A language and compiler for fully homomorphic encryption with support for proxy re-encryption. arXiv preprint arXiv:2312.14250 (2023)
- [34]
-
[35]
Halevi, S., Shoup, V.: Faster homomorphic linear transformations in HElib. In: CRYPTO (2018)
work page 2018
-
[36]
Hao, M., Li, H., Chen, H., Xing, P., Xu, G., Zhang, T.: Iron: Private inference on transformers. In: NIPS (2022)
work page 2022
-
[37]
He, J., Yang, K., Tang, G., Huang, Z., Lin, L., Wei, C., Yan, Y., Wang, W.: Rhombus: Fast homomorphic matrix-vector multiplication for secure two-party inference. In: CCS (2024)
work page 2024
-
[38]
Henzinger, A., Dauterman, E., Corrigan-Gibbs, H., Zeldovich, N.: Private web search with tiptoe. In: SOSP (2023)
work page 2023
-
[39]
Hong, S., Choi, Y.A., Joo, D.S., G¨ ursoy, G.: Privacy-preserving model evaluation for logistic and linear regression using homomorphically encrypted genotype data. J. Biomed. Informs. (2024)
work page 2024
-
[40]
Huang, Z., Hong, C., Weng, C., Lu, W., Qu, H.: More efficient secure matrix multiplication for unbalanced recommender systems. IEEE Trans. Dependable Secure Comput. (2021)
work page 2021
-
[41]
Jiang, X., Kim, M., Lauter, K., Song, Y.: Secure outsourced matrix computation and application to neural networks. In: CCS (2018)
work page 2018
-
[42]
Langlois, A., Stehl´ e, D.: Worst-case to average-case reductions for module lattices. Des. Codes Cryptogr. (2015)
work page 2015
-
[43]
Lee, J.W., Lee, E., Kim, Y.S., No, J.S.: Rotation key reduction for client-server systems of deep neural network on fully homomorphic encryption. In: ASIACRYPT (2023)
work page 2023
-
[44]
Liu, J., Zhang, L.F.: Privacy-preserving and publicly verifiable matrix multiplication. IEEE Trans. Services Comput. (2022)
work page 2022
-
[45]
Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: EUROCRYPT (2010)
work page 2010
-
[46]
Ma, X., Ma, C., Jiang, Y., Ge, C.: Improved privacy-preserving PCA using optimized homomorphic matrix multiplication. Comput. Secur. (2024)
work page 2024
-
[47]
Mouchet, C., Troncoso-Pastoriza, J., Bossuat, J.P., Hubaux, J.P.: Multiparty homomorphic encryption from ring-learning-with-errors. In: PETS (2021)
work page 2021
-
[48]
Pang, Q., Zhu, J., M¨ ollering, H., Zheng, W., Schneider, T.: BOLT: Privacy-preserving, accurate and efficient inference for transformers. In: IEEE S&P (2024)
work page 2024
-
[49]
Park, J.H.: Ciphertext-ciphertext matrix multiplication: Fast for large matrices. In: EUROCRYPT (2025)
work page 2025
-
[50]
Peikert, C., Waters, B.: Lossy trapdoor functions and their applications. In: STOC (2008)
work page 2008
-
[51]
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre- training (2018), available at https://openai.com/research/language-unsupervised
work page 2018
-
[52]
Ren, X., Chen, Z., Gu, Z., Lu, Y., Zhong, R., Lu, W., Zhang, J., Zhang, Y., Wu, H., Zheng, X., Liu, H., Chu, T., Hong, C., Wei, C., Niu, D., Xie, Y.: CHAM: A customized homomorphic encryption accelerator for fast matrix-vector product. In: DAC (2023) 43
work page 2023
-
[53]
Stehl´ e, D., Steinfeld, R., Tanaka, K., Xagawa, K.: Efficient public key encryption based on ideal lattices. In: ASIACRYPT (2009)
work page 2009
-
[54]
The FLINT team: FLINT: Fast Library for Number Theory (2023), version 3.0.0, https://flintlib.org
work page 2023
-
[55]
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozi` ere, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., Lample, G.: LLaMA: Open and efficient foundation language models (2023), available at https://arxiv.org/abs/2302.13971
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[56]
Zhang, J., Liu, J., Yang, X., Wang, Y., Chen, K., Hou, X., Ren, K., Yang, X.: Secure transformer inference made non-interactive (2023), available at https://eprint.iacr.org/2024/136
work page 2023
-
[57]
Zheng, X., Li, H., Wang, D.: A new framework for fast homomorphic matrix multiplication (2023), available at https://eprint.iacr.org/2023/1649
work page 2023
-
[58]
Zhu, W., Li, X.: Secure mutual learning with low interactions for deep model training. In: MSN (2023) 44
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.