EncFormer: Secure and Efficient Transformer Inference over Encrypted Data
Pith reviewed 2026-05-10 16:50 UTC · model grok-4.3
The pith
EncFormer reduces communication and latency for private transformer inference by aligning FHE kernels through Stage Compatible Patterns and by making FHE-MPC conversions efficient.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EncFormer is a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. It also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities, achieving 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency against prior hybrid FHE-MPC systems on GPT- and BERT-style models while maintaining near-plaintext accuracy on selected GLUE tasks.
What carries the argument
Stage Compatible Patterns, which align FHE operations across transformer stages to minimize repacking and costly switches to MPC.
Load-bearing premise
The Stage Compatible Patterns and secure complex CKKS-MPC conversion protocol preserve both security and numerical correctness when composed across the full transformer pipeline without introducing new side-channel or approximation vulnerabilities.
What would settle it
Running the full EncFormer pipeline on a GLUE task and observing either a clear accuracy drop below near-plaintext levels or a demonstrable security leak in the composed FHE-MPC steps would show the claims do not hold.
read the original abstract
Transformer inference in machine-learning-as-a-service (MLaaS) raises privacy concerns for sensitive user inputs. Prior secure solutions that combine fully homomorphic encryption (FHE) and secure multiparty computation (MPC) are bottlenecked by inefficient FHE kernels, communication-heavy MPC protocols, and expensive FHE-MPC conversions. We present EncFormer, a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. EncFormer also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities. With GPU optimizations, evaluations on GPT- and BERT-style models show that EncFormer achieves 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency against prior hybrid FHE-MPC systems, and 1.9x-3.5x lower end-to-end latency on BERT-base than FHE-only pipelines under a matched backend, while maintaining near-plaintext accuracy on selected GLUE tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EncFormer, a two-party secure inference framework for Transformers that combines fully homomorphic encryption (FHE) with secure multiparty computation (MPC). It proposes Stage Compatible Patterns to minimize FHE kernel repacking and conversions, a cost analysis model based on a minimal-conversion baseline for choosing FHE-MPC boundaries, a secure complex CKKS-MPC conversion protocol, and optimized MPC protocols for nonlinear operations. GPU-accelerated evaluations on GPT- and BERT-style models report 1.4x–30.4x reductions in online MPC communication and 1.3x–9.8x lower end-to-end latency versus prior hybrid FHE-MPC systems, plus 1.9x–3.5x latency improvement over FHE-only pipelines on BERT-base, with near-plaintext accuracy on selected GLUE tasks.
Significance. If the reported performance gains and accuracy preservation hold under rigorous verification, EncFormer would mark a meaningful step toward practical private inference for large language models. The Stage Compatible Patterns and minimal-conversion cost model offer reusable design principles that could reduce the overhead of hybrid secure computation pipelines in deep learning. The work addresses real bottlenecks in FHE-MPC conversions and nonlinearities, potentially enabling more efficient MLaaS deployments while preserving privacy.
major comments (3)
- §4.3 (secure complex CKKS-MPC conversion protocol): the security and correctness arguments do not supply per-stage or end-to-end approximation-error bounds for the composition of Stage Compatible Patterns across attention, FFN, and layer-norm stages; without these bounds it is impossible to confirm that cumulative CKKS noise remains below the threshold that would alter the reported GLUE accuracy.
- Table 3 / §5.2 (evaluation results): the latency and communication speedups (1.3x–9.8x and 1.4x–30.4x) are presented as point estimates without error bars, run counts, or a per-component breakdown that isolates the contribution of the new patterns versus GPU optimizations and the minimal-conversion baseline.
- §5.1 (cost analysis model): the claim that the minimal-conversion baseline enables principled FHE-MPC boundary selection is not accompanied by a sensitivity analysis showing how small changes in the assumed noise growth or repacking cost alter the recommended boundaries for GPT- and BERT-scale models.
minor comments (2)
- Notation for the Stage Compatible Patterns is introduced without a compact summary table relating pattern names to the FHE kernels they enable; a single table would improve readability.
- The abstract states “near-plaintext accuracy on selected GLUE tasks” but the main text does not list the exact tasks or the numerical accuracy deltas; adding this information would strengthen the accuracy claim.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment point by point below, indicating the revisions we will make to improve the manuscript's rigor and clarity.
read point-by-point responses
-
Referee: §4.3 (secure complex CKKS-MPC conversion protocol): the security and correctness arguments do not supply per-stage or end-to-end approximation-error bounds for the composition of Stage Compatible Patterns across attention, FFN, and layer-norm stages; without these bounds it is impossible to confirm that cumulative CKKS noise remains below the threshold that would alter the reported GLUE accuracy.
Authors: We acknowledge that explicit per-stage and end-to-end approximation-error bounds would strengthen the security and correctness arguments. The manuscript currently relies on empirical evidence of near-plaintext GLUE accuracy to indicate that cumulative noise does not affect results. In the revised version, we will add analytical noise-growth bounds for each Stage Compatible Pattern and their composition across attention, FFN, and layer-norm stages to §4.3, along with a supporting appendix deriving the end-to-end bounds from the chosen CKKS parameters. This will explicitly confirm that noise remains below the accuracy-altering threshold. revision: yes
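To make the promised composition analysis concrete: under a simple linear error-recursion model (err ← g·err + a per layer), the end-to-end bound has the closed form a·(g^L − 1)/(g − 1). A minimal sketch, with the growth factor, injection term, scaling factor, and threshold all hypothetical placeholders rather than EncFormer's actual parameters:

```python
# Sketch: end-to-end CKKS approximation-error budget under a linear
# per-layer recursion err <- g*err + a. All constants are illustrative
# placeholders, not EncFormer's parameters.
log2_delta = 40                    # hypothetical CKKS scaling factor log2(Delta)
a = 6 * 2.0 ** -log2_delta         # fresh error injected per layer (~6 rescales)
g = 4.0                            # per-layer error growth (~2 mult levels)
layers = 12                        # BERT-base style depth

err = 0.0
for _ in range(layers):            # iterate the recursion layer by layer
    err = g * err + a

# closed form of the same recursion: a * (g^L - 1) / (g - 1)
closed = a * (g ** layers - 1) / (g - 1)
assert abs(err - closed) < 1e-12

threshold = 2.0 ** -10             # hypothetical accuracy-altering threshold
print(f"end-to-end error bound ~ {err:.3e}; within budget: {err < threshold}")
```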
-
Referee: Table 3 / §5.2 (evaluation results): the latency and communication speedups (1.3x–9.8x and 1.4x–30.4x) are presented as point estimates without error bars, run counts, or a per-component breakdown that isolates the contribution of the new patterns versus GPU optimizations and the minimal-conversion baseline.
Authors: We agree that variability measures and component breakdowns would improve reproducibility and interpretability. The reported numbers reflect measurements from our primary GPU-based experimental runs. We will revise §5.2 and Table 3 to report averages over multiple runs (with standard deviations as error bars) and add a per-component breakdown (via an additional table or figure) that isolates the contributions of Stage Compatible Patterns, the secure CKKS-MPC conversion protocol, GPU optimizations, and the minimal-conversion baseline. revision: yes
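A sketch of the requested reporting, aggregating repeated runs into mean ± standard deviation and propagating the spread into the speedup ratio; the run times below are invented for illustration:

```python
# Sketch: mean +/- std over repeated runs, with first-order error
# propagation into the speedup ratio. Timing data is invented.
import statistics

encformer_ms = [412.0, 418.5, 409.7, 415.2, 411.9]    # hypothetical runs
baseline_ms = [1630.4, 1644.9, 1622.8, 1639.1, 1628.3]

m_e, s_e = statistics.mean(encformer_ms), statistics.stdev(encformer_ms)
m_b, s_b = statistics.mean(baseline_ms), statistics.stdev(baseline_ms)

speedup = m_b / m_e
# relative errors add in quadrature for a ratio of independent quantities
rel = ((s_e / m_e) ** 2 + (s_b / m_b) ** 2) ** 0.5
print(f"EncFormer: {m_e:.1f} +/- {s_e:.1f} ms")
print(f"baseline:  {m_b:.1f} +/- {s_b:.1f} ms")
print(f"speedup:   {speedup:.2f}x +/- {speedup * rel:.2f}")
```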
-
Referee: §5.1 (cost analysis model): the claim that the minimal-conversion baseline enables principled FHE-MPC boundary selection is not accompanied by a sensitivity analysis showing how small changes in the assumed noise growth or repacking cost alter the recommended boundaries for GPT- and BERT-scale models.
Authors: The cost model in §5.1 uses the minimal-conversion baseline to guide boundary selection based on measured costs. To address the request for sensitivity analysis, we will revise §5.1 to include a brief sensitivity study examining how the recommended FHE-MPC boundaries for BERT-base and GPT-style models change under small variations (±20% in noise growth and ±30% in repacking cost). This will demonstrate the stability of the chosen boundaries within realistic parameter ranges. revision: yes
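The promised study amounts to perturbing the cost model's inputs and checking whether the cheapest boundary moves. A toy sketch in that spirit, where the cost function, stage count, and coefficients are hypothetical stand-ins for the §5.1 model:

```python
# Sketch: sensitivity of the chosen FHE-MPC boundary to +/-20% noise-growth
# and +/-30% repacking-cost perturbations. The cost model is a toy stand-in.
import itertools

def total_cost(boundary, noise_scale, repack_scale):
    fhe_noise = noise_scale * boundary ** 2   # noise management grows with FHE depth
    repacking = repack_scale * boundary       # repacking paid per FHE stage kept
    mpc_comm = 10.0 * (6 - boundary)          # earlier switch -> more MPC traffic
    return fhe_noise + repacking + mpc_comm

def best_boundary(noise_scale, repack_scale):
    return min(range(1, 6), key=lambda b: total_cost(b, noise_scale, repack_scale))

nominal = best_boundary(1.0, 1.0)
for dn, dr in itertools.product((-0.2, 0.0, 0.2), (-0.3, 0.0, 0.3)):
    b = best_boundary(1.0 + dn, 1.0 + dr)
    flag = " (shifted)" if b != nominal else ""
    print(f"noise x{1 + dn:.1f}, repack x{1 + dr:.1f} -> boundary {b}{flag}")
```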
Circularity Check
No circularity: performance claims rest on empirical evaluation of new protocols
full rationale
The paper's central results derive from the introduction of Stage Compatible Patterns, a minimal-conversion cost model, a secure complex CKKS-MPC conversion protocol, and GPU-optimized MPC protocols for nonlinearities. These are then evaluated directly on GPT- and BERT-style models for communication volume, latency, and GLUE accuracy. No step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation chain, or definitional tautology; the reported speedups (1.4x-30.4x, 1.3x-9.8x, 1.9x-3.5x) and accuracy preservation are measured outcomes, not algebraic consequences of the input assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: security of the CKKS fully homomorphic encryption scheme under standard hardness assumptions
- domain assumption: honest-but-curious two-party threat model for the MPC protocols (see the sketch below)
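For context on the second assumption: honest-but-curious two-party protocols are typically built on additive secret sharing, where each party's share is individually uniform. A textbook sketch over the ring Z_{2^64} (illustrative, not EncFormer's concrete protocol):

```python
# Sketch: 2-party additive secret sharing over Z_{2^64}, the standard
# substrate for honest-but-curious MPC. Textbook construction only.
import secrets

Q = 2 ** 64                        # secret-sharing ring Z_{2^64}

def share(x):
    """Split x into two shares; each share alone is uniformly random."""
    r = secrets.randbelow(Q)
    return r, (x - r) % Q          # party 0 holds r, party 1 holds x - r

def reconstruct(s0, s1):
    return (s0 + s1) % Q

x0, x1 = share(1234)
y0, y1 = share(5678)
# additions are local: each party adds its own shares, no communication
z0, z1 = (x0 + y0) % Q, (x1 + y1) % Q
assert reconstruct(z0, z1) == 1234 + 5678
```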
Reference graph
Works this paper leans on
- [1] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Co..., 2019.
- [2] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are unsupervised multitask learners," technical report, OpenAI, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:160025533
- [3] K. Huang, J. Altosaar, and R. Ranganath, "ClinicalBERT: Modeling clinical notes and predicting hospital readmission," ArXiv, vol. abs/1904.05342, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:119308351
- [4] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, "LEGAL-BERT: The muppets straight out of law school," in Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds. Online: Association for Computational Linguistics, Nov. 2020, pp. 2898–2904. [Online]. Available: https://aclant...
- [5] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang, "BioBERT: a pre-trained biomedical language representation model for biomedical text mining," Bioinformatics, vol. 36, no. 4, pp. 1234–1240, Sep. 2019. [Online]. Available: https://doi.org/10.1093/bioinformatics/btz682
- [6] X. Yang, C. Zhang, Y. Sun, K. Pang, L. Jing, S. Wa, and C. Lv, "Finchain-BERT: A high-accuracy automatic fraud detection model based on NLP methods for financial scenarios," Information, vol. 14, no. 9, 2023. [Online]. Available: https://www.mdpi.com/2078-2489/14/9/499
- [7] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, "Membership inference attacks against machine learning models," in 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 3–18.
- [8] Q. Pang, J. Zhu, H. Möllering, W. Zheng, and T. Schneider, "BOLT: privacy-preserving, accurate and efficient inference for transformers," in IEEE Symposium on Security and Privacy (SP 2024). San Francisco, CA, USA: IEEE, May 2024, pp. 4753–4771.
- [9] A. Y. L. Kei and S. S. M. Chow, "Shaft: Secure, handy, accurate, and fast transformer inference," in Network and Distributed System Security Symposium (NDSS), 2025.
- [10] T. Xu, W.-j. Lu, J. Yu, Y. Chen, C. Lin, R. Wang, and M. Li, "Breaking the layer barrier: remodeling private transformer inference with hybrid CKKS and MPC," in Proceedings of the 34th USENIX Conference on Security Symposium. USA: USENIX Association, 2025.
- [11] J. Moon, D. Yoo, X. Jiang, and M. Kim, "Thor: Secure transformer inference with homomorphic encryption," in Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25). Association for Computing Machinery, 2025, pp. 392–410.
- [12] D. Park, E. Lee, and J.-W. Lee, "Powerformer: Efficient and high-accuracy privacy-preserving language model with homomorphic encryption," in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vienna, Austria: Association for Computational Linguistics, Jul. 2025, pp. 11090–11111. [Online]. Avai...
- [13] T. Chen, H. Bao, S. Huang, L. Dong, B. Jiao, D. Jiang, H. Zhou, J. Li, and F. Wei, "THE-X: Privacy-preserving transformer inference with homomorphic encryption," in Findings of the Association for Computational Linguistics: ACL 2022, S. Muresan, P. Nakov, and A. Villavicencio, Eds. Dublin, Ireland: Association for Computational Linguistics, May 2022, pp. 3...
- [14] J. Luo, Y. Zhang, Z. Zhang, J. Zhang, X. Mu, H. Wang, Y. Yu, and Z. Xu, "SecFormer: Fast and accurate privacy-preserving inference for transformer models via SMPC," in Findings of the Association for Computational Linguistics (ACL 2024), L.-W. Ku, A. Martins, and V. Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, Aug. 2024, ...
- [15] M. Hao, H. Li, H. Chen, P. Xing, G. Xu, and T. Zhang, "Iron: Private inference on transformers," in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 15718–15731. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022...
- [16] W.-j. Lu, Z. Huang, Z. Gu, J. Li, J. Liu, C. Hong, K. Ren, T. Wei, and W. Chen, "BumbleBee: Secure two-party inference framework for large transformers," in Network and Distributed System Security (NDSS) Symposium 2025. San Diego, CA, USA: Internet Society, Feb. 2025. [Online]. Available: https://doi.org/10.14722/ndss.2025.230057
- [17] J. Fan and F. Vercauteren, "Somewhat practical fully homomorphic encryption," IACR Cryptology ePrint Archive, vol. 2012, p. 144, 2012. [Online]. Available: https://eprint.iacr.org/2012/144
- [18] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, "Leveled fully homomorphic encryption without bootstrapping," in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 2012, pp. 309–325.
- [19] J. H. Cheon, A. Kim, M. Kim, and Y. Song, "Homomorphic encryption for arithmetic of approximate numbers," in Advances in Cryptology – ASIACRYPT 2017, ser. Lecture Notes in Computer Science, vol. 10624. Springer, 2017, pp. 409–437.
- [20] H. Yang, S. Shen, W. Dai, L. Zhou, Z. Liu, and Y. Zhao, "Phantom: A CUDA-accelerated word-wise homomorphic encryption library," IEEE Trans. Dependable Secur. Comput., vol. 21, no. 5, pp. 4895–4906, 2024. [Online]. Available: https://doi.org/10.1109/TDSC.2024.3363900
- [22] DESILO, "Liberate.FHE: A New FHE Library for Bridging the Gap between Theory and Practice with a Focus on Performance and Accuracy," https://github.com/Desilo/liberate-fhe, 2023, accessed 2025-09-17.
- [23] "Microsoft SEAL (release 4.1)," https://github.com/Microsoft/SEAL, Jan. 2023. Microsoft Research, Redmond, WA.
- [24] M. Zheng, Q. Lou, and L. Jiang, "Primer: Fast private transformer inference on encrypted data," in 2023 60th ACM/IEEE Design Automation Conference (DAC), 2023, pp. 1–6.
- [25] C. Gentry and S. Halevi, "Implementing Gentry's fully-homomorphic encryption scheme," Cryptology ePrint Archive, Paper 2010/520, 2010. [Online]. Available: https://eprint.iacr.org/2010/520
- [26] S. Halevi and V. Shoup, "Algorithms in HElib," in Advances in Cryptology – CRYPTO 2014. Springer, 2014, pp. 554–571.
- [27] D. Beaver, "Efficient multiparty protocols using circuit randomization," in Advances in Cryptology — CRYPTO '91, J. Feigenbaum, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 1992, pp. 420–432.
- [28] I. Damgård, V. Pastro, N. P. Smart, and S. Zakarias, "Multiparty computation from somewhat homomorphic encryption," in Advances in Cryptology – CRYPTO 2012, ser. Lecture Notes in Computer Science, vol. 7417. Springer, 2012, pp. 643–662.
- [29] A. C.-C. Yao, "How to generate and exchange secrets," in 27th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 1986, pp. 162–167.
- [30] O. Goldreich, S. Micali, and A. Wigderson, "How to play any mental game," in Proceedings of the 19th Annual ACM Symposium on Theory of Computing (STOC). ACM, 1987, pp. 218–229.
- [31] J. Zhang, X. Yang, L. He, K. Chen, W.-j. Lu, Y. Wang, X. Hou, J. Liu, K. Ren, and X. Yang, "Secure transformer inference made non-interactive," in 32nd Annual Network and Distributed System Security Symposium (NDSS 2025). The Internet Society, 2025. [Online]. Available: https://www.ndss-symposium.org/ndss-paper/secure-transformer-inference-made-non-interactive/
- [32] Z. Huang, W.-j. Lu, C. Hong, and J. Ding, "Cheetah: Lean and fast secure two-party deep neural network inference," in 31st USENIX Security Symposium (USENIX Security 22). Boston, MA: USENIX Association, Aug. 2022, pp. 809–826. [Online]. Available: https://www.usenix.org/conference/usenixsecurity22/presentation/huang-zhicong
- [33] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, "GAZELLE: A low latency framework for secure neural network inference," in 27th USENIX Security Symposium (USENIX Security 18). Baltimore, MD: USENIX Association, Aug. 2018, pp. 1651–1669. [Online]. Available: https://www.usenix.org/conference/usenixsecurity18/presentation/juvekar
- [34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://pr...
- [35] X. Jiang, M. Kim, K. Lauter, and Y. Song, "Secure outsourced matrix computation and application to neural networks," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 1209–1222. [Online]. Available: https://doi.org/10.1145/3243734.3243837
- [36] T. Xu, L. Wu, R. Wang, and M. Li, "Privcirnet: efficient private inference via block circulant transformation," in Proceedings of the 38th International Conference on Neural Information Processing Systems, ser. NIPS '24. Red Hook, NY, USA: Curran Associates Inc., 2024.
- [37] F. Boemer, R. Cammarota, D. Demmler, T. Schneider, and H. Yalame, "Mp2ml: A mixed-protocol machine learning framework for private inference," in Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, ser. PPMLP'20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 43–45. [Online]. Available: https://doi.org/10...
- [38] N. Chandran, D. Gupta, A. Rastogi, R. Sharma, and S. Tripathi, "EzPC: Programmable and efficient secure two-party computation for machine learning," in IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2019, pp. 496–511.
- [39] D. Rathee, M. Rathee, N. Kumar, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma, "CrypTFlow2: Practical 2-party secure inference," in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2020, pp. 325–342.
- [40] B. Knott, S. Venkataraman, A. Hannun, S. Sengupta, M. Ibrahim, and L. van der Maaten, "Crypten: secure multi-party computation meets machine learning," in Proceedings of the 35th International Conference on Neural Information Processing Systems, ser. NIPS '21. Red Hook, NY, USA: Curran Associates Inc., 2021.
- [41] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman, "GLUE: A multi-task benchmark and analysis platform for natural language understanding," in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, T. Linzen, G. Chrupała, and A. Alishahi, Eds. Brussels, Belgium: Association for Computational Lin...
- [42] Hugging Face, "Transformers," https://huggingface.co/, 2026, accessed 2026-01-26.
- [43] C. Agulló-Domingo, Ó. Vera-López, S. Guzelhan, L. Daksha, A. E. Jerari, K. Shivdikar, R. Agrawal, D. Kaeli, A. Joshi, and J. L. Abellán, "FIDESlib: A fully-fledged open-source FHE library for efficient CKKS on GPUs," in 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2025, pp. 1–3.
- [44] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing, "Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy," in Proceedings of The 33rd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. F. Balcan and K. Q. Weinberger, Eds., vol. 48. New York, New Yo...
- [45] R. Dathathri, O. Saarikivi, H. Chen, K. Laine, K. Lauter, S. Maleki, M. Musuvathi, and T. Mytkowicz, "Chet: an optimizing compiler for fully-homomorphic neural-network inferencing," in Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2019. New York, NY, USA: Association for Computing Machinery, 20...
- [46] W. Wang and Y. Kuang, "Cipherformer: Efficient transformer private inference with low round complexity," in 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2024, pp. 3054–3059.
- [47] Y. Li, X. Zhou, Y. Wang, L. Qian, and J. Zhao, "Private transformer inference in mlaas: A survey," arXiv preprint arXiv:2505.10315, 2025.
- [48] P. Mohassel and Y. Zhang, "Secureml: A system for scalable privacy-preserving machine learning," in 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 19–38.
- [49] J. Liu, M. Juuti, Y. Lu, and N. Asokan, "MiniONN: Enabling secure inference with minimal latency," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2017, pp. 119–133.
- [50] D. Demmler, T. Schneider, and M. Zohner, "ABY — a framework for efficient mixed-protocol secure two-party computation," in Network and Distributed System Security Symposium (NDSS). Internet Society, 2015. [Online]. Available: https://www.ndss-symposium.org/ndss2015/
- [52] N. Kumar, M. Rathee, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma, "CrypTFlow: Secure TensorFlow inference," in 2020 IEEE Symposium on Security and Privacy (SP), 2020, pp. 336–353.
- [53] S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, and T. Rabin, "FALCON: Honest-majority maliciously secure framework for private deep learning," Proceedings on Privacy Enhancing Technologies (PETS), vol. 2021, no. 2, pp. 188–208, 2021.
- [54] M. S. Riazi, M. Samragh, H. Chen, K. Laine, K. Lauter, and F. Koushanfar, "XONN: XNOR-based oblivious deep neural network inference," in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 1501–1518. [Online]. Available: https://www.usenix.org/conference/usenixsecurity19/presentation/riazi
- [55] D. Li, H. Wang, R. Shao, H. Guo, E. Xing, and H. Zhang, "MPCFormer: Fast, performant and private transformer inference with MPC," in The Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=CWmvjOEhgH-
- [56] X. Hou, J. Liu, J. Li, Y. Li, W.-j. Lu, C. Hong, and K. Ren, "CipherGPT: Secure two-party GPT inference," IACR Cryptol. ePrint Arch., p. 1147, 2023.
- [57] Y. Dong, W.-j. Lu, Y. Zheng, H. Wu, D. Zhao, J. Tan, Z. Huang, C. Hong, T. Wei, W.-G. Chen, and J. Zhou, "PUMA: Secure inference of LLaMA-7B in five minutes," Security and Safety, vol. 4, p. 2025014, 2025, published online 23 Oct 2025. [Online]. Available: https://doi.org/10.1051/sands/2025014
- [58] C. Zeng, D. He, Q. Feng, X. Yang, and Q. Luo, "SecureGPT: A framework for multi-party privacy-preserving transformer inference in GPT," IEEE Transactions on Information Forensics and Security, vol. 19, pp. 9480–9493, 2024.
- [59] K. Gupta, N. Jawalkar, A. Mukherjee, N. Chandran, D. Gupta, A. Panwar, and R. Sharma, "SIGMA: Secure GPT inference with function secret sharing," Proceedings on Privacy Enhancing Technologies, vol. 2024, no. 4, pp. 61–79, 2024, artifact available. [Online]. Available: https://doi.org/10.56553/popets-2024-0107
- [60] H. Huang and Y. Wang, "SecBERT: Privacy-preserving pre-training based neural network inference system," Neural Netw., vol. 172, no. C, Apr. 2024. [Online]. Available: https://doi.org/10.1016/j.neunet.2024.106135
- [61] P. Mishra, R. Lehmkuhl, A. Srinivasan, W. Zheng, and R. A. Popa, "Delphi: A cryptographic inference service for neural networks," in 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, Aug. 2020, pp. 2505–2522. [Online]. Available: https://www.usenix.org/conference/usenixsecurity20/presentation/mishra
- [62] D. Rathee, M. Rathee, R. K. Kiran Goli, D. Gupta, R. Sharma, N. Chandran, and A. Rastogi, "SiRnn: A math library for secure RNN inference," in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 1003–1020.
Appendix A: Additional details on FHE kernels
A. Realization of Ciph...
Plaintext weights. Under complexification, we use U = ⌈G/2⌉ inputs ⟨x̃^(u)⟩ = ⟨x^(2u) + i·x^(2u+1)⟩ and we encode complex weights. Let W̄ denote the pre-permuted weight matrix used by this projection kernel. Its columns are ordered to match the target packing of the output; the row index addresses input channels and the column index addresses output channels. For ou...
Key-switching complexity (projection kernel). Assuming a precomputed bank of Φ_Δ^C with cost #rot_bank = O(UC), where U = ⌈G/2⌉ as in Section 3.2, each Φ_Δ^C is implemented by RotFirst_Cm and costs two rotations. The total key-switching counts are #rot = O(B_out N_2) and #conj = O(B_out).
Special fused QK projection. When two projections share the same ciphertext input, most notably Q = A·W_Q and K = A·W_K, we fuse them into one plaintext–ciphertext projection under CKKS complex packing. We construct complex plaintext weights W̃ by combining the two real weight matrices after applying the score-friendly column permutation. Formally, we form W_Q^{π_S} an...
How the SCP rules are satisfied. For V = A·W_V, the projection emits V in the head-major packing expected by the value kernel. Concretely, for each output block b, segment c ∈ {0, ..., C−1} of ⟨y^(b)⟩ stores V_{:, bC+c}, so the value kernel consumes the V ciphertext list directly without any ciphertext-side repacking, satisfying Rule 1. The value kernel consumes the ...
Score kernel packing. For each t ∈ {0, ..., m/2−1}, the score kernel outputs a folded-diagonal ciphertext ⟨S_t⟩. Only the first Hm slots are informative after head reduction, phase alignment, and undoing the baby shift. Let cut(⟨x⟩) denote the first Hm slots of ⟨x⟩ and define the exported stream s = cut(⟨S_0⟩) ∥ ··· ∥ cut(⟨S_{m/2−1}⟩). We pack s sequentially into the m...
Value kernel packing. MPC returns the post-softmax weights in the same minimal folded-diagonal export order as above, namely as the stream p = cut(⟨P_0⟩) ∥ ··· ∥ cut(⟨P_{m/2−1}⟩) packed into K_min(P) = ⌈Hm²/(2n)⌉ ciphertexts. Let H_blk be the number of heads packed per value ciphertext and let B_V = ⌈H/H_blk⌉. Under the standard setting d_h = m/2 = 64 used in our imple...
Score kernel block phase correction. Block ℓ has head phase r_ℓ = (ℓC) mod H. When C ≢ 0 (mod H), different blocks start at different head offsets. We sum score contributions by phase and align them inside the first Hm slots. For r ∈ {0, ..., H−1} define Align_r(⟨x⟩) = RotFirst_Hm(⟨x⟩, (H−r)·m). We apply Align_r before combining phases to form ⟨S_t⟩.
Mask construction. Let e_s be the binary plaintext mask that is one on the m slots of segment s and zero elsewhere. For the score kernel routing step, we use disjoint active-segment masks {m_c} for c = 0, ..., C−1. When the first C segments are active, we set m_c = e_c. In the head-major layout, each ciphertext block packs H_blk heads contiguously. Segment k(h̃, u) = h̃·d_h + u stor...
Key-switching complexity (attention kernels). Let B = ⌈d/C⌉ and B_V = ⌈H/H_blk⌉, where C is the channel capacity per ciphertext and H_blk is the number of heads packed per ciphertext block. Both kernels use precomputed Φ and Ψ banks. For the score kernel, each of the m/2 diagonals multiplies B channel blocks. Rotations are dominated by Ψ-bank construction and the per-diagonal postprocessing...
Why the KS counts differ from Powerformer. Powerformer [12] accelerates attention by applying complex packing to a generic blockwise CCMM under modified column packing. Its blockwise transforms σ̃, τ̃, φ̃, ψ̃ and auxiliary routines such as BLOCKTRANS, BLOCKTAU, BLOCKSIGMA, EXTRACT, and SPLITPASTE remain part of the algorithmic skeleton; complex arithmetic... "Expand" for all BERT-large profiles and for WAN2/WAN3 on BERT-base and GPT2-base, and "Minimal" ...
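Two of the packing mechanisms above can be checked on plaintext arrays: complexification, x̃^(u) = x^(2u) + i·x^(2u+1), and the minimal export stream s = cut(⟨S_0⟩) ∥ ··· ∥ cut(⟨S_{m/2−1}⟩). A NumPy sketch with toy sizes G, H, m and random stand-ins for the S_t outputs (all illustrative, not the paper's parameters):

```python
# Sketch: complexification packing and the folded-diagonal export stream,
# mimicked on plaintext NumPy arrays with toy sizes.
import numpy as np

G, H, m = 8, 2, 4                                  # inputs, heads, sequence length
x = [np.arange(G, dtype=float) + 10 * u for u in range(G)]   # G real vectors

# (1) complexification: U = ceil(G/2) vectors x~(u) = x(2u) + i*x(2u+1)
U = (G + 1) // 2
x_tilde = [x[2 * u] + 1j * x[2 * u + 1] for u in range(U)]
assert len(x_tilde) == U and np.allclose(x_tilde[0].real, x[0])

# (2) export stream: keep only the first H*m informative slots of each
# folded-diagonal output S_t, then concatenate over t = 0..m/2-1
def cut(ct, keep=H * m):
    return ct[:keep]

S = [np.random.default_rng(t).standard_normal(4 * H * m) for t in range(m // 2)]
s = np.concatenate([cut(S_t) for S_t in S])
# total slots H*m^2/2, which would fill K_min(P) = ceil(H*m^2/(2n)) ciphertexts
assert s.shape == (m // 2 * H * m,)
```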