EncFormer: Secure and Efficient Transformer Inference over Encrypted Data
Pith reviewed 2026-05-10 16:50 UTC · model grok-4.3
The pith
EncFormer reduces communication and latency for private transformer inference by aligning FHE kernels through Stage Compatible Patterns and by making FHE-MPC conversions efficient.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EncFormer is a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. It also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities, achieving 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency against prior hybrid FHE-MPC systems on GPT- and BERT-style models while maintaining near-plaintext accuracy on selected GLUE tasks.
What carries the argument
Stage Compatible Patterns, which align FHE operations across transformer stages to minimize repacking and costly switches to MPC.
Load-bearing premise
The Stage Compatible Patterns and secure complex CKKS-MPC conversion protocol preserve both security and numerical correctness when composed across the full transformer pipeline without introducing new side-channel or approximation vulnerabilities.
What would settle it
Running the full EncFormer pipeline on a GLUE task and observing either a clear accuracy drop below near-plaintext levels or a demonstrable security leak in the composed FHE-MPC steps would show the claims do not hold.
read the original abstract
Transformer inference in machine-learning-as-a-service (MLaaS) raises privacy concerns for sensitive user inputs. Prior secure solutions that combine fully homomorphic encryption (FHE) and secure multiparty computation (MPC) are bottlenecked by inefficient FHE kernels, communication-heavy MPC protocols, and expensive FHE-MPC conversions. We present EncFormer, a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. EncFormer also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities. With GPU optimizations, evaluations on GPT- and BERT-style models show that EncFormer achieves 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency against prior hybrid FHE-MPC systems, and 1.9x-3.5x lower end-to-end latency on BERT-base than FHE-only pipelines under a matched backend, while maintaining near-plaintext accuracy on selected GLUE tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EncFormer, a two-party secure inference framework for Transformers that combines fully homomorphic encryption (FHE) with secure multiparty computation (MPC). It proposes Stage Compatible Patterns to minimize FHE kernel repacking and conversions, a cost analysis model based on a minimal-conversion baseline for choosing FHE-MPC boundaries, a secure complex CKKS-MPC conversion protocol, and optimized MPC protocols for nonlinear operations. GPU-accelerated evaluations on GPT- and BERT-style models report 1.4x–30.4x reductions in online MPC communication and 1.3x–9.8x lower end-to-end latency versus prior hybrid FHE-MPC systems, plus 1.9x–3.5x latency improvement over FHE-only pipelines on BERT-base, with near-plaintext accuracy on selected GLUE tasks.
Significance. If the reported performance gains and accuracy preservation hold under rigorous verification, EncFormer would mark a meaningful step toward practical private inference for large language models. The Stage Compatible Patterns and minimal-conversion cost model offer reusable design principles that could reduce the overhead of hybrid secure computation pipelines in deep learning. The work addresses real bottlenecks in FHE-MPC conversions and nonlinearities, potentially enabling more efficient MLaaS deployments while preserving privacy.
major comments (3)
- §4.3 (secure complex CKKS-MPC conversion protocol): the security and correctness arguments do not supply per-stage or end-to-end approximation-error bounds for the composition of Stage Compatible Patterns across attention, FFN, and layer-norm stages; without these bounds it is impossible to confirm that cumulative CKKS noise remains below the threshold that would alter the reported GLUE accuracy.
- Table 3 / §5.2 (evaluation results): the latency and communication speedups (1.3x–9.8x and 1.4x–30.4x) are presented as point estimates without error bars, run counts, or a per-component breakdown that isolates the contribution of the new patterns versus GPU optimizations and the minimal-conversion baseline.
- §5.1 (cost analysis model): the claim that the minimal-conversion baseline enables principled FHE-MPC boundary selection is not accompanied by a sensitivity analysis showing how small changes in the assumed noise growth or repacking cost alter the recommended boundaries for GPT- and BERT-scale models.
minor comments (2)
- Notation for the Stage Compatible Patterns is introduced without a compact summary table relating pattern names to the FHE kernels they enable; a single table would improve readability.
- The abstract states “near-plaintext accuracy on selected GLUE tasks” but the main text does not list the exact tasks or the numerical accuracy deltas; adding this information would strengthen the accuracy claim.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment point by point below, indicating the revisions we will make to improve the manuscript's rigor and clarity.
read point-by-point responses
-
Referee: §4.3 (secure complex CKKS-MPC conversion protocol): the security and correctness arguments do not supply per-stage or end-to-end approximation-error bounds for the composition of Stage Compatible Patterns across attention, FFN, and layer-norm stages; without these bounds it is impossible to confirm that cumulative CKKS noise remains below the threshold that would alter the reported GLUE accuracy.
Authors: We acknowledge that explicit per-stage and end-to-end approximation-error bounds would strengthen the security and correctness arguments. The manuscript currently relies on empirical evidence of near-plaintext GLUE accuracy to indicate that cumulative noise does not affect results. In the revised version, we will add analytical noise-growth bounds for each Stage Compatible Pattern and their composition across attention, FFN, and layer-norm stages to §4.3, along with a supporting appendix deriving the end-to-end bounds from the chosen CKKS parameters. This will explicitly confirm that noise remains below the accuracy-altering threshold. revision: yes
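To make the promised composition analysis concrete: under a simple linear error-recursion model (err ← g·err + a per layer), the end-to-end bound has the closed form a·(g^L − 1)/(g − 1). A minimal sketch, with the growth factor, injection term, scaling factor, and threshold all hypothetical placeholders rather than EncFormer's actual parameters:

```python
# Sketch: end-to-end CKKS approximation-error budget under a linear
# per-layer recursion err <- g*err + a. All constants are illustrative
# placeholders, not EncFormer's parameters.
log2_delta = 40                    # hypothetical CKKS scaling factor log2(Delta)
a = 6 * 2.0 ** -log2_delta         # fresh error injected per layer (~6 rescales)
g = 4.0                            # per-layer error growth (~2 mult levels)
layers = 12                        # BERT-base style depth

err = 0.0
for _ in range(layers):            # iterate the recursion layer by layer
    err = g * err + a

# closed form of the same recursion: a * (g^L - 1) / (g - 1)
closed = a * (g ** layers - 1) / (g - 1)
assert abs(err - closed) < 1e-12

threshold = 2.0 ** -10             # hypothetical accuracy-altering threshold
print(f"end-to-end error bound ~ {err:.3e}; within budget: {err < threshold}")
```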
-
Referee: Table 3 / §5.2 (evaluation results): the latency and communication speedups (1.3x–9.8x and 1.4x–30.4x) are presented as point estimates without error bars, run counts, or a per-component breakdown that isolates the contribution of the new patterns versus GPU optimizations and the minimal-conversion baseline.
Authors: We agree that variability measures and component breakdowns would improve reproducibility and interpretability. The reported numbers reflect measurements from our primary GPU-based experimental runs. We will revise §5.2 and Table 3 to report averages over multiple runs (with standard deviations as error bars) and add a per-component breakdown (via an additional table or figure) that isolates the contributions of Stage Compatible Patterns, the secure CKKS-MPC conversion protocol, GPU optimizations, and the minimal-conversion baseline. revision: yes
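A sketch of the requested reporting, aggregating repeated runs into mean ± standard deviation and propagating the spread into the speedup ratio; the run times below are invented for illustration:

```python
# Sketch: mean +/- std over repeated runs, with first-order error
# propagation into the speedup ratio. Timing data is invented.
import statistics

encformer_ms = [412.0, 418.5, 409.7, 415.2, 411.9]    # hypothetical runs
baseline_ms = [1630.4, 1644.9, 1622.8, 1639.1, 1628.3]

m_e, s_e = statistics.mean(encformer_ms), statistics.stdev(encformer_ms)
m_b, s_b = statistics.mean(baseline_ms), statistics.stdev(baseline_ms)

speedup = m_b / m_e
# relative errors add in quadrature for a ratio of independent quantities
rel = ((s_e / m_e) ** 2 + (s_b / m_b) ** 2) ** 0.5
print(f"EncFormer: {m_e:.1f} +/- {s_e:.1f} ms")
print(f"baseline:  {m_b:.1f} +/- {s_b:.1f} ms")
print(f"speedup:   {speedup:.2f}x +/- {speedup * rel:.2f}")
```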
-
Referee: §5.1 (cost analysis model): the claim that the minimal-conversion baseline enables principled FHE-MPC boundary selection is not accompanied by a sensitivity analysis showing how small changes in the assumed noise growth or repacking cost alter the recommended boundaries for GPT- and BERT-scale models.
Authors: The cost model in §5.1 uses the minimal-conversion baseline to guide boundary selection based on measured costs. To address the request for sensitivity analysis, we will revise §5.1 to include a brief sensitivity study examining how the recommended FHE-MPC boundaries for BERT-base and GPT-style models change under small variations (±20% in noise growth and ±30% in repacking cost). This will demonstrate the stability of the chosen boundaries within realistic parameter ranges. revision: yes
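The promised study amounts to perturbing the cost model's inputs and checking whether the cheapest boundary moves. A toy sketch in that spirit, where the cost function, stage count, and coefficients are hypothetical stand-ins for the §5.1 model:

```python
# Sketch: sensitivity of the chosen FHE-MPC boundary to +/-20% noise-growth
# and +/-30% repacking-cost perturbations. The cost model is a toy stand-in.
import itertools

def total_cost(boundary, noise_scale, repack_scale):
    fhe_noise = noise_scale * boundary ** 2   # noise management grows with FHE depth
    repacking = repack_scale * boundary       # repacking paid per FHE stage kept
    mpc_comm = 10.0 * (6 - boundary)          # earlier switch -> more MPC traffic
    return fhe_noise + repacking + mpc_comm

def best_boundary(noise_scale, repack_scale):
    return min(range(1, 6), key=lambda b: total_cost(b, noise_scale, repack_scale))

nominal = best_boundary(1.0, 1.0)
for dn, dr in itertools.product((-0.2, 0.0, 0.2), (-0.3, 0.0, 0.3)):
    b = best_boundary(1.0 + dn, 1.0 + dr)
    flag = " (shifted)" if b != nominal else ""
    print(f"noise x{1 + dn:.1f}, repack x{1 + dr:.1f} -> boundary {b}{flag}")
```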
Circularity Check
No circularity: performance claims rest on empirical evaluation of new protocols
full rationale
The paper's central results derive from the introduction of Stage Compatible Patterns, a minimal-conversion cost model, a secure complex CKKS-MPC conversion protocol, and GPU-optimized MPC protocols for nonlinearities. These are then evaluated directly on GPT- and BERT-style models for communication volume, latency, and GLUE accuracy. No step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation chain, or definitional tautology; the reported speedups (1.4x-30.4x, 1.3x-9.8x, 1.9x-3.5x) and accuracy preservation are measured outcomes, not algebraic consequences of the input assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: security of the CKKS fully homomorphic encryption scheme under standard hardness assumptions
- domain assumption: honest-but-curious two-party threat model for the MPC protocols (see the sketch below)
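For context on the second assumption: honest-but-curious two-party protocols are typically built on additive secret sharing, where each party's share is individually uniform. A textbook sketch over the ring Z_{2^64} (illustrative, not EncFormer's concrete protocol):

```python
# Sketch: 2-party additive secret sharing over Z_{2^64}, the standard
# substrate for honest-but-curious MPC. Textbook construction only.
import secrets

Q = 2 ** 64                        # secret-sharing ring Z_{2^64}

def share(x):
    """Split x into two shares; each share alone is uniformly random."""
    r = secrets.randbelow(Q)
    return r, (x - r) % Q          # party 0 holds r, party 1 holds x - r

def reconstruct(s0, s1):
    return (s0 + s1) % Q

x0, x1 = share(1234)
y0, y1 = share(5678)
# additions are local: each party adds its own shares, no communication
z0, z1 = (x0 + y0) % Q, (x1 + y1) % Q
assert reconstruct(z0, z1) == 1234 + 5678
```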
Reference graph
Works this paper leans on
- [1] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Co..., 2019.
- [2] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are unsupervised multitask learners," technical report, OpenAI, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:160025533
- [3] K. Huang, J. Altosaar, and R. Ranganath, "ClinicalBERT: Modeling clinical notes and predicting hospital readmission," ArXiv, vol. abs/1904.05342, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:119308351
- [4] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, "LEGAL-BERT: The muppets straight out of law school," in Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds. Online: Association for Computational Linguistics, Nov. 2020, pp. 2898–2904. [Online]. Available: https://aclant...
- [5] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang, "BioBERT: a pre-trained biomedical language representation model for biomedical text mining," Bioinformatics, vol. 36, no. 4, pp. 1234–1240, Sep. 2019. [Online]. Available: https://doi.org/10.1093/bioinformatics/btz682
- [6] X. Yang, C. Zhang, Y. Sun, K. Pang, L. Jing, S. Wa, and C. Lv, "Finchain-BERT: A high-accuracy automatic fraud detection model based on NLP methods for financial scenarios," Information, vol. 14, no. 9, 2023. [Online]. Available: https://www.mdpi.com/2078-2489/14/9/499
- [7] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, "Membership inference attacks against machine learning models," in 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 3–18.
- [8] Q. Pang, J. Zhu, H. Möllering, W. Zheng, and T. Schneider, "BOLT: privacy-preserving, accurate and efficient inference for transformers," in IEEE Symposium on Security and Privacy (SP 2024). San Francisco, CA, USA: IEEE, May 2024, pp. 4753–4771.
- [9] A. Y. L. Kei and S. S. M. Chow, "Shaft: Secure, handy, accurate, and fast transformer inference," in Network and Distributed System Security Symposium (NDSS), 2025.
- [10] T. Xu, W.-j. Lu, J. Yu, Y. Chen, C. Lin, R. Wang, and M. Li, "Breaking the layer barrier: remodeling private transformer inference with hybrid CKKS and MPC," in Proceedings of the 34th USENIX Conference on Security Symposium. USA: USENIX Association, 2025.
- [11] J. Moon, D. Yoo, X. Jiang, and M. Kim, "Thor: Secure transformer inference with homomorphic encryption," in Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25). Association for Computing Machinery, 2025, pp. 392–410.
- [12] D. Park, E. Lee, and J.-W. Lee, "Powerformer: Efficient and high-accuracy privacy-preserving language model with homomorphic encryption," in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vienna, Austria: Association for Computational Linguistics, Jul. 2025, pp. 11090–11111. [Online]. Avai...
- [13] T. Chen, H. Bao, S. Huang, L. Dong, B. Jiao, D. Jiang, H. Zhou, J. Li, and F. Wei, "THE-X: Privacy-preserving transformer inference with homomorphic encryption," in Findings of the Association for Computational Linguistics: ACL 2022, S. Muresan, P. Nakov, and A. Villavicencio, Eds. Dublin, Ireland: Association for Computational Linguistics, May 2022, pp. 3...
- [14] J. Luo, Y. Zhang, Z. Zhang, J. Zhang, X. Mu, H. Wang, Y. Yu, and Z. Xu, "SecFormer: Fast and accurate privacy-preserving inference for transformer models via SMPC," in Findings of the Association for Computational Linguistics (ACL 2024), L.-W. Ku, A. Martins, and V. Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, Aug. 2024, ...
- [15] M. Hao, H. Li, H. Chen, P. Xing, G. Xu, and T. Zhang, "Iron: Private inference on transformers," in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 15718–15731. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022...
- [16] W.-j. Lu, Z. Huang, Z. Gu, J. Li, J. Liu, C. Hong, K. Ren, T. Wei, and W. Chen, "BumbleBee: Secure two-party inference framework for large transformers," in Network and Distributed System Security (NDSS) Symposium 2025. San Diego, CA, USA: Internet Society, Feb. 2025. [Online]. Available: https://doi.org/10.14722/ndss.2025.230057
- [17] J. Fan and F. Vercauteren, "Somewhat practical fully homomorphic encryption," IACR Cryptology ePrint Archive, vol. 2012, p. 144, 2012. [Online]. Available: https://eprint.iacr.org/2012/144
- [18] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, "Leveled fully homomorphic encryption without bootstrapping," in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 2012, pp. 309–325.
- [19] J. H. Cheon, A. Kim, M. Kim, and Y. Song, "Homomorphic encryption for arithmetic of approximate numbers," in Advances in Cryptology – ASIACRYPT 2017, ser. Lecture Notes in Computer Science, vol. 10624. Springer, 2017, pp. 409–437.
- [20] H. Yang, S. Shen, W. Dai, L. Zhou, Z. Liu, and Y. Zhao, "Phantom: A CUDA-accelerated word-wise homomorphic encryption library," IEEE Trans. Dependable Secur. Comput., vol. 21, no. 5, pp. 4895–4906, 2024. [Online]. Available: https://doi.org/10.1109/TDSC.2024.3363900
- [22] DESILO, "Liberate.FHE: A New FHE Library for Bridging the Gap between Theory and Practice with a Focus on Performance and Accuracy," https://github.com/Desilo/liberate-fhe, 2023, accessed 2025-09-17.
- [23] "Microsoft SEAL (release 4.1)," https://github.com/Microsoft/SEAL, Jan. 2023. Microsoft Research, Redmond, WA.
- [24] M. Zheng, Q. Lou, and L. Jiang, "Primer: Fast private transformer inference on encrypted data," in 2023 60th ACM/IEEE Design Automation Conference (DAC), 2023, pp. 1–6.
- [25] C. Gentry and S. Halevi, "Implementing Gentry's fully-homomorphic encryption scheme," Cryptology ePrint Archive, Paper 2010/520, 2010. [Online]. Available: https://eprint.iacr.org/2010/520
- [26] S. Halevi and V. Shoup, "Algorithms in HElib," in Advances in Cryptology – CRYPTO 2014. Springer, 2014, pp. 554–571.
- [27] D. Beaver, "Efficient multiparty protocols using circuit randomization," in Advances in Cryptology — CRYPTO '91, J. Feigenbaum, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 1992, pp. 420–432.
- [28] I. Damgård, V. Pastro, N. P. Smart, and S. Zakarias, "Multiparty computation from somewhat homomorphic encryption," in Advances in Cryptology – CRYPTO 2012, ser. Lecture Notes in Computer Science, vol. 7417. Springer, 2012, pp. 643–662.
- [29] A. C.-C. Yao, "How to generate and exchange secrets," in 27th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 1986, pp. 162–167.
- [30] O. Goldreich, S. Micali, and A. Wigderson, "How to play any mental game," in Proceedings of the 19th Annual ACM Symposium on Theory of Computing (STOC). ACM, 1987, pp. 218–229.
- [31] J. Zhang, X. Yang, L. He, K. Chen, W.-j. Lu, Y. Wang, X. Hou, J. Liu, K. Ren, and X. Yang, "Secure transformer inference made non-interactive," in 32nd Annual Network and Distributed System Security Symposium (NDSS 2025). The Internet Society, 2025. [Online]. Available: https://www.ndss-symposium.org/ndss-paper/secure-transformer-inference-made-non-interactive/
- [32] Z. Huang, W.-j. Lu, C. Hong, and J. Ding, "Cheetah: Lean and fast secure two-party deep neural network inference," in 31st USENIX Security Symposium (USENIX Security 22). Boston, MA: USENIX Association, Aug. 2022, pp. 809–826. [Online]. Available: https://www.usenix.org/conference/usenixsecurity22/presentation/huang-zhicong
- [33] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, "GAZELLE: A low latency framework for secure neural network inference," in 27th USENIX Security Symposium (USENIX Security 18). Baltimore, MD: USENIX Association, Aug. 2018, pp. 1651–1669. [Online]. Available: https://www.usenix.org/conference/usenixsecurity18/presentation/juvekar
- [34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://pr...
- [35] X. Jiang, M. Kim, K. Lauter, and Y. Song, "Secure outsourced matrix computation and application to neural networks," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 1209–1222. [Online]. Available: https://doi.org/10.1145/3243734.3243837
- [36] T. Xu, L. Wu, R. Wang, and M. Li, "Privcirnet: efficient private inference via block circulant transformation," in Proceedings of the 38th International Conference on Neural Information Processing Systems, ser. NIPS '24. Red Hook, NY, USA: Curran Associates Inc., 2024.
- [37] F. Boemer, R. Cammarota, D. Demmler, T. Schneider, and H. Yalame, "Mp2ml: A mixed-protocol machine learning framework for private inference," in Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, ser. PPMLP'20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 43–45. [Online]. Available: https://doi.org/10...
- [38] N. Chandran, D. Gupta, A. Rastogi, R. Sharma, and S. Tripathi, "EzPC: Programmable and efficient secure two-party computation for machine learning," in IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2019, pp. 496–511.
- [39] D. Rathee, M. Rathee, N. Kumar, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma, "CrypTFlow2: Practical 2-party secure inference," in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2020, pp. 325–342.
- [40] B. Knott, S. Venkataraman, A. Hannun, S. Sengupta, M. Ibrahim, and L. van der Maaten, "Crypten: secure multi-party computation meets machine learning," in Proceedings of the 35th International Conference on Neural Information Processing Systems, ser. NIPS '21. Red Hook, NY, USA: Curran Associates Inc., 2021.
- [41] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman, "GLUE: A multi-task benchmark and analysis platform for natural language understanding," in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, T. Linzen, G. Chrupała, and A. Alishahi, Eds. Brussels, Belgium: Association for Computational Lin...
- [42] Hugging Face, "Transformers," https://huggingface.co/, 2026, accessed 2026-01-26.
- [43] C. Agulló-Domingo, Ó. Vera-López, S. Guzelhan, L. Daksha, A. E. Jerari, K. Shivdikar, R. Agrawal, D. Kaeli, A. Joshi, and J. L. Abellán, "FIDESlib: A fully-fledged open-source FHE library for efficient CKKS on GPUs," in 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2025, pp. 1–3.
- [44] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing, "Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy," in Proceedings of The 33rd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. F. Balcan and K. Q. Weinberger, Eds., vol. 48. New York, New Yo...
- [45] R. Dathathri, O. Saarikivi, H. Chen, K. Laine, K. Lauter, S. Maleki, M. Musuvathi, and T. Mytkowicz, "Chet: an optimizing compiler for fully-homomorphic neural-network inferencing," in Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2019. New York, NY, USA: Association for Computing Machinery, 20...
- [46] W. Wang and Y. Kuang, "Cipherformer: Efficient transformer private inference with low round complexity," in 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2024, pp. 3054–3059.
- [47] Y. Li, X. Zhou, Y. Wang, L. Qian, and J. Zhao, "Private transformer inference in mlaas: A survey," arXiv preprint arXiv:2505.10315, 2025.
- [48] P. Mohassel and Y. Zhang, "Secureml: A system for scalable privacy-preserving machine learning," in 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 19–38.
- [49] J. Liu, M. Juuti, Y. Lu, and N. Asokan, "MiniONN: Enabling secure inference with minimal latency," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2017, pp. 119–133.
- [50] D. Demmler, T. Schneider, and M. Zohner, "ABY — a framework for efficient mixed-protocol secure two-party computation," in Network and Distributed System Security Symposium (NDSS). Internet Society, 2015. [Online]. Available: https://www.ndss-symposium.org/ndss2015/
- [52] N. Kumar, M. Rathee, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma, "CrypTFlow: Secure TensorFlow inference," in 2020 IEEE Symposium on Security and Privacy (SP), 2020, pp. 336–353.
- [53] S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, and T. Rabin, "FALCON: Honest-majority maliciously secure framework for private deep learning," Proceedings on Privacy Enhancing Technologies (PETS), vol. 2021, no. 2, pp. 188–208, 2021.
- [54] M. S. Riazi, M. Samragh, H. Chen, K. Laine, K. Lauter, and F. Koushanfar, "XONN: XNOR-based oblivious deep neural network inference," in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 1501–1518. [Online]. Available: https://www.usenix.org/conference/usenixsecurity19/presentation/riazi
- [55] D. Li, H. Wang, R. Shao, H. Guo, E. Xing, and H. Zhang, "MPCFormer: Fast, performant and private transformer inference with MPC," in The Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=CWmvjOEhgH-
- [56] X. Hou, J. Liu, J. Li, Y. Li, W.-j. Lu, C. Hong, and K. Ren, "CipherGPT: Secure two-party GPT inference," IACR Cryptol. ePrint Arch., p. 1147, 2023.
- [57] Y. Dong, W.-j. Lu, Y. Zheng, H. Wu, D. Zhao, J. Tan, Z. Huang, C. Hong, T. Wei, W.-G. Chen, and J. Zhou, "PUMA: Secure inference of LLaMA-7B in five minutes," Security and Safety, vol. 4, p. 2025014, 2025, published online 23 Oct 2025. [Online]. Available: https://doi.org/10.1051/sands/2025014
- [58] C. Zeng, D. He, Q. Feng, X. Yang, and Q. Luo, "SecureGPT: A framework for multi-party privacy-preserving transformer inference in GPT," IEEE Transactions on Information Forensics and Security, vol. 19, pp. 9480–9493, 2024.
- [59] K. Gupta, N. Jawalkar, A. Mukherjee, N. Chandran, D. Gupta, A. Panwar, and R. Sharma, "SIGMA: Secure GPT inference with function secret sharing," Proceedings on Privacy Enhancing Technologies, vol. 2024, no. 4, pp. 61–79, 2024, artifact available. [Online]. Available: https://doi.org/10.56553/popets-2024-0107
- [60] H. Huang and Y. Wang, "SecBERT: Privacy-preserving pre-training based neural network inference system," Neural Netw., vol. 172, no. C, Apr. 2024. [Online]. Available: https://doi.org/10.1016/j.neunet.2024.106135
- [61] P. Mishra, R. Lehmkuhl, A. Srinivasan, W. Zheng, and R. A. Popa, "Delphi: A cryptographic inference service for neural networks," in 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, Aug. 2020, pp. 2505–2522. [Online]. Available: https://www.usenix.org/conference/usenixsecurity20/presentation/mishra
- [62] D. Rathee, M. Rathee, R. K. Kiran Goli, D. Gupta, R. Sharma, N. Chandran, and A. Rastogi, "SiRnn: A math library for secure RNN inference," in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 1003–1020.
Appendix A: Additional details on FHE kernels
A. Realization of Ciph...
Plaintext weights. Under complexification, we use U = ⌈G/2⌉ inputs ⟨x̃^(u)⟩ = ⟨x^(2u) + i·x^(2u+1)⟩ and we encode complex weights. Let W̄ denote the pre-permuted weight matrix used by this projection kernel. Its columns are ordered to match the target packing of the output; the row index addresses input channels and the column index addresses output channels. For ou...
Key-switching complexity (projection kernel). Assuming a precomputed bank of Φ_Δ^C with cost #rot_bank = O(UC), where U = ⌈G/2⌉ as in Section 3.2, each Φ_Δ^C is implemented by RotFirst_Cm and costs two rotations. The total key-switching counts are #rot = O(B_out N_2) and #conj = O(B_out).
Special fused QK projection. When two projections share the same ciphertext input, most notably Q = A·W_Q and K = A·W_K, we fuse them into one plaintext–ciphertext projection under CKKS complex packing. We construct complex plaintext weights W̃ by combining the two real weight matrices after applying the score-friendly column permutation. Formally, we form W_Q^{π_S} an...
How the SCP rules are satisfied. For V = A·W_V, the projection emits V in the head-major packing expected by the value kernel. Concretely, for each output block b, segment c ∈ {0, ..., C−1} of ⟨y^(b)⟩ stores V_{:, bC+c}, so the value kernel consumes the V ciphertext list directly without any ciphertext-side repacking, satisfying Rule 1. The value kernel consumes the ...
Score kernel packing. For each t ∈ {0, ..., m/2−1}, the score kernel outputs a folded-diagonal ciphertext ⟨S_t⟩. Only the first Hm slots are informative after head reduction, phase alignment, and undoing the baby shift. Let cut(⟨x⟩) denote the first Hm slots of ⟨x⟩ and define the exported stream s = cut(⟨S_0⟩) ∥ ··· ∥ cut(⟨S_{m/2−1}⟩). We pack s sequentially into the m...
Value kernel packing. MPC returns the post-softmax weights in the same minimal folded-diagonal export order as above, namely as the stream p = cut(⟨P_0⟩) ∥ ··· ∥ cut(⟨P_{m/2−1}⟩) packed into K_min(P) = ⌈Hm²/(2n)⌉ ciphertexts. Let H_blk be the number of heads packed per value ciphertext and let B_V = ⌈H/H_blk⌉. Under the standard setting d_h = m/2 = 64 used in our imple...
Score kernel block phase correction. Block ℓ has head phase r_ℓ = (ℓC) mod H. When C ≢ 0 (mod H), different blocks start at different head offsets. We sum score contributions by phase and align them inside the first Hm slots. For r ∈ {0, ..., H−1} define Align_r(⟨x⟩) = RotFirst_Hm(⟨x⟩, (H−r)·m). We apply Align_r before combining phases to form ⟨S_t⟩.
Mask construction. Let e_s be the binary plaintext mask that is one on the m slots of segment s and zero elsewhere. For the score kernel routing step, we use disjoint active-segment masks {m_c} for c = 0, ..., C−1. When the first C segments are active, we set m_c = e_c. In the head-major layout, each ciphertext block packs H_blk heads contiguously. Segment k(h̃, u) = h̃·d_h + u stor...
Key-switching complexity (attention kernels). Let B = ⌈d/C⌉ and B_V = ⌈H/H_blk⌉, where C is the channel capacity per ciphertext and H_blk is the number of heads packed per ciphertext block. Both kernels use precomputed Φ and Ψ banks. For the score kernel, each of the m/2 diagonals multiplies B channel blocks. Rotations are dominated by Ψ-bank construction and the per-diagonal postprocessing...
Why the KS counts differ from Powerformer. Powerformer [12] accelerates attention by applying complex packing to a generic blockwise CCMM under modified column packing. Its blockwise transforms σ̃, τ̃, φ̃, ψ̃ and auxiliary routines such as BLOCKTRANS, BLOCKTAU, BLOCKSIGMA, EXTRACT, and SPLITPASTE remain part of the algorithmic skeleton; complex arithmetic... "Expand" for all BERT-large profiles and for WAN2/WAN3 on BERT-base and GPT2-base, and "Minimal" ...
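Two of the packing mechanisms above can be checked on plaintext arrays: complexification, x̃^(u) = x^(2u) + i·x^(2u+1), and the minimal export stream s = cut(⟨S_0⟩) ∥ ··· ∥ cut(⟨S_{m/2−1}⟩). A NumPy sketch with toy sizes G, H, m and random stand-ins for the S_t outputs (all illustrative, not the paper's parameters):

```python
# Sketch: complexification packing and the folded-diagonal export stream,
# mimicked on plaintext NumPy arrays with toy sizes.
import numpy as np

G, H, m = 8, 2, 4                                  # inputs, heads, sequence length
x = [np.arange(G, dtype=float) + 10 * u for u in range(G)]   # G real vectors

# (1) complexification: U = ceil(G/2) vectors x~(u) = x(2u) + i*x(2u+1)
U = (G + 1) // 2
x_tilde = [x[2 * u] + 1j * x[2 * u + 1] for u in range(U)]
assert len(x_tilde) == U and np.allclose(x_tilde[0].real, x[0])

# (2) export stream: keep only the first H*m informative slots of each
# folded-diagonal output S_t, then concatenate over t = 0..m/2-1
def cut(ct, keep=H * m):
    return ct[:keep]

S = [np.random.default_rng(t).standard_normal(4 * H * m) for t in range(m // 2)]
s = np.concatenate([cut(S_t) for S_t in S])
# total slots H*m^2/2, which would fill K_min(P) = ceil(H*m^2/(2n)) ciphertexts
assert s.shape == (m // 2 * H * m,)
```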