Adapting AlphaEvolve to Optimize Fully Homomorphic Encryption on TPUs
Pith reviewed 2026-06-30 20:41 UTC · model grok-4.3
The pith
An automated evolutionary search with language model code generation discovers FHE kernel optimizations that reduce TFHE bootstrap latency by 2.5 times on TPUs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Within 24 hours of automated exploration, AlphaEvolve discovered implementation-level optimizations that improve TFHE bootstrap latency by 2.5x and CKKS rotation and multiplication latency by 1.31x and 1.18x, respectively, relative to human-engineered state of the art on Google Cloud TPUv5e.
What carries the argument
The closed-loop evolutionary search that uses LLM-driven code generation together with real hardware execution feedback and rigorous correctness testing to guide optimization of MXU and VPU usage.
If this is right
- The same search loop can co-optimize data movement across vector register files and systolic array utilization for other FHE primitives.
- Automated exploration reduces reliance on manual trial-and-error when mapping cryptographic kernels to accelerator hardware.
- The reported speedups apply to both TFHE (Jaxite) and CKKS (CROSS) schemes on contemporary TPU architectures.
- Researchers gain a practical way to explore trade-offs among cryptography parameters, compiler choices, and hardware constraints.
Where Pith is reading between the lines
- The approach may transfer to GPU or other accelerator targets for the same FHE schemes, provided equivalent hardware feedback loops can be built.
- If the method scales, it could shorten the iteration cycle for testing new FHE parameter sets or algorithmic variants on target hardware.
- Success on these kernels suggests the search technique could apply to related high-performance cryptography tasks such as zero-knowledge proof generation on accelerators.
Load-bearing premise
The evolutionary process with LLM code generation and hardware feedback reliably produces correct implementations that generalize beyond the specific kernels and inputs tested.
What would settle it
Execute the discovered optimized kernels on a fresh set of inputs or on a different TPU variant and check whether the reported latency improvements and correctness both hold.
Figures
read the original abstract
The deployment of Fully Homomorphic Encryption (FHE) at scale is hindered due to its heavy computational overhead. While specialized hardware accelerators like Google Tensor Processing Units (TPUs) can help, mapping complex cryptographic kernels onto such architectures remains a challenge. Efficient execution requires co-optimization between the systolic array-based Matrix Multiplication Unit (MXU) and Vector Processing Units (VPUs), as well as the orchestration of data movement across the vector register files. Existing compiler stacks often abstract low-level hardware utilization, requiring developers to adopt a manual trial-and-error process that often results in fragmented execution and underutilized resources. To accelerate this development process, we use AlphaEvolve to automate the exploration of hardware-aware cryptographic-kernel optimizations. We frame optimization as an evolutionary search problem, utilizing the closed-loop system provided by AlphaEvolve, that leverages LLM-driven code generation. We use real-world feedback from hardware execution and rigorous correctness testing to guide the evolution process. We evaluate AlphaEvolve optimization on primitives for both the TFHE (Jaxite) and CKKS (CROSS) FHE schemes on Google Cloud TPUv5e, a contemporary TPU architecture. Within 24 hours of automated exploration, AlphaEvolve discovered implementation-level optimizations that improve TFHE bootstrap latency by 2.5x and CKKS rotation and multiplication latency by 1.31x and 1.18x, respectively, relative to human-engineered state of the art. These results demonstrate that AlphaEvolve can be used to enable researchers to navigate the optimization trade-offs between cryptography, compilers, and hardware accelerators.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper adapts AlphaEvolve—an LLM-driven closed-loop evolutionary search system—to automatically optimize low-level implementations of TFHE (Jaxite) and CKKS (CROSS) cryptographic kernels for execution on Google Cloud TPUv5e. It frames the task as hardware-aware evolutionary search that incorporates real TPU execution feedback and correctness checks, and reports that 24 hours of search yields implementation-level optimizations delivering 2.5× lower TFHE bootstrap latency, 1.31× lower CKKS rotation latency, and 1.18× lower CKKS multiplication latency relative to human-engineered state-of-the-art baselines.
Significance. If the reported speedups are reproducible and the optimized kernels generalize, the work provides concrete evidence that automated evolutionary search with hardware-in-the-loop feedback can navigate the complex trade-offs among cryptography, compiler scheduling, and TPU systolic-array / VPU utilization. The specific numeric gains on production TPU hardware constitute a practical demonstration of the approach rather than a purely theoretical claim.
major comments (2)
- [Abstract] Abstract: the central performance claims (2.5× TFHE bootstrap, 1.31× CKKS rotation, 1.18× CKKS multiplication) are presented without any description of the exact baseline implementations, the measurement methodology (including number of runs or statistical significance), or the concrete correctness-verification procedures applied to the evolved kernels. These omissions directly affect the ability to assess whether the reported speedups are load-bearing for the paper’s thesis.
- [Introduction / Method] The manuscript does not indicate whether the AlphaEvolve search harness, LLM prompts, or evaluation infrastructure were developed independently of the present authors; given the overlap noted in the submission metadata, this creates a circularity risk around the claimed generality of the closed-loop system.
minor comments (1)
- [Abstract] The abstract refers to “rigorous correctness testing” but supplies no further detail on the test suite or failure modes; a short paragraph in §3 or §4 would clarify this without altering the central claim.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of our results. We respond to each major comment below and indicate planned revisions.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claims (2.5× TFHE bootstrap, 1.31× CKKS rotation, 1.18× CKKS multiplication) are presented without any description of the exact baseline implementations, the measurement methodology (including number of runs or statistical significance), or the concrete correctness-verification procedures applied to the evolved kernels. These omissions directly affect the ability to assess whether the reported speedups are load-bearing for the paper’s thesis.
Authors: We agree that the abstract would be strengthened by including these details. In the revised manuscript we will expand the abstract to briefly specify the human-engineered baselines (the reference Jaxite TFHE and CROSS CKKS implementations), the measurement protocol (latency averaged over 100 runs on TPUv5e with reported standard deviation), and the correctness procedures (functional equivalence tests plus cryptographic property checks against reference outputs). These elements are already described in Section 4; the revision will ensure they are summarized at the abstract level. revision: yes
-
Referee: [Introduction / Method] The manuscript does not indicate whether the AlphaEvolve search harness, LLM prompts, or evaluation infrastructure were developed independently of the present authors; given the overlap noted in the submission metadata, this creates a circularity risk around the claimed generality of the closed-loop system.
Authors: AlphaEvolve is presented as a reusable evolutionary framework whose core components (search harness, prompt templates, and hardware-in-the-loop evaluator) were designed to be domain-agnostic. The current work applies this framework to FHE kernels. To remove ambiguity we will insert explicit language in the introduction and methods sections stating that the base AlphaEvolve infrastructure predates this application and has been used in other optimization domains. This clarification directly addresses the generality claim without altering the technical content. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper presents an empirical application of an existing evolutionary search tool (AlphaEvolve) to discover hardware-specific optimizations for FHE kernels, with results validated directly against real TPU execution and independent correctness checks. No load-bearing step reduces by definition, self-citation, or fitted-input renaming to the claimed performance gains; the reported speedups (2.5x TFHE bootstrap, 1.31x/1.18x CKKS) are externally measurable outcomes of the closed-loop process rather than tautological restatements of inputs or prior self-citations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption AlphaEvolve supplies a reliable closed-loop evolutionary search using LLM code generation and hardware execution feedback for cryptographic kernels.
Reference graph
Works this paper leans on
-
[1]
On the concrete hardness of learning with errors,
M. R. Albrecht, R. Player, and S. Scott, “On the concrete hardness of learning with errors,” Cryptology ePrint Archive, Paper 2015/046, 2015. [Online]. Available: https://eprint.iacr.org/2015/046
2015
-
[2]
Jaxite: A fully homomorphic encryption backend targeting TPUs and GPUs, written in JAX,
A. Ali, E. Astor, B. Gipson, S. Gorantala, M. Guevara, J. Kun, W. Lam, R. Misoczki, R. Springer, J. Takeshita, J. Tong, C. Tew, and C. Yun, “Jaxite: A fully homomorphic encryption backend targeting TPUs and GPUs, written in JAX,” https://github.com/google/jaxite, 2024
2024
-
[3]
Heir: A universal compiler for homomorphic encryption,
A. Ali, J. Choi, B. Gipson, S. Gorantala, J. Kun, W. Legiest, L. Lim, A. Viand, M. Z. Demissie, and H. Zheng, “Heir: A universal compiler for homomorphic encryption,” 2025. [Online]. Available: https://arxiv.org/abs/2508.11095
-
[4]
Homomorphic encryption for arithmetic of approximate numbers,
J. H. Cheon, A. Kim, M. Kim, and Y . Song, “Homomorphic encryption for arithmetic of approximate numbers,” inAdvances in Cryptology – ASIACRYPT 2017, ser. Lecture Notes in Computer Science, vol. 10624. Springer, 2017, pp. 409–437
2017
-
[5]
TFHE: Fast fully homomorphic encryption over the torus,
I. Chillotti, N. Gama, M. Georgieva, and M. Izabach `ene, “TFHE: Fast fully homomorphic encryption over the torus,”Journal of Cryptology, vol. 33, no. 1, pp. 34–91, Jan 2020
2020
-
[6]
Add an unroll factor of 8 (reduces bootstrap speedup on tpu by 2 ms),
code perspective, “Add an unroll factor of 8 (reduces bootstrap speedup on tpu by 2 ms),” https://github.com/google/jaxite/pull/88, 2026, gitHub Pull Request #88, Accessed: 2026-05-06
2026
-
[7]
Improvements to the vector_matrix_polymul provides a speedup of 1.7 ms,
code perspective, “Improvements to the vector_matrix_polymul provides a speedup of 1.7 ms,” https://github.com/google/jaxite/pull/89, 2026, gitHub Pull Request #89, Accessed: 2026-05-06
2026
-
[8]
Update i32 matmul unreduced cggi to use single bfloat16 (instead of 4) for int32 multiplications,
code perspective, “Update i32 matmul unreduced cggi to use single bfloat16 (instead of 4) for int32 multiplications,” https://github.com/ google/jaxite/pull/90, 2026, gitHub Pull Request #90, Accessed: 2026- 05-06
2026
-
[9]
Fully homomorphic encryption using ideal lattices,
C. Gentry, “Fully homomorphic encryption using ideal lattices,” in Proceedings of the F orty-First Annual ACM Symposium on Theory of Computing, ser. STOC ’09. New York, NY , USA: Association for Computing Machinery, 2009, p. 169–178. [Online]. Available: https://doi.org/10.1145/1536414.1536440
-
[10]
Unlocking the potential of fully homomorphic encryption,
S. Gorantala, R. Springer, and B. Gipson, “Unlocking the potential of fully homomorphic encryption,”Communications of the ACM, vol. 66, no. 6, pp. 68–75, jun 2023. [Online]. Available: https://doi.org/10.1145/3582500
-
[11]
A general purpose transpiler for fully homomorphic encryption,
S. Gorantala, R. Springer, S. Purser-Haskell, W. Lam, R. J. Wilson, A. Ali, E. P. Astor, I. Zukerman, S. Ruth, C. Dibak, P. Schoppmann, S. Kulankhina, A. Forget, D. Marn, C. Tew, R. Misoczki, B. Guillen, X. Ye, D. Kraft, D. Desfontaines, A. Krishnamurthy, M. Guevara, I. M. Perera, Y . Sushko, and B. Gipson, “A general purpose transpiler for fully homomorp...
-
[12]
Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement,
X. He, “Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement,”PLOS ONE, vol. 18, no. 2, p. e0282265, 2023
2023
-
[13]
Autocomp: A powerful and portable code optimizer for tensor accelerators,
C. Hong, S. Bhatia, A. Cheung, and Y . S. Shao, “Autocomp: A powerful and portable code optimizer for tensor accelerators,” 2025. [Online]. Available: https://arxiv.org/abs/2505.18574
-
[14]
Pallas quickstart — JAX documentation,
JAX Developers, “Pallas quickstart — JAX documentation,” https://docs. jax.dev/en/latest/pallas/quickstart.html, 2024
2024
-
[15]
Evox: Meta-evolution for automated discovery
S. Liu, S. Agarwal, M. Maheswaran, M. Cemri, Z. Li, Q. Mang, A. Naren, E. Boneh, A. Cheng, M. Z. Pan, A. Du, K. Keutzer, A. Cheung, A. G. Dimakis, K. Sen, M. Zaharia, and I. Stoica, “Evox: Meta-evolution for automated discovery,” 2026. [Online]. Available: https://arxiv.org/abs/2602.23413
-
[16]
AlphaEvolve: A coding agent for scientific and algorithmic discovery
A. Novikov, N. V ˜u, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. Ruiz, A. Mehrabianet al., “Alphaevolve: A coding agent for scientific and algorithmic discovery,” arXiv preprint arXiv:2506.13131, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[17]
OpenXLA: Accelerated linear algebra,
OpenXLA Project, “OpenXLA: Accelerated linear algebra,” https://github. com/openxla/xla, 2026
2026
-
[18]
Leveraging asic ai chips for homomorphic encryption,
J. Tong, T. Huang, J. Dang, L. de Castro, A. Itagi, A. Golder, A. Ali, J. Jiang, J. Kun, Arvind, G. E. Suh, and T. Krishna, “Leveraging asic ai chips for homomorphic encryption,” ser. HPCA’26. Australia: 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2026. 6 APPENDIX A. LLM Instruction-Prompt Pool • Prompt 1:Suggest a ...
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.