arxiv: 2605.08243 · v1 · submitted 2026-05-07 · 💻 cs.PL · cs.DC· cs.LG

Recognition: no theorem link

GPU-Accelerated Synthesis of Mixed-Boolean Arithmetic: Beyond Caching

Gabriel Bathie , Baptiste Mouillon , Nathana\"el Fijalkow

Authors on Pith no claims yet

Pith reviewed 2026-05-12 00:45 UTC · model grok-4.3

classification 💻 cs.PL cs.DCcs.LG

keywords mixed boolean arithmeticprogram synthesisGPU accelerationdeobfuscationbottom-up enumerationcache-free synthesisreverse engineeringcompiler optimization

0 comments

The pith

A cache-free GPU enumeration method synthesizes mixed-boolean arithmetic expressions faster and at scales where prior tools fail.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a GPU-accelerated synthesizer for mixed-boolean arithmetic expressions that relies on bottom-up enumeration without any caching of equivalent candidates. This design keeps computation local and parallel on the GPU, avoiding the breakdown that occurs when prior cache-based methods encounter quantitative bitvector outputs instead of boolean behaviors. Experiments demonstrate clear speedups over existing CPU tools along with the ability to process larger specifications and produce bigger expressions that other approaches cannot reach. The work targets practical needs in deobfuscation, compiler optimization, reverse engineering, and cryptanalysis where scalable synthesis from input-output examples is required.

Core claim

SIMBA performs synthesis of mixed-boolean arithmetic expressions from input-output examples using a cache-free bottom-up enumeration strategy that is explicitly designed for GPU execution, keeping all work local and highly parallel without storing observationally equivalent candidates.

What carries the argument

Cache-free bottom-up enumeration on GPU, which enumerates candidate expressions directly without language caches or equivalence checks that fail for quantitative bitvector outputs.

If this is right

Substantially faster synthesis than prior CPU-based or cache-dependent MBA tools on the same specifications.
Successful handling of larger input-output specifications that overwhelm existing methods.
Ability to reach expression sizes that cause prior synthesizers to time out or fail entirely.
Cache-free GPU synthesis becomes a practical alternative for other quantitative domains beyond pure boolean settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-parallel enumeration pattern could transfer to synthesis tasks involving other quantitative outputs such as floating-point or cryptographic values.
Deobfuscation pipelines might incorporate this approach to tackle more complex real-world obfuscators that produce larger MBA expressions.
Hybrid systems could combine the GPU enumerator with lightweight CPU post-processing to further reduce overall runtime on mixed workloads.
The results suggest that hardware-specific redesigns of enumeration, rather than caching, deserve consideration in other program synthesis settings.

Load-bearing premise

The chosen benchmarks and expression sizes are representative of real-world deobfuscation and optimization tasks, and GPU enumeration stays efficient without hidden bottlenecks when scaling beyond the tested sizes.

What would settle it

Apply SIMBA to a new set of benchmarks containing twice as many input-output pairs or targeting expressions one size larger than the largest solved in the paper, then measure whether runtime and success rate maintain the reported advantages or degrade sharply.

Figures

Figures reproduced from arXiv: 2605.08243 by Baptiste Mouillon, Gabriel Bathie, Nathana\"el Fijalkow.

**Figure 2.** Figure 2: Percentage of instances solved by each algorithm across varying formula sizes. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of the solving time of SIMBA-RTID and SIMBA on each instance. Each circle represents an instance; its x-coordinate is the solving time for SIMBA-RTID and its y-coordinate the time for SIMBA. The lines x = 65 and y = 65 correspond to timeouts of SIMBA-RTID and SIMBA, respectively. Most instances solved by both programs lie on the y = x/4 line, which means that SIMBA is 4 times faster than SIMBA-R… view at source ↗

**Figure 4.** Figure 4: Percentage of instances solved by each algorithm across varying numbers of variables. [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Number of instances solved under each time threshold (H100). [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Number of instances solved under each time threshold (A100). [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Number of instances solved by SIMBA under each time threshold on H100 GPU, averaged over 5 independent runs. The shaded area shows ± standard deviation. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

read the original abstract

Synthesizing Mixed-Boolean Arithmetic (MBA) expressions from input-output examples is central to program deobfuscation and also useful for compiler optimization, reverse engineering, and cryptanalysis. Existing MBA synthesizers are typically CPU-based and scale poorly on large specifications or complex targets. Recent GPU-accelerated synthesis methods achieve large speedups in qualitative settings, but they depend on caching observationally equivalent candidates; this strategy breaks down for MBA because candidate outputs are quantitative bitvectors and the behavioral space is enormous. We present SIMBA (Synthesis of Mixed-Boolean Arithmetic), a GPU-accelerated MBA synthesizer built around cache-free bottom-up enumeration. SIMBA avoids language caches entirely and uses a GPU-oriented enumeration design that keeps work local and highly parallel. In experiments, SIMBA is substantially faster than prior MBA synthesis tools, handles larger specifications, and reaches expression sizes that existing methods fail to solve. These results establish cache-free GPU synthesis as a practical and scalable approach for quantitative domains, and identify it as a strong alternative to cache-centric designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SIMBA shows that dropping caches for a GPU-tailored bottom-up enumerator lets it handle larger MBA specs and bigger expressions than prior cached tools.

read the letter

The main takeaway is that this paper shows a cache-free approach to GPU-accelerated synthesis works well for mixed-boolean arithmetic, delivering better performance and reach than cached baselines on larger instances. What stands out as new is the bottom-up enumeration designed specifically for GPU without relying on any form of caching for equivalent expressions. Earlier GPU synthesis methods used caches, but those don't transfer here because MBA produces quantitative bitvector outputs and the space is vast. The design keeps computations local and parallel on the device. The experiments back this with comparisons that show faster runtimes and the ability to handle specifications and expression sizes where previous tools fail. The paper does a solid job laying out the motivation from the domain and running the direct comparisons that support the performance claims. Soft spots are limited. The benchmarks could use more explanation to show they represent typical deobfuscation workloads, and adding details on run-to-run variance would help. Scaling behavior past the tested sizes isn't explored much, but nothing suggests a fundamental problem. This is relevant for people working on program synthesis tools or deobfuscation techniques. Readers focused on practical algorithmic improvements in synthesis will get concrete value from the results. It deserves a serious referee because the core contribution is a fresh algorithmic direction with empirical support, even if some aspects of the evaluation could be expanded.

Referee Report

1 major / 1 minor

Summary. The manuscript presents SIMBA, a GPU-accelerated synthesizer for Mixed-Boolean Arithmetic (MBA) expressions from input-output examples. It proposes a cache-free bottom-up enumeration strategy optimized for GPU parallelism, arguing that caching fails for MBA because candidate outputs are quantitative bitvectors with an enormous behavioral space. The central empirical claim is that SIMBA is substantially faster than prior MBA tools, handles larger specifications, and reaches expression sizes that existing methods cannot solve.

Significance. If the results hold, the work is significant for establishing cache-free GPU enumeration as a practical approach for quantitative synthesis domains where cache-centric designs break down. It provides direct experimental comparisons supporting speedups and extended reach, with strengths in the GPU-oriented design that keeps work local and parallel, and the absence of reliance on fitted parameters or circular derivations. This could advance applications in program deobfuscation, compiler optimization, reverse engineering, and cryptanalysis.

major comments (1)

[Experiments] Experimental evaluation: The performance claims rest on direct comparisons showing speedups and scalability, but the manuscript provides insufficient detail on benchmark selection criteria and does not report run-to-run variance or statistical measures in the timing results. This limits assessment of whether the tested cases are representative of real-world tasks and whether the GPU enumeration scales without hidden bottlenecks.

minor comments (1)

[Abstract] The abstract states that SIMBA reaches 'expression sizes that existing methods fail to solve' but does not reference a specific table or example showing the failure threshold; adding this would improve clarity of the reach claim.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation and recommendation for minor revision. The feedback on experimental details is constructive, and we address it directly below by committing to clarifications and additions in the revised manuscript.

read point-by-point responses

Referee: Experimental evaluation: The performance claims rest on direct comparisons showing speedups and scalability, but the manuscript provides insufficient detail on benchmark selection criteria and does not report run-to-run variance or statistical measures in the timing results. This limits assessment of whether the tested cases are representative of real-world tasks and whether the GPU enumeration scales without hidden bottlenecks.

Authors: We agree that more detail on benchmark selection and statistical reporting would improve clarity. In the revision we will add an explicit subsection describing the benchmark selection criteria, including how the input-output specifications were derived from real-world deobfuscation, compiler optimization, and cryptanalysis tasks to ensure representativeness. Although the core enumeration algorithm is deterministic, GPU kernel scheduling can introduce small timing variations; we will re-run all timing experiments five times per benchmark, report means with standard deviations in the tables, and add a short discussion of observed scalability limits and potential bottlenecks. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; claims rest on independent algorithmic design and empirical validation

full rationale

The paper introduces SIMBA as a new GPU-oriented, cache-free bottom-up enumeration algorithm for MBA synthesis, motivated by the breakdown of caching for quantitative bitvector outputs. No derivation chain, equations, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or renamed known patterns. The performance claims (faster runtimes, larger solvable specifications) are grounded in direct experimental comparisons rather than any predictive model or uniqueness theorem imported from prior self-work. The design choices are explicitly justified by domain characteristics without circular reduction, making the contribution self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard assumptions about GPU parallelism and enumeration correctness but introduces no new free parameters, axioms beyond domain-standard ones, or invented entities.

axioms (2)

domain assumption Bottom-up enumeration generates all expressions up to a given size without duplicates when using appropriate pruning.
Invoked in the description of the cache-free design; standard in synthesis literature but not re-proven here.
domain assumption GPU kernels can be organized so that work remains local and highly parallel without global synchronization overhead for this enumeration pattern.
Central to the GPU-oriented design claim.

pith-pipeline@v0.9.0 · 5488 in / 1266 out tokens · 36286 ms · 2026-05-12T00:45:35.697666+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

[1]

Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax- guided synthesis. InFormal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, pages 1–8. IEEE,

work page 2013
[2]

Austin Appleby

URL https://ieeexplore.ieee.org/ document/6679385/. Austin Appleby. Murmurhash3 and smhasher. https://github.com/aappleby/smhasher,

work page arXiv
[3]

Vidal Attias, Nicolas Bellec, Grégoire Menguy, Sébastien Bardin, and Jean-Yves Marion

Accessed: 2026-04-24. Vidal Attias, Nicolas Bellec, Grégoire Menguy, Sébastien Bardin, and Jean-Yves Marion. Augmenting search-based program synthesis with local inference rules to improve black-box deobfuscation. In Chun-Ying Huang, Jyh-Cheng Chen, Shiuh-Pyng Shieh, David Lie, and Véronique Cortier, editors,Proceedings of the 2025 ACM SIGSAC Conference o...

work page 2026
[4]

In: Huang, C., Chen, J., Shieh, S., Lie, D., Cortier, V

doi: 10.1145/3719027.3765134. URLhttps://doi.org/10.1145/3719027.3765134. Sorav Bansal and Alex Aiken. Automatic generation of peephole superoptimizers. In John Paul Shen and Margaret Martonosi, editors,Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA,...

work page doi:10.1145/3719027.3765134 2006
[5]

URLhttps://doi.org/10.1145/1168857.1168906

doi: 10.1145/1168857.1168906. URLhttps://doi.org/10.1145/1168857.1168906. Haniel Barbosa, Clark W. Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar. cvc5: A versatile and industrial-...

work page doi:10.1145/1168857.1168906 2022
[6]

URL https://doi.org/10.1007/978-3-030-99524-9_24

doi: 10.1007/978-3-030-99524-9\_24. URL https://doi.org/10.1007/978-3-030-99524-9_24. Martin Berger, Nathanaël Fijalkow, and Mojtaba Valizadeh. GPU accelerated program synthesis: Enumerate semantics, not syntax!CoRR, abs/2504.18943,

work page doi:10.1007/978-3-030-99524-9
[7]

doi: 10.48550/ARXIV .2504. 18943. URLhttps://doi.org/10.48550/arXiv.2504.18943. Sandrine Blazy and Rémi Hutin. Formal verification of a program obfuscation based on mixed boolean-arithmetic expressions. In Assia Mahboubi and Magnus O. Myreen, editors,Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2019, Ca...

work page internal anchor Pith review doi:10.48550/arxiv 2019
[8]

doi: 10.1145/3293880. 3294103. URLhttps://doi.org/10.1145/3293880.3294103. Tim Blazytko, Moritz Contag, Cornelius Aschermann, and Thorsten Holz. Syntia: Synthesiz- ing the semantics of obfuscated code. In Engin Kirda and Thomas Ristenpart, editors,26th USENIX Security Symposium, USENIX Security 2017, Vancouver, BC, Canada, August 16-18, 2017, pages 643–65...

work page doi:10.1145/3293880 2017
[9]

NeuReduce: Reducing mixed boolean-arithmetic expressions by recurrent neural network

Weijie Feng, Binbin Liu, Dongpeng Xu, Qilong Zheng, and Yun Xu. NeuReduce: Reducing mixed boolean-arithmetic expressions by recurrent neural network. In Trevor Cohn, Yulan He, and Yang Liu, editors,Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, Findings of ACL, pages 635–644. Association for Compu...

work page 2020
[10]

URL https: //doi.org/10.18653/v1/2020.findings-emnlp.56

doi: 10.18653/V1/2020.FINDINGS-EMNLP.56. URL https: //doi.org/10.18653/v1/2020.findings-emnlp.56. 10 Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh.Program Synthesis. Now Publishers,

work page doi:10.18653/v1/2020.findings-emnlp.56 2020
[11]

Simplifying mixed boolean-arithmetic obfuscation by program synthesis and term rewriting

Jaehyung Lee and Woosuk Lee. Simplifying mixed boolean-arithmetic obfuscation by program synthesis and term rewriting. In Weizhi Meng, Christian Damsgaard Jensen, Cas Cremers, and Engin Kirda, editors,Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, CCS 2023, Copenhagen, Denmark, November 26-30, 2023, pages 2351–2365. ACM,

work page 2023
[12]

URL https://doi.org/10.1145/ 3576915.3623186

doi: 10.1145/3576915.3623186. URL https://doi.org/10.1145/ 3576915.3623186. Binbin Liu, Wenying Feng, Qilong Zheng, Jing Li, and Dongpeng Xu. Software obfuscation with non-linear mixed Boolean-arithmetic expressions. InInternational Conference on Information Security and Cryptology (ICISC), pages 276–296. Springer, 2021a. Binbin Liu, Junfu Shen, Jiang Min...

work page doi:10.1145/3576915.3623186 2021
[13]

URL https://doi.org/10.1145/ 3460120.3485250

doi: 10.1145/3460120.3485250. URL https://doi.org/10.1145/ 3460120.3485250. John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with CUDA.ACM Queue, 6(2):40–53,

work page doi:10.1145/3460120.3485250
[14]

URL https: //doi.org/10.1145/1365490.1365500

doi: 10.1145/1365490.1365500. URL https: //doi.org/10.1145/1365490.1365500. Benjamin Reichenwallner and Peter Meerwald-Stadler. Efficient deobfuscation of linear mixed boolean-arithmetic expressions. In Golden G. Richard III and Tim Blazytko, editors,Proceedings of the 2022 ACM Workshop on Research on offensive and defensive techniques in the context of M...

work page doi:10.1145/1365490.1365500 2022
[15]

URLhttps://doi.org/10.1145/3560831.3564256

doi: 10.1145/3560831.3564256. URLhttps://doi.org/10.1145/3560831.3564256. Benjamin Reichenwallner and Peter Meerwald-Stadler. Simplification of general mixed boolean- arithmetic expressions: GAMBA. InIEEE European Symposium on Security and Privacy, Eu- roS&P 2023 - Workshops, Delft, Netherlands, July 3-7, 2023, pages 427–438. IEEE,

work page doi:10.1145/3560831.3564256 2023
[16]

Get a higher return on your savings!

doi: 10.1109/EUROSPW59978.2023.00053. URL https://doi.org/10.1109/EuroSPW59978. 2023.00053. Raimondas Sasnauskas, Yang Chen, Peter Collingbourne, Jeroen Ketema, Gratian Lup, Jubi Taneja, and John Regehr. Souper: A synthesizing superoptimizer.arXiv preprint arXiv:1711.04422,

work page doi:10.1109/eurospw59978.2023.00053 2023
[17]

Deshmukh, Sela Mador-Haim, Milo M

Abhishek Udupa, Arun Raghavan, Jyotirmoy V . Deshmukh, Sela Mador-Haim, Milo M. K. Martin, and Rajeev Alur. TRANSIT: specifying protocols with concolic snippets. In Hans-Juergen Boehm and Cormac Flanagan, editors,ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, pages 287–296. ACM,

work page 2013
[18]

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines,

doi: 10.1145/2491956.2462174. URLhttps://doi.org/10.1145/2491956.2462174. Mojtaba Valizadeh and Martin Berger. Search-based regular expression inference on a GPU.Proc. ACM Program. Lang., 7(PLDI):1317–1339,

work page doi:10.1145/2491956.2462174
[19]

URL https://doi

doi: 10.1145/3591274. URL https://doi. org/10.1145/3591274. Mojtaba Valizadeh, Nathanaël Fijalkow, and Martin Berger. LTL learning on GPUs. In Arie Gurfinkel and Vijay Ganesh, editors,Computer Aided Verification - 36th International Conference, CAV 2024, Montreal, QC, Canada, July 24-27, 2024, Proceedings, Part III, Lecture Notes in Computer Science, page...

work page doi:10.1145/3591274 2024
[20]

URL https://doi.org/10.1007/978-3-031-65633-0_10

doi: 10.1007/978-3-031-65633-0\_10. URL https://doi.org/10.1007/978-3-031-65633-0_10. Hui Wei, Ting Liu, Shuai Wang, and Chengyu Hu. Unifying mixed Boolean-arithmetic obfuscation by architectural and anti-generalization hardening.Computers & Security,

work page doi:10.1007/978-3-031-65633-0
[21]

Information hiding in software with mixed boolean-arithmetic transforms

11 Yongxin Zhou, Alec Main, Yuan Xiang Gu, and Harold Johnson. Information hiding in software with mixed boolean-arithmetic transforms. In Sehun Kim, Moti Yung, and Hyung-Woo Lee, editors,Information Security Applications, 8th International Workshop, WISA 2007, Jeju Island, Korea, August 27-29, 2007, Revised Selected Papers, Lecture Notes in Computer Scie...

work page 2007
[22]

URL https://doi.org/10.1007/ 978-3-540-77535-5_5

doi: 10.1007/978-3-540-77535-5\_5. URL https://doi.org/10.1007/ 978-3-540-77535-5_5. A Comparison with Existing Tools 1 2 3 4 5 6 7 8 9 10 Number of variables 0 20 40 60 80 100Instances solved (%) Algorithm SIMBA VFB cvc5 Figure 4: Percentage of instances solved by each algorithm across varying numbers of variables. B Cache Size and Efficiency Tables The ...

work page doi:10.1007/978-3-540-77535-5