pith. machine review for the scientific record. sign in

arxiv: 2605.08243 · v1 · submitted 2026-05-07 · 💻 cs.PL · cs.DC· cs.LG

Recognition: no theorem link

GPU-Accelerated Synthesis of Mixed-Boolean Arithmetic: Beyond Caching

Authors on Pith no claims yet

Pith reviewed 2026-05-12 00:45 UTC · model grok-4.3

classification 💻 cs.PL cs.DCcs.LG
keywords mixed boolean arithmeticprogram synthesisGPU accelerationdeobfuscationbottom-up enumerationcache-free synthesisreverse engineeringcompiler optimization
0
0 comments X

The pith

A cache-free GPU enumeration method synthesizes mixed-boolean arithmetic expressions faster and at scales where prior tools fail.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a GPU-accelerated synthesizer for mixed-boolean arithmetic expressions that relies on bottom-up enumeration without any caching of equivalent candidates. This design keeps computation local and parallel on the GPU, avoiding the breakdown that occurs when prior cache-based methods encounter quantitative bitvector outputs instead of boolean behaviors. Experiments demonstrate clear speedups over existing CPU tools along with the ability to process larger specifications and produce bigger expressions that other approaches cannot reach. The work targets practical needs in deobfuscation, compiler optimization, reverse engineering, and cryptanalysis where scalable synthesis from input-output examples is required.

Core claim

SIMBA performs synthesis of mixed-boolean arithmetic expressions from input-output examples using a cache-free bottom-up enumeration strategy that is explicitly designed for GPU execution, keeping all work local and highly parallel without storing observationally equivalent candidates.

What carries the argument

Cache-free bottom-up enumeration on GPU, which enumerates candidate expressions directly without language caches or equivalence checks that fail for quantitative bitvector outputs.

If this is right

  • Substantially faster synthesis than prior CPU-based or cache-dependent MBA tools on the same specifications.
  • Successful handling of larger input-output specifications that overwhelm existing methods.
  • Ability to reach expression sizes that cause prior synthesizers to time out or fail entirely.
  • Cache-free GPU synthesis becomes a practical alternative for other quantitative domains beyond pure boolean settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same local-parallel enumeration pattern could transfer to synthesis tasks involving other quantitative outputs such as floating-point or cryptographic values.
  • Deobfuscation pipelines might incorporate this approach to tackle more complex real-world obfuscators that produce larger MBA expressions.
  • Hybrid systems could combine the GPU enumerator with lightweight CPU post-processing to further reduce overall runtime on mixed workloads.
  • The results suggest that hardware-specific redesigns of enumeration, rather than caching, deserve consideration in other program synthesis settings.

Load-bearing premise

The chosen benchmarks and expression sizes are representative of real-world deobfuscation and optimization tasks, and GPU enumeration stays efficient without hidden bottlenecks when scaling beyond the tested sizes.

What would settle it

Apply SIMBA to a new set of benchmarks containing twice as many input-output pairs or targeting expressions one size larger than the largest solved in the paper, then measure whether runtime and success rate maintain the reported advantages or degrade sharply.

Figures

Figures reproduced from arXiv: 2605.08243 by Baptiste Mouillon, Gabriel Bathie, Nathana\"el Fijalkow.

Figure 1
Figure 1. Figure 1: Distributions of formula size and number of variables across the benchmark suite. [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Percentage of instances solved by each algorithm across varying formula sizes. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of the solving time of SIMBA-RTID and SIMBA on each instance. Each circle represents an instance; its x-coordinate is the solving time for SIMBA-RTID and its y-coordinate the time for SIMBA. The lines x = 65 and y = 65 correspond to timeouts of SIMBA-RTID and SIMBA, respectively. Most instances solved by both programs lie on the y = x/4 line, which means that SIMBA is 4 times faster than SIMBA-R… view at source ↗
Figure 4
Figure 4. Figure 4: Percentage of instances solved by each algorithm across varying numbers of variables. [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Number of instances solved under each time threshold (H100). [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Number of instances solved under each time threshold (A100). [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Number of instances solved by SIMBA under each time threshold on H100 GPU, averaged over 5 independent runs. The shaded area shows ± standard deviation. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
read the original abstract

Synthesizing Mixed-Boolean Arithmetic (MBA) expressions from input-output examples is central to program deobfuscation and also useful for compiler optimization, reverse engineering, and cryptanalysis. Existing MBA synthesizers are typically CPU-based and scale poorly on large specifications or complex targets. Recent GPU-accelerated synthesis methods achieve large speedups in qualitative settings, but they depend on caching observationally equivalent candidates; this strategy breaks down for MBA because candidate outputs are quantitative bitvectors and the behavioral space is enormous. We present SIMBA (Synthesis of Mixed-Boolean Arithmetic), a GPU-accelerated MBA synthesizer built around cache-free bottom-up enumeration. SIMBA avoids language caches entirely and uses a GPU-oriented enumeration design that keeps work local and highly parallel. In experiments, SIMBA is substantially faster than prior MBA synthesis tools, handles larger specifications, and reaches expression sizes that existing methods fail to solve. These results establish cache-free GPU synthesis as a practical and scalable approach for quantitative domains, and identify it as a strong alternative to cache-centric designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents SIMBA, a GPU-accelerated synthesizer for Mixed-Boolean Arithmetic (MBA) expressions from input-output examples. It proposes a cache-free bottom-up enumeration strategy optimized for GPU parallelism, arguing that caching fails for MBA because candidate outputs are quantitative bitvectors with an enormous behavioral space. The central empirical claim is that SIMBA is substantially faster than prior MBA tools, handles larger specifications, and reaches expression sizes that existing methods cannot solve.

Significance. If the results hold, the work is significant for establishing cache-free GPU enumeration as a practical approach for quantitative synthesis domains where cache-centric designs break down. It provides direct experimental comparisons supporting speedups and extended reach, with strengths in the GPU-oriented design that keeps work local and parallel, and the absence of reliance on fitted parameters or circular derivations. This could advance applications in program deobfuscation, compiler optimization, reverse engineering, and cryptanalysis.

major comments (1)
  1. [Experiments] Experimental evaluation: The performance claims rest on direct comparisons showing speedups and scalability, but the manuscript provides insufficient detail on benchmark selection criteria and does not report run-to-run variance or statistical measures in the timing results. This limits assessment of whether the tested cases are representative of real-world tasks and whether the GPU enumeration scales without hidden bottlenecks.
minor comments (1)
  1. [Abstract] The abstract states that SIMBA reaches 'expression sizes that existing methods fail to solve' but does not reference a specific table or example showing the failure threshold; adding this would improve clarity of the reach claim.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation and recommendation for minor revision. The feedback on experimental details is constructive, and we address it directly below by committing to clarifications and additions in the revised manuscript.

read point-by-point responses
  1. Referee: Experimental evaluation: The performance claims rest on direct comparisons showing speedups and scalability, but the manuscript provides insufficient detail on benchmark selection criteria and does not report run-to-run variance or statistical measures in the timing results. This limits assessment of whether the tested cases are representative of real-world tasks and whether the GPU enumeration scales without hidden bottlenecks.

    Authors: We agree that more detail on benchmark selection and statistical reporting would improve clarity. In the revision we will add an explicit subsection describing the benchmark selection criteria, including how the input-output specifications were derived from real-world deobfuscation, compiler optimization, and cryptanalysis tasks to ensure representativeness. Although the core enumeration algorithm is deterministic, GPU kernel scheduling can introduce small timing variations; we will re-run all timing experiments five times per benchmark, report means with standard deviations in the tables, and add a short discussion of observed scalability limits and potential bottlenecks. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; claims rest on independent algorithmic design and empirical validation

full rationale

The paper introduces SIMBA as a new GPU-oriented, cache-free bottom-up enumeration algorithm for MBA synthesis, motivated by the breakdown of caching for quantitative bitvector outputs. No derivation chain, equations, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or renamed known patterns. The performance claims (faster runtimes, larger solvable specifications) are grounded in direct experimental comparisons rather than any predictive model or uniqueness theorem imported from prior self-work. The design choices are explicitly justified by domain characteristics without circular reduction, making the contribution self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard assumptions about GPU parallelism and enumeration correctness but introduces no new free parameters, axioms beyond domain-standard ones, or invented entities.

axioms (2)
  • domain assumption Bottom-up enumeration generates all expressions up to a given size without duplicates when using appropriate pruning.
    Invoked in the description of the cache-free design; standard in synthesis literature but not re-proven here.
  • domain assumption GPU kernels can be organized so that work remains local and highly parallel without global synchronization overhead for this enumeration pattern.
    Central to the GPU-oriented design claim.

pith-pipeline@v0.9.0 · 5488 in / 1266 out tokens · 36286 ms · 2026-05-12T00:45:35.697666+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

  1. [1]

    Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax- guided synthesis. InFormal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, pages 1–8. IEEE,

  2. [2]

    Austin Appleby

    URL https://ieeexplore.ieee.org/ document/6679385/. Austin Appleby. Murmurhash3 and smhasher. https://github.com/aappleby/smhasher,

  3. [3]

    Vidal Attias, Nicolas Bellec, Grégoire Menguy, Sébastien Bardin, and Jean-Yves Marion

    Accessed: 2026-04-24. Vidal Attias, Nicolas Bellec, Grégoire Menguy, Sébastien Bardin, and Jean-Yves Marion. Augmenting search-based program synthesis with local inference rules to improve black-box deobfuscation. In Chun-Ying Huang, Jyh-Cheng Chen, Shiuh-Pyng Shieh, David Lie, and Véronique Cortier, editors,Proceedings of the 2025 ACM SIGSAC Conference o...

  4. [4]

    In: Huang, C., Chen, J., Shieh, S., Lie, D., Cortier, V

    doi: 10.1145/3719027.3765134. URLhttps://doi.org/10.1145/3719027.3765134. Sorav Bansal and Alex Aiken. Automatic generation of peephole superoptimizers. In John Paul Shen and Margaret Martonosi, editors,Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA,...

  5. [5]

    URLhttps://doi.org/10.1145/1168857.1168906

    doi: 10.1145/1168857.1168906. URLhttps://doi.org/10.1145/1168857.1168906. Haniel Barbosa, Clark W. Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar. cvc5: A versatile and industrial-...

  6. [6]

    URL https://doi.org/10.1007/978-3-030-99524-9_24

    doi: 10.1007/978-3-030-99524-9\_24. URL https://doi.org/10.1007/978-3-030-99524-9_24. Martin Berger, Nathanaël Fijalkow, and Mojtaba Valizadeh. GPU accelerated program synthesis: Enumerate semantics, not syntax!CoRR, abs/2504.18943,

  7. [7]

    doi: 10.48550/ARXIV .2504. 18943. URLhttps://doi.org/10.48550/arXiv.2504.18943. Sandrine Blazy and Rémi Hutin. Formal verification of a program obfuscation based on mixed boolean-arithmetic expressions. In Assia Mahboubi and Magnus O. Myreen, editors,Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2019, Ca...

  8. [8]

    doi: 10.1145/3293880. 3294103. URLhttps://doi.org/10.1145/3293880.3294103. Tim Blazytko, Moritz Contag, Cornelius Aschermann, and Thorsten Holz. Syntia: Synthesiz- ing the semantics of obfuscated code. In Engin Kirda and Thomas Ristenpart, editors,26th USENIX Security Symposium, USENIX Security 2017, Vancouver, BC, Canada, August 16-18, 2017, pages 643–65...

  9. [9]

    NeuReduce: Reducing mixed boolean-arithmetic expressions by recurrent neural network

    Weijie Feng, Binbin Liu, Dongpeng Xu, Qilong Zheng, and Yun Xu. NeuReduce: Reducing mixed boolean-arithmetic expressions by recurrent neural network. In Trevor Cohn, Yulan He, and Yang Liu, editors,Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, Findings of ACL, pages 635–644. Association for Compu...

  10. [10]

    URL https: //doi.org/10.18653/v1/2020.findings-emnlp.56

    doi: 10.18653/V1/2020.FINDINGS-EMNLP.56. URL https: //doi.org/10.18653/v1/2020.findings-emnlp.56. 10 Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh.Program Synthesis. Now Publishers,

  11. [11]

    Simplifying mixed boolean-arithmetic obfuscation by program synthesis and term rewriting

    Jaehyung Lee and Woosuk Lee. Simplifying mixed boolean-arithmetic obfuscation by program synthesis and term rewriting. In Weizhi Meng, Christian Damsgaard Jensen, Cas Cremers, and Engin Kirda, editors,Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, CCS 2023, Copenhagen, Denmark, November 26-30, 2023, pages 2351–2365. ACM,

  12. [12]

    URL https://doi.org/10.1145/ 3576915.3623186

    doi: 10.1145/3576915.3623186. URL https://doi.org/10.1145/ 3576915.3623186. Binbin Liu, Wenying Feng, Qilong Zheng, Jing Li, and Dongpeng Xu. Software obfuscation with non-linear mixed Boolean-arithmetic expressions. InInternational Conference on Information Security and Cryptology (ICISC), pages 276–296. Springer, 2021a. Binbin Liu, Junfu Shen, Jiang Min...

  13. [13]

    URL https://doi.org/10.1145/ 3460120.3485250

    doi: 10.1145/3460120.3485250. URL https://doi.org/10.1145/ 3460120.3485250. John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with CUDA.ACM Queue, 6(2):40–53,

  14. [14]

    URL https: //doi.org/10.1145/1365490.1365500

    doi: 10.1145/1365490.1365500. URL https: //doi.org/10.1145/1365490.1365500. Benjamin Reichenwallner and Peter Meerwald-Stadler. Efficient deobfuscation of linear mixed boolean-arithmetic expressions. In Golden G. Richard III and Tim Blazytko, editors,Proceedings of the 2022 ACM Workshop on Research on offensive and defensive techniques in the context of M...

  15. [15]

    URLhttps://doi.org/10.1145/3560831.3564256

    doi: 10.1145/3560831.3564256. URLhttps://doi.org/10.1145/3560831.3564256. Benjamin Reichenwallner and Peter Meerwald-Stadler. Simplification of general mixed boolean- arithmetic expressions: GAMBA. InIEEE European Symposium on Security and Privacy, Eu- roS&P 2023 - Workshops, Delft, Netherlands, July 3-7, 2023, pages 427–438. IEEE,

  16. [16]

    Get a higher return on your savings!

    doi: 10.1109/EUROSPW59978.2023.00053. URL https://doi.org/10.1109/EuroSPW59978. 2023.00053. Raimondas Sasnauskas, Yang Chen, Peter Collingbourne, Jeroen Ketema, Gratian Lup, Jubi Taneja, and John Regehr. Souper: A synthesizing superoptimizer.arXiv preprint arXiv:1711.04422,

  17. [17]

    Deshmukh, Sela Mador-Haim, Milo M

    Abhishek Udupa, Arun Raghavan, Jyotirmoy V . Deshmukh, Sela Mador-Haim, Milo M. K. Martin, and Rajeev Alur. TRANSIT: specifying protocols with concolic snippets. In Hans-Juergen Boehm and Cormac Flanagan, editors,ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, pages 287–296. ACM,

  18. [18]

    Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines,

    doi: 10.1145/2491956.2462174. URLhttps://doi.org/10.1145/2491956.2462174. Mojtaba Valizadeh and Martin Berger. Search-based regular expression inference on a GPU.Proc. ACM Program. Lang., 7(PLDI):1317–1339,

  19. [19]

    URL https://doi

    doi: 10.1145/3591274. URL https://doi. org/10.1145/3591274. Mojtaba Valizadeh, Nathanaël Fijalkow, and Martin Berger. LTL learning on GPUs. In Arie Gurfinkel and Vijay Ganesh, editors,Computer Aided Verification - 36th International Conference, CAV 2024, Montreal, QC, Canada, July 24-27, 2024, Proceedings, Part III, Lecture Notes in Computer Science, page...

  20. [20]

    URL https://doi.org/10.1007/978-3-031-65633-0_10

    doi: 10.1007/978-3-031-65633-0\_10. URL https://doi.org/10.1007/978-3-031-65633-0_10. Hui Wei, Ting Liu, Shuai Wang, and Chengyu Hu. Unifying mixed Boolean-arithmetic obfuscation by architectural and anti-generalization hardening.Computers & Security,

  21. [21]

    Information hiding in software with mixed boolean-arithmetic transforms

    11 Yongxin Zhou, Alec Main, Yuan Xiang Gu, and Harold Johnson. Information hiding in software with mixed boolean-arithmetic transforms. In Sehun Kim, Moti Yung, and Hyung-Woo Lee, editors,Information Security Applications, 8th International Workshop, WISA 2007, Jeju Island, Korea, August 27-29, 2007, Revised Selected Papers, Lecture Notes in Computer Scie...

  22. [22]

    URL https://doi.org/10.1007/ 978-3-540-77535-5_5

    doi: 10.1007/978-3-540-77535-5\_5. URL https://doi.org/10.1007/ 978-3-540-77535-5_5. A Comparison with Existing Tools 1 2 3 4 5 6 7 8 9 10 Number of variables 0 20 40 60 80 100Instances solved (%) Algorithm SIMBA VFB cvc5 Figure 4: Percentage of instances solved by each algorithm across varying numbers of variables. B Cache Size and Efficiency Tables The ...