pith. sign in

arxiv: 2605.23643 · v1 · pith:5LTGPQL7new · submitted 2026-05-22 · 💻 cs.CR · cs.LG

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

Pith reviewed 2026-05-25 04:07 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords security protocol verificationTamarin proverreinforcement learningMonte Carlo Tree Searchautomated proof searchneural heuristicsprotocol analysis
0
0 comments X

The pith

Reinforcement learning finds more automatic proofs in Tamarin and produces shorter proof trees than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce a reinforcement learning framework that converts Tamarin into an RL environment for Monte Carlo Tree Search. A neural network is trained on completed subproofs to serve as a heuristic that guides the search. Evaluated on 16 case studies ranging from basic to state-of-the-art protocol models, the method completes more proofs without human intervention and yields shorter proofs than both the default Tamarin search and expert-designed heuristics. This reduces the manual effort required to verify protocols such as those in EMV, 5G, and WPA2.

Core claim

The paper establishes that a neural heuristic, trained via reinforcement learning on subproofs from Tamarin derivations, can direct Monte Carlo Tree Search to explore the proof space more effectively than Tamarin's built-in strategies, resulting in both higher success rates on difficult protocols and more compact proof traces.

What carries the argument

The stateless Tamarin API that exposes the prover as a classical RL environment, allowing Monte Carlo Tree Search to be guided by a neural value network learned from completed subproofs.

If this is right

  • More protocols can be verified automatically with reduced expert intervention.
  • Shorter proof trees make results easier to inspect and maintain.
  • The standardized interface enables other automated tools to interact programmatically with Tamarin.
  • Similar RL-guided search can be applied out of the box to assist ongoing protocol verification work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Expanding the training set beyond the original 16 protocols could improve performance on even larger state spaces.
  • Shorter automated proofs may surface minimal attack traces more directly than longer manually guided ones.
  • The approach could support hybrid workflows that combine RL search with selective human input for the hardest cases.

Load-bearing premise

The neural heuristic trained on completed subproofs from the 16 case studies will generalize to new or more complex protocols without requiring substantial additional human guidance or retraining.

What would settle it

Running the trained system on a previously unseen protocol model from a recent publication and observing that it finds no additional proofs or requires as much human intervention as the standard Tamarin search.

Figures

Figures reproduced from arXiv: 2605.23643 by Bernd Finkbeiner, Cas Cremers, Matthias Cosler, Mohamed Ghanem, Niklas Medinger.

Figure 1
Figure 1. Figure 1: Example Tamarin tactic from the 5G Handover Xn model [30]. Constraint solving rules that match a ranking function with higher priority are ranked higher. Rules within the same priority are ranked according to the inbuilt ‘C’ heuristic, indicated by presort. Once the user identifies where and why Tamarin fails to finish a proof, they can try to work towards a successful proof by 1) chang￾ing the model in a … view at source ↗
Figure 2
Figure 2. Figure 2: Message sequence chart of an external client in [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: AND/OR search graph for the Tutorial protocol model, lemma Client Session Key Secrecy after several MCTS iterations. OR nodes (circles) represent constraint systems where the agent selects a proof method; AND nodes (dia￾monds) arise from case splits requiring all children to be resolved. Edges carry the proof method and neural network prior 𝑝. Node values 𝑉 reflect backed-up estimates. The thick path shows… view at source ↗
Figure 3
Figure 3. Figure 3: Main MCTS loop. The four phases (Selection, Ex [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Training architecture overview. Multiple workers [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
read the original abstract

Tools like Tamarin and ProVerif have achieved notable success in analyzing and verifying complex real-world protocols such as EMV, 5G, and WPA2, even detecting zero-day exploits. Despite these successes, verifying such protocols remains a time-consuming, challenging task, often requiring significant human effort and expertise. In this paper, we present a reinforcement learning (RL) framework inspired by AlphaZero and AlphaProof that implements a new style of proof search for Tamarin. We have developed a stateless API for Tamarin that acts as a classical RL environment. We guide a Monte Carlo Tree Search (MCTS) by a neural heuristic that learns from completed subproofs. We evaluate our framework on 16 case studies, ranging from classical protocol models to challenging state-of-the-art protocol models from recent publications. Our method finds more proofs automatically than Tamarin's standard search and produces shorter proofs than both the standard and human-engineered heuristics. Our pipeline is applicable out of the box to assist Tamarin users in active research, reducing the human effort required. Moreover, our standardized interface provides a programmatic way for users to interact with Tamarin. Finally, our work demonstrates the promising potential of adapting RL-based methods to the Tamarin domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces a reinforcement learning framework for Tamarin that uses Monte Carlo Tree Search guided by a neural heuristic trained on completed subproofs. It develops a stateless API turning Tamarin into an RL environment and evaluates the approach on 16 case studies (classical to state-of-the-art protocols), claiming higher automatic proof success rates and shorter proofs than Tamarin's standard search and human-engineered heuristics, with the goal of reducing human effort in protocol verification.

Significance. If the performance claims hold under proper controls, the work would be significant for the security protocol analysis community: it provides a standardized programmatic interface to Tamarin and demonstrates that RL-based search can outperform existing heuristics on real protocols such as those in recent publications. The reusable API and out-of-the-box applicability are concrete strengths that could be adopted by Tamarin users.

major comments (3)
  1. [Evaluation] Evaluation section (16 case studies): the abstract and evaluation claim superior proof success and shorter proofs, yet supply no information on the precise metric for proof length (e.g., number of steps, nodes visited, or bytes), how proofs are counted as 'found', the computational budget and time limits given to all baselines, or any statistical controls such as multiple runs with variance. These omissions make the quantitative claims impossible to assess or reproduce.
  2. [Method and Evaluation] Training procedure and evaluation: the neural heuristic is trained on completed subproofs drawn from the identical 16 case studies used for testing. No held-out protocols, cross-validation, or separation between training and test instances is described. This setup risks protocol-specific overfitting and prevents isolating whether gains come from the RL component or from fitting to the state spaces and successful paths of these particular models.
  3. [Evaluation] Comparison to human-engineered heuristics: the claim that the RL method produces shorter proofs than human-engineered heuristics is load-bearing for the 'less effort' contribution, but the manuscript does not specify which human heuristics were used, how they were configured, or whether they received equivalent search effort.
minor comments (2)
  1. [API] The description of the stateless API would benefit from a small example interaction sequence or pseudocode to clarify how Tamarin states are exposed as RL observations and actions.
  2. [Figures] Figure captions and axis labels in the results plots should explicitly state the units and aggregation method (mean/median over what?).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important gaps in the clarity of our evaluation methodology. We address each point below and will revise the manuscript to incorporate additional details on metrics, budgets, and heuristics while clarifying the training setup. These changes will strengthen the reproducibility of our results without altering the core claims.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section (16 case studies): the abstract and evaluation claim superior proof success and shorter proofs, yet supply no information on the precise metric for proof length (e.g., number of steps, nodes visited, or bytes), how proofs are counted as 'found', the computational budget and time limits given to all baselines, or any statistical controls such as multiple runs with variance. These omissions make the quantitative claims impossible to assess or reproduce.

    Authors: We agree that these details were insufficiently specified. Proof length is measured by the number of nodes in the generated proof tree. A proof is counted as found when the search reaches a complete, valid proof state within the time budget. All methods (standard Tamarin search, human heuristics, and our RL-MCTS) were allocated an identical per-protocol time limit of 300 seconds on the same hardware. We will add an explicit subsection describing these metrics, the uniform budget, and results averaged over five independent runs with standard deviation to enable reproduction and statistical assessment. revision: yes

  2. Referee: [Method and Evaluation] Training procedure and evaluation: the neural heuristic is trained on completed subproofs drawn from the identical 16 case studies used for testing. No held-out protocols, cross-validation, or separation between training and test instances is described. This setup risks protocol-specific overfitting and prevents isolating whether gains come from the RL component or from fitting to the state spaces and successful paths of these particular models.

    Authors: The concern is valid: the heuristic was trained on subproofs generated from the same 16 models used in evaluation. This design choice was made because the subproofs provide diverse training signals across protocol classes, and the method is intended to assist users on models they are actively verifying. However, it does not constitute a strict held-out evaluation. We will revise the text to explicitly acknowledge this limitation, add a discussion of potential overfitting risks, and include a note that future work should evaluate cross-protocol generalization. If additional held-out models become available before resubmission, we will report preliminary results on them. revision: partial

  3. Referee: [Evaluation] Comparison to human-engineered heuristics: the claim that the RL method produces shorter proofs than human-engineered heuristics is load-bearing for the 'less effort' contribution, but the manuscript does not specify which human heuristics were used, how they were configured, or whether they received equivalent search effort.

    Authors: We will clarify this in the revision. The human-engineered baselines are Tamarin's built-in 'smart' heuristic and the 'oracle' heuristic (as documented in the Tamarin manual and prior protocol papers). Both were invoked with their default parameters and given the identical 300-second time budget used for all other methods. We will add a table or paragraph listing the exact Tamarin command-line flags and configurations employed. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected.

full rationale

The paper describes an empirical RL framework for Tamarin proof search, with a neural heuristic trained on completed subproofs and evaluated on 16 case studies. No mathematical derivation chain, equations, or first-principles results are present. No self-citations, ansatzes, or fitted parameters are shown reducing the performance claims (more proofs found, shorter proofs) to inputs by construction. The evaluation is presented as direct experimental comparison to baselines, making the result self-contained without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, background axioms, or new entities; ledger left empty.

pith-pipeline@v0.9.0 · 5765 in / 1068 out tokens · 21273 ms · 2026-05-25T04:07:11.829582+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 3 internal anchors

  1. [1]

    The Tamarin prover github repository

    2026. The Tamarin prover github repository. https://github.com/tamarin- prover/tamarin-prover

  2. [2]

    Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, and Stewart Wilcox

  3. [3]

    InProceedings of the 36th International Conference on Machine Learning

    HOList: An environment for machine learning of higher order logic theorem proving. InProceedings of the 36th International Conference on Machine Learning. PMLR, 454–463. https://proceedings.mlr.press/v97/bansal19a.html

  4. [4]

    2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide

    David Basin, Cas Cremers, Jannik Dreier, and Ralf Sasse. 2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide. Springer Nature

  5. [5]

    Bruno Blanchet. 2001. An efficient cryptographic protocol verifier based on Prolog rules. In14th IEEE computer security foundations workshop (CSFW-14). IEEE Computer Society, Cape Breton, Nova Scotia, Canada, 82–96

  6. [6]

    Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samoth- rakis, and Simon Colton. 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games4, 1 (2012), 1–43

  7. [7]

    Vincent Cheval, Charlie Jacomme, Steve Kremer, and Robert Künnemann. 2022. {SAPIC+}: protocol verifiers of the world, unite!. In31st USENIX Security Sym- posium (USENIX Security 22). 3935–3952

  8. [8]

    Cremers, Cas and Dax, Alexander and Naska, Aurora. 2023. Formal analysis of SPDM: Security protocol and data model version 1.2. InUsenix 2023

  9. [9]

    Cremers, Cas and Dax, Alexander and Naska, Aurora. 2025. Breaking and provably restoring authentication: A formal analysis of SPDM 1.2 including cross-protocol attacks. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 1424–1438

  10. [10]

    Cremers, Cas and Horvat, Marko and Hoyland, Jonathan and Scott, Sam and van der Merwe, Thyla. 2017. A comprehensive symbolic analysis of TLS 1.3. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (Ccs ’17). Association for Computing Machinery, New York, NY, USA, 1773–1788. doi:10.1145/3133956.3134063 4They are not ac...

  11. [11]

    Cremers, Cas and Horvat, Marko and Scott, Sam and van der Merwe, Thyla. 2016. Automated analysis and verification of TLS 1.3: 0-RTT, resumption and delayed authentication. In2016 IEEE symposium on security and privacy (SP). 470–485. doi:10.1109/SP.2016.35

  12. [12]

    Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. A formal analysis of IEEE 802.11’s WPA2: countering the kracks caused by cracking the counters. InProceedings of the 29th USENIX conference on security symposium (SEC’20). USENIX Association, USA, Article 1

  13. [13]

    Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. Presentation of a formal analysis of IEEE 802.11’s WPA2: Countering the kracks caused by cracking the counters. https://www.usenix.org/conference/usenixsecurity20/ presentation/cremers

  14. [14]

    Johannes Czech, Patrick Korus, and Kristian Kersting. 2020. Monte-carlo graph search for AlphaZero. arXiv:2012.11045 [cs] doi:10.48550/arXiv.2012.11045

  15. [15]

    Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler

    David A. Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler. 2018. Formal analysis of 5G authentication.CoRR abs/1806.10360 (2018). arXiv:1806.10360 http://arxiv.org/abs/1806.10360

  16. [16]

    Basin and Ralf Sasse and Jorge Toro-Pozo

    David A. Basin and Ralf Sasse and Jorge Toro-Pozo. 2020. The EMV standard: Break, fix, verify.CoRRabs/2006.08249 (2020). arXiv:2006.08249 https://arxiv. org/abs/2006.08249

  17. [17]

    Danny Dolev and Andrew Yao. 1983. On the security of public key protocols. IEEE Transactions on information theory29, 2 (1983), 198–208

  18. [18]

    Jason A Donenfeld and Kevin Milner. 2017. Formal verification of the WireGuard protocol.Technical Report, Tech. Rep.(2017)

  19. [19]

    Nancy Durgin, Patrick Lincoln, John Mitchell, and Andre Scedrov. 2004. Multiset rewriting and the complexity of bounded security protocols.Journal of Computer Security12, 2 (2004), 247–311

  20. [20]

    Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. 2024. ABEL: Sample efficient online reinforcement learning for neural theorem proving. InThe 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24. https: //openreview.net/forum?id=kk3mSjVCUO

  21. [21]

    Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z. Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Ma- soom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Be- loshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salva- tore Mercuri, Calle Sönne, Bhavik Mehta, Alex D...

  22. [22]

    Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, and Yuhuai Wu. 2022. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In The Eleventh International Conference on Learning Representations. https:// openreview.net/forum?id=SMa9EAovKMC

  23. [23]

    Nadim Kobeissi, Karthikeyan Bhargavan, and Bruno Blanchet. 2017. Automated verification for secure messaging protocols and their implementations: A sym- bolic and computational approach. In2017 IEEE European symposium on security and privacy (EuroS&P). IEEE, 435–450

  24. [24]

    Levente Kocsis and Csaba Szepesvári. 2006. Bandit based monte-carlo planning. InMachine Learning: ECML 2006, Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou (Eds.). Springer, Berlin, Heidelberg, 282–293. doi:10.1007/11871842_ 29

  25. [25]

    Robert Künnemann and Graham Steel. 2012. YubiSecure? Formal security analy- sis results for the YubiKey and YubiHSM. InInternational workshop on security and trust management. Springer, 257–272

  26. [26]

    Guillaume Lample, Timothee Lacroix, Marie-Anne Lachaux, Aurelien Rodriguez, Amaury Hayat, Thibaut Lavril, Gabriel Ebner, and Xavier Martinet. 2022. HyperTree proof search for neural theorem proving. InAdvances in Neural Information Processing Systems, Vol. 35. 26337– 26349. https://proceedings.neurips.cc/paper_files/paper/2022/hash/ a8901c5e85fb8e1823bbf0...

  27. [27]

    Felix Linker, Ralf Sasse, and David Basin. 2025. A formal analysis of Apple’s iMessage PQ3 protocol. In34th USENIX security symposium (USENIX security 25). 5015–5034

  28. [28]

    Felix Linker, Christoph Sprenger, Cas Cremers, and David Basin. 2025. Looping for good: Cyclic proofs for security protocols. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 2759–2773

  29. [29]

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692 [cs] doi:10. 48550/arXiv.1907.11692

  30. [30]

    Simon Meier, Benedikt Schmidt, Cas Cremers, and David A. Basin. 2013. The TAMARIN prover for the symbolic analysis of security protocols. InCA V (Lecture notes in computer science, Vol. 8044). Springer, 696–701

  31. [31]

    Aleksi Peltonen, Ralf Sasse, and David Basin. 2021. A comprehensive formal analysis of 5G handover. InProceedings of the 14th ACM conference on security and privacy in wireless and mobile networks (WiSec ’21). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3448300.3467823

  32. [32]

    Stanislas Polu and Ilya Sutskever. 2020. Generative language modeling for automated theorem proving. arXiv:2009.03393 [cs] doi:10.48550/arXiv.2009.03393

  33. [33]

    2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies

    Maiwenn Racouchot. 2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies. Ph. D. Dissertation. doi:10.70675/ a4b61c5dz65f8z4921z920fz51e1a9860549

  34. [34]

    David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815 [cs] doi:10.48550/arXiv.1712.01815

  35. [35]

    David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2018. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play.Science362, 6419 (Dec. 2018), 1140–1144....

  36. [36]

    David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. 2017. Mastering the game of go without hu- man knowledge.Nature550, 7676 (Oct. 2017), 35...

  37. [37]

    Sutton and Andrew G

    Richard S. Sutton and Andrew G. Barto. 2020.Reinforcement learning, second edition: An introduction. Bradford Books, Cambridge, Massachusetts London, England

  38. [38]

    Ferucio Laurentiu Tiplea, Constantin Enea, and Catalin V Bîrjoveanu. 2005. Decidability and complexity results for security protocols.VISSAS1 (2005), 185–211

  39. [39]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. InAdvances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/hash/ 3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

  40. [40]

    Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. InAdvances in Neural Information Processing Systems, Vol. 28. Curran As- sociates, Inc. https://proceedings.neurips.cc/paper_files/paper/2015/hash/ 29921001f2f04bd3baee84a12e98098f-Abstract.html

  41. [41]

    Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, and Xiaodan Liang. 2023. LEGO-Prover: Neural Theorem Proving with Growing Libraries. arXiv:2310.00656 [cs.AI] https://arxiv.org/abs/2310.00656

  42. [42]

    Wansen Wang, Wenchao Huang, Zhaoyi Meng, Yan Xiong, and Cheng Su. 2024. Advancing the Automation Capability of Verifying Security Protocols .IEEE Transactions on Dependable and Secure Computing21, 06 (Nov. 2024), 5059–5070. doi:10.1109/TDSC.2024.3368131

  43. [43]

    Verifying Security Protocols using Dynamic Strategies

    Wang, Wansen and Huang, Wenchao and Meng, Zhaoyi and Xiong, Yan and Su, Cheng. 2019. GitHub repository of “Verifying Security Protocols using Dynamic Strategies”. https://github.com/SmartVerif/SmartVerif

  44. [44]

    David J. Wu. 2019. Monte-carlo graph search from first principles. KataGo Documentation, GitHub. https://github.com/lightvector/KataGo/blob/master/ docs/GraphSearch.md

  45. [45]

    Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, and Chong Ruan. 2024. DeepSeek- prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search. arXiv:2408.08152 [cs] doi:10.48...

  46. [47]

    Yan Xiong, Cheng Su, Wenchao Huang, Fuyou Miao, Wansen Wang, and Hengyi Ouyang. 2019. Verifying Security Protocols using Dynamic Strategies. arXiv:1807.00669 [cs.CR] https://arxiv.org/abs/1807.00669v7

  47. [48]

    Yan Xiong, Cheng Su, Wenchao Huang, Fuyou Miao, Wansen Wang, and Hengyi Ouyang. 2020. SmartVerif: Push the limit of automation capability of verify- ing security protocols by dynamic strategies. In29th USENIX Security Sym- posium (USENIX Security 20). 253–270. https://www.usenix.org/conference/ usenixsecurity20/presentation/xiong

  48. [49]

    Xueliang Zhao, Wenda Li, and Lingpeng Kong. 2023. Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving. arXiv:2305.16366 [cs.CL] https://arxiv.org/abs/2305.16366 13 Cosler, Cremers, Finkbeiner, Ghanem, Medinger A Additional Experimental Data Hyperparameters.Table 3 lists the hyperparameters of the best configuration per ...