Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

Bernd Finkbeiner; Cas Cremers; Matthias Cosler; Mohamed Ghanem; Niklas Medinger

arxiv: 2605.23643 · v1 · pith:5LTGPQL7new · submitted 2026-05-22 · 💻 cs.CR · cs.LG

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

Matthias Cosler , Cas Cremers , Bernd Finkbeiner , Mohamed Ghanem , Niklas Medinger This is my paper

Pith reviewed 2026-05-25 04:07 UTC · model grok-4.3

classification 💻 cs.CR cs.LG

keywords security protocol verificationTamarin proverreinforcement learningMonte Carlo Tree Searchautomated proof searchneural heuristicsprotocol analysis

0 comments

The pith

Reinforcement learning finds more automatic proofs in Tamarin and produces shorter proof trees than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce a reinforcement learning framework that converts Tamarin into an RL environment for Monte Carlo Tree Search. A neural network is trained on completed subproofs to serve as a heuristic that guides the search. Evaluated on 16 case studies ranging from basic to state-of-the-art protocol models, the method completes more proofs without human intervention and yields shorter proofs than both the default Tamarin search and expert-designed heuristics. This reduces the manual effort required to verify protocols such as those in EMV, 5G, and WPA2.

Core claim

The paper establishes that a neural heuristic, trained via reinforcement learning on subproofs from Tamarin derivations, can direct Monte Carlo Tree Search to explore the proof space more effectively than Tamarin's built-in strategies, resulting in both higher success rates on difficult protocols and more compact proof traces.

What carries the argument

The stateless Tamarin API that exposes the prover as a classical RL environment, allowing Monte Carlo Tree Search to be guided by a neural value network learned from completed subproofs.

If this is right

More protocols can be verified automatically with reduced expert intervention.
Shorter proof trees make results easier to inspect and maintain.
The standardized interface enables other automated tools to interact programmatically with Tamarin.
Similar RL-guided search can be applied out of the box to assist ongoing protocol verification work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Expanding the training set beyond the original 16 protocols could improve performance on even larger state spaces.
Shorter automated proofs may surface minimal attack traces more directly than longer manually guided ones.
The approach could support hybrid workflows that combine RL search with selective human input for the hardest cases.

Load-bearing premise

The neural heuristic trained on completed subproofs from the 16 case studies will generalize to new or more complex protocols without requiring substantial additional human guidance or retraining.

What would settle it

Running the trained system on a previously unseen protocol model from a recent publication and observing that it finds no additional proofs or requires as much human intervention as the standard Tamarin search.

Figures

Figures reproduced from arXiv: 2605.23643 by Bernd Finkbeiner, Cas Cremers, Matthias Cosler, Mohamed Ghanem, Niklas Medinger.

**Figure 1.** Figure 1: Example Tamarin tactic from the 5G Handover Xn model [30]. Constraint solving rules that match a ranking function with higher priority are ranked higher. Rules within the same priority are ranked according to the inbuilt ‘C’ heuristic, indicated by presort. Once the user identifies where and why Tamarin fails to finish a proof, they can try to work towards a successful proof by 1) changing the model in a … view at source ↗

**Figure 2.** Figure 2: Message sequence chart of an external client in [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 4.** Figure 4: AND/OR search graph for the Tutorial protocol model, lemma Client Session Key Secrecy after several MCTS iterations. OR nodes (circles) represent constraint systems where the agent selects a proof method; AND nodes (diamonds) arise from case splits requiring all children to be resolved. Edges carry the proof method and neural network prior 𝑝. Node values 𝑉 reflect backed-up estimates. The thick path shows… view at source ↗

**Figure 3.** Figure 3: Main MCTS loop. The four phases (Selection, Ex [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 5.** Figure 5: Training architecture overview. Multiple workers [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

read the original abstract

Tools like Tamarin and ProVerif have achieved notable success in analyzing and verifying complex real-world protocols such as EMV, 5G, and WPA2, even detecting zero-day exploits. Despite these successes, verifying such protocols remains a time-consuming, challenging task, often requiring significant human effort and expertise. In this paper, we present a reinforcement learning (RL) framework inspired by AlphaZero and AlphaProof that implements a new style of proof search for Tamarin. We have developed a stateless API for Tamarin that acts as a classical RL environment. We guide a Monte Carlo Tree Search (MCTS) by a neural heuristic that learns from completed subproofs. We evaluate our framework on 16 case studies, ranging from classical protocol models to challenging state-of-the-art protocol models from recent publications. Our method finds more proofs automatically than Tamarin's standard search and produces shorter proofs than both the standard and human-engineered heuristics. Our pipeline is applicable out of the box to assist Tamarin users in active research, reducing the human effort required. Moreover, our standardized interface provides a programmatic way for users to interact with Tamarin. Finally, our work demonstrates the promising potential of adapting RL-based methods to the Tamarin domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RL setup for Tamarin gives measurable gains on 16 cases but the training data overlap with evaluation leaves the source of the improvement unclear.

read the letter

The core of this paper is a new stateless API that turns Tamarin into an RL environment, plus an MCTS search guided by a neural heuristic trained on completed subproofs. On the 16 case studies it reports higher proof success than the built-in search and shorter proofs than both the default and human-written heuristics. The API itself is a concrete addition that lets users script interactions with Tamarin without the usual manual overhead, and the AlphaZero-style pipeline is a fresh way to bring learned guidance into an existing verifier used for protocols like 5G and EMV. That combination is what is actually new here. The implementation looks reproducible enough that others could try the same interface on their own models. The main soft spot is the evaluation setup. The heuristic is trained on subproofs drawn from the identical 16 case studies used to measure performance, so it is not obvious whether the reported improvements come from general tactics or from fitting to the state spaces and successful paths in those specific models. The abstract gives no numbers on how proof length was measured, whether baselines ran under matched compute, or any cross-validation, which makes the comparison harder to interpret. Minor details like that can be fixed, but they matter for a claim that rests on outperforming the non-learning baseline. This paper is for people who already run Tamarin on real protocols and want to cut down on manual proof search. A reader working in formal security analysis would get practical value from the API and the reported numbers even if the generalization story needs tightening. It is worth sending to peer review because the engineering contribution is solid and the idea is testable; the experiments just need clearer separation of training and test data to make the gains convincing.

Referee Report

3 major / 2 minor

Summary. The paper introduces a reinforcement learning framework for Tamarin that uses Monte Carlo Tree Search guided by a neural heuristic trained on completed subproofs. It develops a stateless API turning Tamarin into an RL environment and evaluates the approach on 16 case studies (classical to state-of-the-art protocols), claiming higher automatic proof success rates and shorter proofs than Tamarin's standard search and human-engineered heuristics, with the goal of reducing human effort in protocol verification.

Significance. If the performance claims hold under proper controls, the work would be significant for the security protocol analysis community: it provides a standardized programmatic interface to Tamarin and demonstrates that RL-based search can outperform existing heuristics on real protocols such as those in recent publications. The reusable API and out-of-the-box applicability are concrete strengths that could be adopted by Tamarin users.

major comments (3)

[Evaluation] Evaluation section (16 case studies): the abstract and evaluation claim superior proof success and shorter proofs, yet supply no information on the precise metric for proof length (e.g., number of steps, nodes visited, or bytes), how proofs are counted as 'found', the computational budget and time limits given to all baselines, or any statistical controls such as multiple runs with variance. These omissions make the quantitative claims impossible to assess or reproduce.
[Method and Evaluation] Training procedure and evaluation: the neural heuristic is trained on completed subproofs drawn from the identical 16 case studies used for testing. No held-out protocols, cross-validation, or separation between training and test instances is described. This setup risks protocol-specific overfitting and prevents isolating whether gains come from the RL component or from fitting to the state spaces and successful paths of these particular models.
[Evaluation] Comparison to human-engineered heuristics: the claim that the RL method produces shorter proofs than human-engineered heuristics is load-bearing for the 'less effort' contribution, but the manuscript does not specify which human heuristics were used, how they were configured, or whether they received equivalent search effort.

minor comments (2)

[API] The description of the stateless API would benefit from a small example interaction sequence or pseudocode to clarify how Tamarin states are exposed as RL observations and actions.
[Figures] Figure captions and axis labels in the results plots should explicitly state the units and aggregation method (mean/median over what?).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important gaps in the clarity of our evaluation methodology. We address each point below and will revise the manuscript to incorporate additional details on metrics, budgets, and heuristics while clarifying the training setup. These changes will strengthen the reproducibility of our results without altering the core claims.

read point-by-point responses

Referee: [Evaluation] Evaluation section (16 case studies): the abstract and evaluation claim superior proof success and shorter proofs, yet supply no information on the precise metric for proof length (e.g., number of steps, nodes visited, or bytes), how proofs are counted as 'found', the computational budget and time limits given to all baselines, or any statistical controls such as multiple runs with variance. These omissions make the quantitative claims impossible to assess or reproduce.

Authors: We agree that these details were insufficiently specified. Proof length is measured by the number of nodes in the generated proof tree. A proof is counted as found when the search reaches a complete, valid proof state within the time budget. All methods (standard Tamarin search, human heuristics, and our RL-MCTS) were allocated an identical per-protocol time limit of 300 seconds on the same hardware. We will add an explicit subsection describing these metrics, the uniform budget, and results averaged over five independent runs with standard deviation to enable reproduction and statistical assessment. revision: yes
Referee: [Method and Evaluation] Training procedure and evaluation: the neural heuristic is trained on completed subproofs drawn from the identical 16 case studies used for testing. No held-out protocols, cross-validation, or separation between training and test instances is described. This setup risks protocol-specific overfitting and prevents isolating whether gains come from the RL component or from fitting to the state spaces and successful paths of these particular models.

Authors: The concern is valid: the heuristic was trained on subproofs generated from the same 16 models used in evaluation. This design choice was made because the subproofs provide diverse training signals across protocol classes, and the method is intended to assist users on models they are actively verifying. However, it does not constitute a strict held-out evaluation. We will revise the text to explicitly acknowledge this limitation, add a discussion of potential overfitting risks, and include a note that future work should evaluate cross-protocol generalization. If additional held-out models become available before resubmission, we will report preliminary results on them. revision: partial
Referee: [Evaluation] Comparison to human-engineered heuristics: the claim that the RL method produces shorter proofs than human-engineered heuristics is load-bearing for the 'less effort' contribution, but the manuscript does not specify which human heuristics were used, how they were configured, or whether they received equivalent search effort.

Authors: We will clarify this in the revision. The human-engineered baselines are Tamarin's built-in 'smart' heuristic and the 'oracle' heuristic (as documented in the Tamarin manual and prior protocol papers). Both were invoked with their default parameters and given the identical 300-second time budget used for all other methods. We will add a table or paragraph listing the exact Tamarin command-line flags and configurations employed. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected.

full rationale

The paper describes an empirical RL framework for Tamarin proof search, with a neural heuristic trained on completed subproofs and evaluated on 16 case studies. No mathematical derivation chain, equations, or first-principles results are present. No self-citations, ansatzes, or fitted parameters are shown reducing the performance claims (more proofs found, shorter proofs) to inputs by construction. The evaluation is presented as direct experimental comparison to baselines, making the result self-contained without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, background axioms, or new entities; ledger left empty.

pith-pipeline@v0.9.0 · 5765 in / 1068 out tokens · 21273 ms · 2026-05-25T04:07:11.829582+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 3 internal anchors

[1]

The Tamarin prover github repository

2026. The Tamarin prover github repository. https://github.com/tamarin- prover/tamarin-prover

work page 2026
[2]

Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, and Stewart Wilcox

work page
[3]

InProceedings of the 36th International Conference on Machine Learning

HOList: An environment for machine learning of higher order logic theorem proving. InProceedings of the 36th International Conference on Machine Learning. PMLR, 454–463. https://proceedings.mlr.press/v97/bansal19a.html

work page
[4]

2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide

David Basin, Cas Cremers, Jannik Dreier, and Ralf Sasse. 2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide. Springer Nature

work page 2025
[5]

Bruno Blanchet. 2001. An efficient cryptographic protocol verifier based on Prolog rules. In14th IEEE computer security foundations workshop (CSFW-14). IEEE Computer Society, Cape Breton, Nova Scotia, Canada, 82–96

work page 2001
[6]

Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samoth- rakis, and Simon Colton. 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games4, 1 (2012), 1–43

work page 2012
[7]

Vincent Cheval, Charlie Jacomme, Steve Kremer, and Robert Künnemann. 2022. {SAPIC+}: protocol verifiers of the world, unite!. In31st USENIX Security Sym- posium (USENIX Security 22). 3935–3952

work page 2022
[8]

Cremers, Cas and Dax, Alexander and Naska, Aurora. 2023. Formal analysis of SPDM: Security protocol and data model version 1.2. InUsenix 2023

work page 2023
[9]

Cremers, Cas and Dax, Alexander and Naska, Aurora. 2025. Breaking and provably restoring authentication: A formal analysis of SPDM 1.2 including cross-protocol attacks. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 1424–1438

work page 2025
[10]

Cremers, Cas and Horvat, Marko and Hoyland, Jonathan and Scott, Sam and van der Merwe, Thyla. 2017. A comprehensive symbolic analysis of TLS 1.3. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (Ccs ’17). Association for Computing Machinery, New York, NY, USA, 1773–1788. doi:10.1145/3133956.3134063 4They are not ac...

work page doi:10.1145/3133956.3134063 2017
[11]

Cremers, Cas and Horvat, Marko and Scott, Sam and van der Merwe, Thyla. 2016. Automated analysis and verification of TLS 1.3: 0-RTT, resumption and delayed authentication. In2016 IEEE symposium on security and privacy (SP). 470–485. doi:10.1109/SP.2016.35

work page doi:10.1109/sp.2016.35 2016
[12]

Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. A formal analysis of IEEE 802.11’s WPA2: countering the kracks caused by cracking the counters. InProceedings of the 29th USENIX conference on security symposium (SEC’20). USENIX Association, USA, Article 1

work page 2020
[13]

Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. Presentation of a formal analysis of IEEE 802.11’s WPA2: Countering the kracks caused by cracking the counters. https://www.usenix.org/conference/usenixsecurity20/ presentation/cremers

work page 2020
[14]

Johannes Czech, Patrick Korus, and Kristian Kersting. 2020. Monte-carlo graph search for AlphaZero. arXiv:2012.11045 [cs] doi:10.48550/arXiv.2012.11045

work page doi:10.48550/arxiv.2012.11045 2020
[15]

Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler

David A. Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler. 2018. Formal analysis of 5G authentication.CoRR abs/1806.10360 (2018). arXiv:1806.10360 http://arxiv.org/abs/1806.10360

work page arXiv 2018
[16]

Basin and Ralf Sasse and Jorge Toro-Pozo

David A. Basin and Ralf Sasse and Jorge Toro-Pozo. 2020. The EMV standard: Break, fix, verify.CoRRabs/2006.08249 (2020). arXiv:2006.08249 https://arxiv. org/abs/2006.08249

work page arXiv 2020
[17]

Danny Dolev and Andrew Yao. 1983. On the security of public key protocols. IEEE Transactions on information theory29, 2 (1983), 198–208

work page 1983
[18]

Jason A Donenfeld and Kevin Milner. 2017. Formal verification of the WireGuard protocol.Technical Report, Tech. Rep.(2017)

work page 2017
[19]

Nancy Durgin, Patrick Lincoln, John Mitchell, and Andre Scedrov. 2004. Multiset rewriting and the complexity of bounded security protocols.Journal of Computer Security12, 2 (2004), 247–311

work page 2004
[20]

Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. 2024. ABEL: Sample efficient online reinforcement learning for neural theorem proving. InThe 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24. https: //openreview.net/forum?id=kk3mSjVCUO

work page 2024
[21]

Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z. Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Ma- soom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Be- loshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salva- tore Mercuri, Calle Sönne, Bhavik Mehta, Alex D...

work page doi:10.1038/s41586-025-09833-y 2026
[22]

Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, and Yuhuai Wu. 2022. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In The Eleventh International Conference on Learning Representations. https:// openreview.net/forum?id=SMa9EAovKMC

work page 2022
[23]

Nadim Kobeissi, Karthikeyan Bhargavan, and Bruno Blanchet. 2017. Automated verification for secure messaging protocols and their implementations: A sym- bolic and computational approach. In2017 IEEE European symposium on security and privacy (EuroS&P). IEEE, 435–450

work page 2017
[24]

Levente Kocsis and Csaba Szepesvári. 2006. Bandit based monte-carlo planning. InMachine Learning: ECML 2006, Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou (Eds.). Springer, Berlin, Heidelberg, 282–293. doi:10.1007/11871842_ 29

work page doi:10.1007/11871842_ 2006
[25]

Robert Künnemann and Graham Steel. 2012. YubiSecure? Formal security analy- sis results for the YubiKey and YubiHSM. InInternational workshop on security and trust management. Springer, 257–272

work page 2012
[26]

Guillaume Lample, Timothee Lacroix, Marie-Anne Lachaux, Aurelien Rodriguez, Amaury Hayat, Thibaut Lavril, Gabriel Ebner, and Xavier Martinet. 2022. HyperTree proof search for neural theorem proving. InAdvances in Neural Information Processing Systems, Vol. 35. 26337– 26349. https://proceedings.neurips.cc/paper_files/paper/2022/hash/ a8901c5e85fb8e1823bbf0...

work page 2022
[27]

Felix Linker, Ralf Sasse, and David Basin. 2025. A formal analysis of Apple’s iMessage PQ3 protocol. In34th USENIX security symposium (USENIX security 25). 5015–5034

work page 2025
[28]

Felix Linker, Christoph Sprenger, Cas Cremers, and David Basin. 2025. Looping for good: Cyclic proofs for security protocols. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 2759–2773

work page 2025
[29]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692 [cs] doi:10. 48550/arXiv.1907.11692

work page internal anchor Pith review Pith/arXiv arXiv 2019
[30]

Simon Meier, Benedikt Schmidt, Cas Cremers, and David A. Basin. 2013. The TAMARIN prover for the symbolic analysis of security protocols. InCA V (Lecture notes in computer science, Vol. 8044). Springer, 696–701

work page 2013
[31]

Aleksi Peltonen, Ralf Sasse, and David Basin. 2021. A comprehensive formal analysis of 5G handover. InProceedings of the 14th ACM conference on security and privacy in wireless and mobile networks (WiSec ’21). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3448300.3467823

work page doi:10.1145/3448300.3467823 2021
[32]

Stanislas Polu and Ilya Sutskever. 2020. Generative language modeling for automated theorem proving. arXiv:2009.03393 [cs] doi:10.48550/arXiv.2009.03393

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2009.03393 2020
[33]

2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies

Maiwenn Racouchot. 2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies. Ph. D. Dissertation. doi:10.70675/ a4b61c5dz65f8z4921z920fz51e1a9860549

work page 2024
[34]

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815 [cs] doi:10.48550/arXiv.1712.01815

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1712.01815 2017
[35]

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2018. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play.Science362, 6419 (Dec. 2018), 1140–1144....

work page doi:10.1126/science.aar6404 2018
[36]

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. 2017. Mastering the game of go without hu- man knowledge.Nature550, 7676 (Oct. 2017), 35...

work page doi:10.1038/nature24270 2017
[37]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto. 2020.Reinforcement learning, second edition: An introduction. Bradford Books, Cambridge, Massachusetts London, England

work page 2020
[38]

Ferucio Laurentiu Tiplea, Constantin Enea, and Catalin V Bîrjoveanu. 2005. Decidability and complexity results for security protocols.VISSAS1 (2005), 185–211

work page 2005
[39]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. InAdvances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/hash/ 3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

work page 2017
[40]

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. InAdvances in Neural Information Processing Systems, Vol. 28. Curran As- sociates, Inc. https://proceedings.neurips.cc/paper_files/paper/2015/hash/ 29921001f2f04bd3baee84a12e98098f-Abstract.html

work page 2015
[41]

Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, and Xiaodan Liang. 2023. LEGO-Prover: Neural Theorem Proving with Growing Libraries. arXiv:2310.00656 [cs.AI] https://arxiv.org/abs/2310.00656

work page arXiv 2023
[42]

Wansen Wang, Wenchao Huang, Zhaoyi Meng, Yan Xiong, and Cheng Su. 2024. Advancing the Automation Capability of Verifying Security Protocols .IEEE Transactions on Dependable and Secure Computing21, 06 (Nov. 2024), 5059–5070. doi:10.1109/TDSC.2024.3368131

work page doi:10.1109/tdsc.2024.3368131 2024
[43]

Verifying Security Protocols using Dynamic Strategies

Wang, Wansen and Huang, Wenchao and Meng, Zhaoyi and Xiong, Yan and Su, Cheng. 2019. GitHub repository of “Verifying Security Protocols using Dynamic Strategies”. https://github.com/SmartVerif/SmartVerif

work page 2019
[44]

David J. Wu. 2019. Monte-carlo graph search from first principles. KataGo Documentation, GitHub. https://github.com/lightvector/KataGo/blob/master/ docs/GraphSearch.md

work page 2019
[45]

Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, and Chong Ruan. 2024. DeepSeek- prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search. arXiv:2408.08152 [cs] doi:10.48...

work page doi:10.48550/arxiv.2408.08152 2024
[47]

Yan Xiong, Cheng Su, Wenchao Huang, Fuyou Miao, Wansen Wang, and Hengyi Ouyang. 2019. Verifying Security Protocols using Dynamic Strategies. arXiv:1807.00669 [cs.CR] https://arxiv.org/abs/1807.00669v7

work page arXiv 2019
[48]

Yan Xiong, Cheng Su, Wenchao Huang, Fuyou Miao, Wansen Wang, and Hengyi Ouyang. 2020. SmartVerif: Push the limit of automation capability of verify- ing security protocols by dynamic strategies. In29th USENIX Security Sym- posium (USENIX Security 20). 253–270. https://www.usenix.org/conference/ usenixsecurity20/presentation/xiong

work page 2020
[49]

Xueliang Zhao, Wenda Li, and Lingpeng Kong. 2023. Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving. arXiv:2305.16366 [cs.CL] https://arxiv.org/abs/2305.16366 13 Cosler, Cremers, Finkbeiner, Ghanem, Medinger A Additional Experimental Data Hyperparameters.Table 3 lists the hyperparameters of the best configuration per ...

work page arXiv 2023

[1] [1]

The Tamarin prover github repository

2026. The Tamarin prover github repository. https://github.com/tamarin- prover/tamarin-prover

work page 2026

[2] [2]

Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, and Stewart Wilcox

work page

[3] [3]

InProceedings of the 36th International Conference on Machine Learning

HOList: An environment for machine learning of higher order logic theorem proving. InProceedings of the 36th International Conference on Machine Learning. PMLR, 454–463. https://proceedings.mlr.press/v97/bansal19a.html

work page

[4] [4]

2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide

David Basin, Cas Cremers, Jannik Dreier, and Ralf Sasse. 2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide. Springer Nature

work page 2025

[5] [5]

Bruno Blanchet. 2001. An efficient cryptographic protocol verifier based on Prolog rules. In14th IEEE computer security foundations workshop (CSFW-14). IEEE Computer Society, Cape Breton, Nova Scotia, Canada, 82–96

work page 2001

[6] [6]

Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samoth- rakis, and Simon Colton. 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games4, 1 (2012), 1–43

work page 2012

[7] [7]

Vincent Cheval, Charlie Jacomme, Steve Kremer, and Robert Künnemann. 2022. {SAPIC+}: protocol verifiers of the world, unite!. In31st USENIX Security Sym- posium (USENIX Security 22). 3935–3952

work page 2022

[8] [8]

Cremers, Cas and Dax, Alexander and Naska, Aurora. 2023. Formal analysis of SPDM: Security protocol and data model version 1.2. InUsenix 2023

work page 2023

[9] [9]

Cremers, Cas and Dax, Alexander and Naska, Aurora. 2025. Breaking and provably restoring authentication: A formal analysis of SPDM 1.2 including cross-protocol attacks. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 1424–1438

work page 2025

[10] [10]

Cremers, Cas and Horvat, Marko and Hoyland, Jonathan and Scott, Sam and van der Merwe, Thyla. 2017. A comprehensive symbolic analysis of TLS 1.3. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (Ccs ’17). Association for Computing Machinery, New York, NY, USA, 1773–1788. doi:10.1145/3133956.3134063 4They are not ac...

work page doi:10.1145/3133956.3134063 2017

[11] [11]

Cremers, Cas and Horvat, Marko and Scott, Sam and van der Merwe, Thyla. 2016. Automated analysis and verification of TLS 1.3: 0-RTT, resumption and delayed authentication. In2016 IEEE symposium on security and privacy (SP). 470–485. doi:10.1109/SP.2016.35

work page doi:10.1109/sp.2016.35 2016

[12] [12]

Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. A formal analysis of IEEE 802.11’s WPA2: countering the kracks caused by cracking the counters. InProceedings of the 29th USENIX conference on security symposium (SEC’20). USENIX Association, USA, Article 1

work page 2020

[13] [13]

Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. Presentation of a formal analysis of IEEE 802.11’s WPA2: Countering the kracks caused by cracking the counters. https://www.usenix.org/conference/usenixsecurity20/ presentation/cremers

work page 2020

[14] [14]

Johannes Czech, Patrick Korus, and Kristian Kersting. 2020. Monte-carlo graph search for AlphaZero. arXiv:2012.11045 [cs] doi:10.48550/arXiv.2012.11045

work page doi:10.48550/arxiv.2012.11045 2020

[15] [15]

Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler

David A. Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler. 2018. Formal analysis of 5G authentication.CoRR abs/1806.10360 (2018). arXiv:1806.10360 http://arxiv.org/abs/1806.10360

work page arXiv 2018

[16] [16]

Basin and Ralf Sasse and Jorge Toro-Pozo

David A. Basin and Ralf Sasse and Jorge Toro-Pozo. 2020. The EMV standard: Break, fix, verify.CoRRabs/2006.08249 (2020). arXiv:2006.08249 https://arxiv. org/abs/2006.08249

work page arXiv 2020

[17] [17]

Danny Dolev and Andrew Yao. 1983. On the security of public key protocols. IEEE Transactions on information theory29, 2 (1983), 198–208

work page 1983

[18] [18]

Jason A Donenfeld and Kevin Milner. 2017. Formal verification of the WireGuard protocol.Technical Report, Tech. Rep.(2017)

work page 2017

[19] [19]

Nancy Durgin, Patrick Lincoln, John Mitchell, and Andre Scedrov. 2004. Multiset rewriting and the complexity of bounded security protocols.Journal of Computer Security12, 2 (2004), 247–311

work page 2004

[20] [20]

Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. 2024. ABEL: Sample efficient online reinforcement learning for neural theorem proving. InThe 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24. https: //openreview.net/forum?id=kk3mSjVCUO

work page 2024

[21] [21]

Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z. Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Ma- soom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Be- loshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salva- tore Mercuri, Calle Sönne, Bhavik Mehta, Alex D...

work page doi:10.1038/s41586-025-09833-y 2026

[22] [22]

Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, and Yuhuai Wu. 2022. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In The Eleventh International Conference on Learning Representations. https:// openreview.net/forum?id=SMa9EAovKMC

work page 2022

[23] [23]

Nadim Kobeissi, Karthikeyan Bhargavan, and Bruno Blanchet. 2017. Automated verification for secure messaging protocols and their implementations: A sym- bolic and computational approach. In2017 IEEE European symposium on security and privacy (EuroS&P). IEEE, 435–450

work page 2017

[24] [24]

Levente Kocsis and Csaba Szepesvári. 2006. Bandit based monte-carlo planning. InMachine Learning: ECML 2006, Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou (Eds.). Springer, Berlin, Heidelberg, 282–293. doi:10.1007/11871842_ 29

work page doi:10.1007/11871842_ 2006

[25] [25]

Robert Künnemann and Graham Steel. 2012. YubiSecure? Formal security analy- sis results for the YubiKey and YubiHSM. InInternational workshop on security and trust management. Springer, 257–272

work page 2012

[26] [26]

Guillaume Lample, Timothee Lacroix, Marie-Anne Lachaux, Aurelien Rodriguez, Amaury Hayat, Thibaut Lavril, Gabriel Ebner, and Xavier Martinet. 2022. HyperTree proof search for neural theorem proving. InAdvances in Neural Information Processing Systems, Vol. 35. 26337– 26349. https://proceedings.neurips.cc/paper_files/paper/2022/hash/ a8901c5e85fb8e1823bbf0...

work page 2022

[27] [27]

Felix Linker, Ralf Sasse, and David Basin. 2025. A formal analysis of Apple’s iMessage PQ3 protocol. In34th USENIX security symposium (USENIX security 25). 5015–5034

work page 2025

[28] [28]

Felix Linker, Christoph Sprenger, Cas Cremers, and David Basin. 2025. Looping for good: Cyclic proofs for security protocols. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 2759–2773

work page 2025

[29] [29]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692 [cs] doi:10. 48550/arXiv.1907.11692

work page internal anchor Pith review Pith/arXiv arXiv 2019

[30] [30]

Simon Meier, Benedikt Schmidt, Cas Cremers, and David A. Basin. 2013. The TAMARIN prover for the symbolic analysis of security protocols. InCA V (Lecture notes in computer science, Vol. 8044). Springer, 696–701

work page 2013

[31] [31]

Aleksi Peltonen, Ralf Sasse, and David Basin. 2021. A comprehensive formal analysis of 5G handover. InProceedings of the 14th ACM conference on security and privacy in wireless and mobile networks (WiSec ’21). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3448300.3467823

work page doi:10.1145/3448300.3467823 2021

[32] [32]

Stanislas Polu and Ilya Sutskever. 2020. Generative language modeling for automated theorem proving. arXiv:2009.03393 [cs] doi:10.48550/arXiv.2009.03393

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2009.03393 2020

[33] [33]

2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies

Maiwenn Racouchot. 2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies. Ph. D. Dissertation. doi:10.70675/ a4b61c5dz65f8z4921z920fz51e1a9860549

work page 2024

[34] [34]

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815 [cs] doi:10.48550/arXiv.1712.01815

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1712.01815 2017

[35] [35]

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2018. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play.Science362, 6419 (Dec. 2018), 1140–1144....

work page doi:10.1126/science.aar6404 2018

[36] [36]

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. 2017. Mastering the game of go without hu- man knowledge.Nature550, 7676 (Oct. 2017), 35...

work page doi:10.1038/nature24270 2017

[37] [37]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto. 2020.Reinforcement learning, second edition: An introduction. Bradford Books, Cambridge, Massachusetts London, England

work page 2020

[38] [38]

Ferucio Laurentiu Tiplea, Constantin Enea, and Catalin V Bîrjoveanu. 2005. Decidability and complexity results for security protocols.VISSAS1 (2005), 185–211

work page 2005

[39] [39]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. InAdvances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/hash/ 3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

work page 2017

[40] [40]

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. InAdvances in Neural Information Processing Systems, Vol. 28. Curran As- sociates, Inc. https://proceedings.neurips.cc/paper_files/paper/2015/hash/ 29921001f2f04bd3baee84a12e98098f-Abstract.html

work page 2015

[41] [41]

Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, and Xiaodan Liang. 2023. LEGO-Prover: Neural Theorem Proving with Growing Libraries. arXiv:2310.00656 [cs.AI] https://arxiv.org/abs/2310.00656

work page arXiv 2023

[42] [42]

Wansen Wang, Wenchao Huang, Zhaoyi Meng, Yan Xiong, and Cheng Su. 2024. Advancing the Automation Capability of Verifying Security Protocols .IEEE Transactions on Dependable and Secure Computing21, 06 (Nov. 2024), 5059–5070. doi:10.1109/TDSC.2024.3368131

work page doi:10.1109/tdsc.2024.3368131 2024

[43] [43]

Verifying Security Protocols using Dynamic Strategies

Wang, Wansen and Huang, Wenchao and Meng, Zhaoyi and Xiong, Yan and Su, Cheng. 2019. GitHub repository of “Verifying Security Protocols using Dynamic Strategies”. https://github.com/SmartVerif/SmartVerif

work page 2019

[44] [44]

David J. Wu. 2019. Monte-carlo graph search from first principles. KataGo Documentation, GitHub. https://github.com/lightvector/KataGo/blob/master/ docs/GraphSearch.md

work page 2019

[45] [45]

Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, and Chong Ruan. 2024. DeepSeek- prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search. arXiv:2408.08152 [cs] doi:10.48...

work page doi:10.48550/arxiv.2408.08152 2024

[46] [47]

Yan Xiong, Cheng Su, Wenchao Huang, Fuyou Miao, Wansen Wang, and Hengyi Ouyang. 2019. Verifying Security Protocols using Dynamic Strategies. arXiv:1807.00669 [cs.CR] https://arxiv.org/abs/1807.00669v7

work page arXiv 2019

[47] [48]

Yan Xiong, Cheng Su, Wenchao Huang, Fuyou Miao, Wansen Wang, and Hengyi Ouyang. 2020. SmartVerif: Push the limit of automation capability of verify- ing security protocols by dynamic strategies. In29th USENIX Security Sym- posium (USENIX Security 20). 253–270. https://www.usenix.org/conference/ usenixsecurity20/presentation/xiong

work page 2020

[48] [49]

Xueliang Zhao, Wenda Li, and Lingpeng Kong. 2023. Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving. arXiv:2305.16366 [cs.CL] https://arxiv.org/abs/2305.16366 13 Cosler, Cremers, Finkbeiner, Ghanem, Medinger A Additional Experimental Data Hyperparameters.Table 3 lists the hyperparameters of the best configuration per ...

work page arXiv 2023