Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin
Pith reviewed 2026-05-25 04:07 UTC · model grok-4.3
The pith
Reinforcement learning finds more automatic proofs in Tamarin and produces shorter proof trees than prior methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that a neural heuristic, trained via reinforcement learning on subproofs from Tamarin derivations, can direct Monte Carlo Tree Search to explore the proof space more effectively than Tamarin's built-in strategies, resulting in both higher success rates on difficult protocols and more compact proof traces.
What carries the argument
The stateless Tamarin API that exposes the prover as a classical RL environment, allowing Monte Carlo Tree Search to be guided by a neural value network learned from completed subproofs.
If this is right
- More protocols can be verified automatically with reduced expert intervention.
- Shorter proof trees make results easier to inspect and maintain.
- The standardized interface enables other automated tools to interact programmatically with Tamarin.
- Similar RL-guided search can be applied out of the box to assist ongoing protocol verification work.
Where Pith is reading between the lines
- Expanding the training set beyond the original 16 protocols could improve performance on even larger state spaces.
- Shorter automated proofs may surface minimal attack traces more directly than longer manually guided ones.
- The approach could support hybrid workflows that combine RL search with selective human input for the hardest cases.
Load-bearing premise
The neural heuristic trained on completed subproofs from the 16 case studies will generalize to new or more complex protocols without requiring substantial additional human guidance or retraining.
What would settle it
Running the trained system on a previously unseen protocol model from a recent publication and observing that it finds no additional proofs or requires as much human intervention as the standard Tamarin search.
Figures
read the original abstract
Tools like Tamarin and ProVerif have achieved notable success in analyzing and verifying complex real-world protocols such as EMV, 5G, and WPA2, even detecting zero-day exploits. Despite these successes, verifying such protocols remains a time-consuming, challenging task, often requiring significant human effort and expertise. In this paper, we present a reinforcement learning (RL) framework inspired by AlphaZero and AlphaProof that implements a new style of proof search for Tamarin. We have developed a stateless API for Tamarin that acts as a classical RL environment. We guide a Monte Carlo Tree Search (MCTS) by a neural heuristic that learns from completed subproofs. We evaluate our framework on 16 case studies, ranging from classical protocol models to challenging state-of-the-art protocol models from recent publications. Our method finds more proofs automatically than Tamarin's standard search and produces shorter proofs than both the standard and human-engineered heuristics. Our pipeline is applicable out of the box to assist Tamarin users in active research, reducing the human effort required. Moreover, our standardized interface provides a programmatic way for users to interact with Tamarin. Finally, our work demonstrates the promising potential of adapting RL-based methods to the Tamarin domain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a reinforcement learning framework for Tamarin that uses Monte Carlo Tree Search guided by a neural heuristic trained on completed subproofs. It develops a stateless API turning Tamarin into an RL environment and evaluates the approach on 16 case studies (classical to state-of-the-art protocols), claiming higher automatic proof success rates and shorter proofs than Tamarin's standard search and human-engineered heuristics, with the goal of reducing human effort in protocol verification.
Significance. If the performance claims hold under proper controls, the work would be significant for the security protocol analysis community: it provides a standardized programmatic interface to Tamarin and demonstrates that RL-based search can outperform existing heuristics on real protocols such as those in recent publications. The reusable API and out-of-the-box applicability are concrete strengths that could be adopted by Tamarin users.
major comments (3)
- [Evaluation] Evaluation section (16 case studies): the abstract and evaluation claim superior proof success and shorter proofs, yet supply no information on the precise metric for proof length (e.g., number of steps, nodes visited, or bytes), how proofs are counted as 'found', the computational budget and time limits given to all baselines, or any statistical controls such as multiple runs with variance. These omissions make the quantitative claims impossible to assess or reproduce.
- [Method and Evaluation] Training procedure and evaluation: the neural heuristic is trained on completed subproofs drawn from the identical 16 case studies used for testing. No held-out protocols, cross-validation, or separation between training and test instances is described. This setup risks protocol-specific overfitting and prevents isolating whether gains come from the RL component or from fitting to the state spaces and successful paths of these particular models.
- [Evaluation] Comparison to human-engineered heuristics: the claim that the RL method produces shorter proofs than human-engineered heuristics is load-bearing for the 'less effort' contribution, but the manuscript does not specify which human heuristics were used, how they were configured, or whether they received equivalent search effort.
minor comments (2)
- [API] The description of the stateless API would benefit from a small example interaction sequence or pseudocode to clarify how Tamarin states are exposed as RL observations and actions.
- [Figures] Figure captions and axis labels in the results plots should explicitly state the units and aggregation method (mean/median over what?).
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important gaps in the clarity of our evaluation methodology. We address each point below and will revise the manuscript to incorporate additional details on metrics, budgets, and heuristics while clarifying the training setup. These changes will strengthen the reproducibility of our results without altering the core claims.
read point-by-point responses
-
Referee: [Evaluation] Evaluation section (16 case studies): the abstract and evaluation claim superior proof success and shorter proofs, yet supply no information on the precise metric for proof length (e.g., number of steps, nodes visited, or bytes), how proofs are counted as 'found', the computational budget and time limits given to all baselines, or any statistical controls such as multiple runs with variance. These omissions make the quantitative claims impossible to assess or reproduce.
Authors: We agree that these details were insufficiently specified. Proof length is measured by the number of nodes in the generated proof tree. A proof is counted as found when the search reaches a complete, valid proof state within the time budget. All methods (standard Tamarin search, human heuristics, and our RL-MCTS) were allocated an identical per-protocol time limit of 300 seconds on the same hardware. We will add an explicit subsection describing these metrics, the uniform budget, and results averaged over five independent runs with standard deviation to enable reproduction and statistical assessment. revision: yes
-
Referee: [Method and Evaluation] Training procedure and evaluation: the neural heuristic is trained on completed subproofs drawn from the identical 16 case studies used for testing. No held-out protocols, cross-validation, or separation between training and test instances is described. This setup risks protocol-specific overfitting and prevents isolating whether gains come from the RL component or from fitting to the state spaces and successful paths of these particular models.
Authors: The concern is valid: the heuristic was trained on subproofs generated from the same 16 models used in evaluation. This design choice was made because the subproofs provide diverse training signals across protocol classes, and the method is intended to assist users on models they are actively verifying. However, it does not constitute a strict held-out evaluation. We will revise the text to explicitly acknowledge this limitation, add a discussion of potential overfitting risks, and include a note that future work should evaluate cross-protocol generalization. If additional held-out models become available before resubmission, we will report preliminary results on them. revision: partial
-
Referee: [Evaluation] Comparison to human-engineered heuristics: the claim that the RL method produces shorter proofs than human-engineered heuristics is load-bearing for the 'less effort' contribution, but the manuscript does not specify which human heuristics were used, how they were configured, or whether they received equivalent search effort.
Authors: We will clarify this in the revision. The human-engineered baselines are Tamarin's built-in 'smart' heuristic and the 'oracle' heuristic (as documented in the Tamarin manual and prior protocol papers). Both were invoked with their default parameters and given the identical 300-second time budget used for all other methods. We will add a table or paragraph listing the exact Tamarin command-line flags and configurations employed. revision: yes
Circularity Check
No significant circularity detected.
full rationale
The paper describes an empirical RL framework for Tamarin proof search, with a neural heuristic trained on completed subproofs and evaluated on 16 case studies. No mathematical derivation chain, equations, or first-principles results are present. No self-citations, ansatzes, or fitted parameters are shown reducing the performance claims (more proofs found, shorter proofs) to inputs by construction. The evaluation is presented as direct experimental comparison to baselines, making the result self-contained without circular reduction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
The Tamarin prover github repository
2026. The Tamarin prover github repository. https://github.com/tamarin- prover/tamarin-prover
work page 2026
-
[2]
Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, and Stewart Wilcox
-
[3]
InProceedings of the 36th International Conference on Machine Learning
HOList: An environment for machine learning of higher order logic theorem proving. InProceedings of the 36th International Conference on Machine Learning. PMLR, 454–463. https://proceedings.mlr.press/v97/bansal19a.html
-
[4]
2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide
David Basin, Cas Cremers, Jannik Dreier, and Ralf Sasse. 2025.Modeling and analyzing security protocols with Tamarin: a comprehensive guide. Springer Nature
work page 2025
-
[5]
Bruno Blanchet. 2001. An efficient cryptographic protocol verifier based on Prolog rules. In14th IEEE computer security foundations workshop (CSFW-14). IEEE Computer Society, Cape Breton, Nova Scotia, Canada, 82–96
work page 2001
-
[6]
Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samoth- rakis, and Simon Colton. 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games4, 1 (2012), 1–43
work page 2012
-
[7]
Vincent Cheval, Charlie Jacomme, Steve Kremer, and Robert Künnemann. 2022. {SAPIC+}: protocol verifiers of the world, unite!. In31st USENIX Security Sym- posium (USENIX Security 22). 3935–3952
work page 2022
-
[8]
Cremers, Cas and Dax, Alexander and Naska, Aurora. 2023. Formal analysis of SPDM: Security protocol and data model version 1.2. InUsenix 2023
work page 2023
-
[9]
Cremers, Cas and Dax, Alexander and Naska, Aurora. 2025. Breaking and provably restoring authentication: A formal analysis of SPDM 1.2 including cross-protocol attacks. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 1424–1438
work page 2025
-
[10]
Cremers, Cas and Horvat, Marko and Hoyland, Jonathan and Scott, Sam and van der Merwe, Thyla. 2017. A comprehensive symbolic analysis of TLS 1.3. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (Ccs ’17). Association for Computing Machinery, New York, NY, USA, 1773–1788. doi:10.1145/3133956.3134063 4They are not ac...
-
[11]
Cremers, Cas and Horvat, Marko and Scott, Sam and van der Merwe, Thyla. 2016. Automated analysis and verification of TLS 1.3: 0-RTT, resumption and delayed authentication. In2016 IEEE symposium on security and privacy (SP). 470–485. doi:10.1109/SP.2016.35
-
[12]
Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. A formal analysis of IEEE 802.11’s WPA2: countering the kracks caused by cracking the counters. InProceedings of the 29th USENIX conference on security symposium (SEC’20). USENIX Association, USA, Article 1
work page 2020
-
[13]
Cremers, Cas and Kiesl, Benjamin and Medinger, Niklas. 2020. Presentation of a formal analysis of IEEE 802.11’s WPA2: Countering the kracks caused by cracking the counters. https://www.usenix.org/conference/usenixsecurity20/ presentation/cremers
work page 2020
-
[14]
Johannes Czech, Patrick Korus, and Kristian Kersting. 2020. Monte-carlo graph search for AlphaZero. arXiv:2012.11045 [cs] doi:10.48550/arXiv.2012.11045
-
[15]
Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler
David A. Basin and Jannik Dreier and Lucca Hirschi and Sasa Radomirovic and Ralf Sasse and Vincent Stettler. 2018. Formal analysis of 5G authentication.CoRR abs/1806.10360 (2018). arXiv:1806.10360 http://arxiv.org/abs/1806.10360
-
[16]
Basin and Ralf Sasse and Jorge Toro-Pozo
David A. Basin and Ralf Sasse and Jorge Toro-Pozo. 2020. The EMV standard: Break, fix, verify.CoRRabs/2006.08249 (2020). arXiv:2006.08249 https://arxiv. org/abs/2006.08249
-
[17]
Danny Dolev and Andrew Yao. 1983. On the security of public key protocols. IEEE Transactions on information theory29, 2 (1983), 198–208
work page 1983
-
[18]
Jason A Donenfeld and Kevin Milner. 2017. Formal verification of the WireGuard protocol.Technical Report, Tech. Rep.(2017)
work page 2017
-
[19]
Nancy Durgin, Patrick Lincoln, John Mitchell, and Andre Scedrov. 2004. Multiset rewriting and the complexity of bounded security protocols.Journal of Computer Security12, 2 (2004), 247–311
work page 2004
-
[20]
Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. 2024. ABEL: Sample efficient online reinforcement learning for neural theorem proving. InThe 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24. https: //openreview.net/forum?id=kk3mSjVCUO
work page 2024
-
[21]
Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z. Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Ma- soom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Be- loshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salva- tore Mercuri, Calle Sönne, Bhavik Mehta, Alex D...
-
[22]
Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, and Yuhuai Wu. 2022. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In The Eleventh International Conference on Learning Representations. https:// openreview.net/forum?id=SMa9EAovKMC
work page 2022
-
[23]
Nadim Kobeissi, Karthikeyan Bhargavan, and Bruno Blanchet. 2017. Automated verification for secure messaging protocols and their implementations: A sym- bolic and computational approach. In2017 IEEE European symposium on security and privacy (EuroS&P). IEEE, 435–450
work page 2017
-
[24]
Levente Kocsis and Csaba Szepesvári. 2006. Bandit based monte-carlo planning. InMachine Learning: ECML 2006, Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou (Eds.). Springer, Berlin, Heidelberg, 282–293. doi:10.1007/11871842_ 29
-
[25]
Robert Künnemann and Graham Steel. 2012. YubiSecure? Formal security analy- sis results for the YubiKey and YubiHSM. InInternational workshop on security and trust management. Springer, 257–272
work page 2012
-
[26]
Guillaume Lample, Timothee Lacroix, Marie-Anne Lachaux, Aurelien Rodriguez, Amaury Hayat, Thibaut Lavril, Gabriel Ebner, and Xavier Martinet. 2022. HyperTree proof search for neural theorem proving. InAdvances in Neural Information Processing Systems, Vol. 35. 26337– 26349. https://proceedings.neurips.cc/paper_files/paper/2022/hash/ a8901c5e85fb8e1823bbf0...
work page 2022
-
[27]
Felix Linker, Ralf Sasse, and David Basin. 2025. A formal analysis of Apple’s iMessage PQ3 protocol. In34th USENIX security symposium (USENIX security 25). 5015–5034
work page 2025
-
[28]
Felix Linker, Christoph Sprenger, Cas Cremers, and David Basin. 2025. Looping for good: Cyclic proofs for security protocols. InProceedings of the 2025 ACM SIGSAC conference on computer and communications security. 2759–2773
work page 2025
-
[29]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692 [cs] doi:10. 48550/arXiv.1907.11692
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[30]
Simon Meier, Benedikt Schmidt, Cas Cremers, and David A. Basin. 2013. The TAMARIN prover for the symbolic analysis of security protocols. InCA V (Lecture notes in computer science, Vol. 8044). Springer, 696–701
work page 2013
-
[31]
Aleksi Peltonen, Ralf Sasse, and David Basin. 2021. A comprehensive formal analysis of 5G handover. InProceedings of the 14th ACM conference on security and privacy in wireless and mobile networks (WiSec ’21). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3448300.3467823
-
[32]
Stanislas Polu and Ilya Sutskever. 2020. Generative language modeling for automated theorem proving. arXiv:2009.03393 [cs] doi:10.48550/arXiv.2009.03393
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2009.03393 2020
-
[33]
2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies
Maiwenn Racouchot. 2024.Formal analysis of security protocols : real-world case-studies and automated proof strategies. Ph. D. Dissertation. doi:10.70675/ a4b61c5dz65f8z4921z920fz51e1a9860549
work page 2024
-
[34]
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815 [cs] doi:10.48550/arXiv.1712.01815
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1712.01815 2017
-
[35]
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Grae- pel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2018. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play.Science362, 6419 (Dec. 2018), 1140–1144....
-
[36]
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. 2017. Mastering the game of go without hu- man knowledge.Nature550, 7676 (Oct. 2017), 35...
-
[37]
Richard S. Sutton and Andrew G. Barto. 2020.Reinforcement learning, second edition: An introduction. Bradford Books, Cambridge, Massachusetts London, England
work page 2020
-
[38]
Ferucio Laurentiu Tiplea, Constantin Enea, and Catalin V Bîrjoveanu. 2005. Decidability and complexity results for security protocols.VISSAS1 (2005), 185–211
work page 2005
-
[39]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. InAdvances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/hash/ 3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
work page 2017
-
[40]
Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. InAdvances in Neural Information Processing Systems, Vol. 28. Curran As- sociates, Inc. https://proceedings.neurips.cc/paper_files/paper/2015/hash/ 29921001f2f04bd3baee84a12e98098f-Abstract.html
work page 2015
-
[41]
Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, and Xiaodan Liang. 2023. LEGO-Prover: Neural Theorem Proving with Growing Libraries. arXiv:2310.00656 [cs.AI] https://arxiv.org/abs/2310.00656
-
[42]
Wansen Wang, Wenchao Huang, Zhaoyi Meng, Yan Xiong, and Cheng Su. 2024. Advancing the Automation Capability of Verifying Security Protocols .IEEE Transactions on Dependable and Secure Computing21, 06 (Nov. 2024), 5059–5070. doi:10.1109/TDSC.2024.3368131
-
[43]
Verifying Security Protocols using Dynamic Strategies
Wang, Wansen and Huang, Wenchao and Meng, Zhaoyi and Xiong, Yan and Su, Cheng. 2019. GitHub repository of “Verifying Security Protocols using Dynamic Strategies”. https://github.com/SmartVerif/SmartVerif
work page 2019
-
[44]
David J. Wu. 2019. Monte-carlo graph search from first principles. KataGo Documentation, GitHub. https://github.com/lightvector/KataGo/blob/master/ docs/GraphSearch.md
work page 2019
-
[45]
Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, and Chong Ruan. 2024. DeepSeek- prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search. arXiv:2408.08152 [cs] doi:10.48...
- [47]
-
[48]
Yan Xiong, Cheng Su, Wenchao Huang, Fuyou Miao, Wansen Wang, and Hengyi Ouyang. 2020. SmartVerif: Push the limit of automation capability of verify- ing security protocols by dynamic strategies. In29th USENIX Security Sym- posium (USENIX Security 20). 253–270. https://www.usenix.org/conference/ usenixsecurity20/presentation/xiong
work page 2020
-
[49]
Xueliang Zhao, Wenda Li, and Lingpeng Kong. 2023. Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving. arXiv:2305.16366 [cs.CL] https://arxiv.org/abs/2305.16366 13 Cosler, Cremers, Finkbeiner, Ghanem, Medinger A Additional Experimental Data Hyperparameters.Table 3 lists the hyperparameters of the best configuration per ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.