arxiv: 2605.06972 · v1 · submitted 2026-05-07 · 💻 cs.PL · cs.LO

Recognition: no theorem link

A New Interaction Concept for Interactive and Autoactive Program Verification

Wolfram Pfeifer , Mattias Ulbrich , Daniel Drodt

Authors on Pith no claims yet

Pith reviewed 2026-05-11 00:55 UTC · model grok-4.3

classification 💻 cs.PL cs.LO

keywords interactive program verificationautoactive program verificationproof state inspectionsource code interactionspecification levelmental gapuser study

0 comments

The pith

Program verification allows direct inspection and interaction with proof states at the source code and specification level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a new interaction concept can let users view and act on the proof state using the original source code and specifications instead of a translated logical form. This matters because program verification is undecidable and usually needs human guidance, yet current tools either hide the proof process or force users to work at a lower abstraction that creates extra cognitive work. By closing that gap, the approach aims to make it easier to understand why verification succeeds or fails and to locate problems in code or annotations. The authors back this with an implementation and a user study showing practical benefits for users.

Core claim

The central claim is that enabling inspection and interaction with the proof state directly on source code and specification level minimizes the mental gap to the underlying logical representation, allowing users to better guide verification, intervene in autoactive attempts, and identify defects.

What carries the argument

The source-level proof state inspector that synchronizes logical proof obligations back to the original program and specification elements for direct user access.

If this is right

Autoactive verification attempts can incorporate user intervention without leaving the source code view.
Users can more readily diagnose the reasons a verification fails by seeing the proof state mapped to familiar code elements.
Defects in either the program or its specification become easier to spot through direct correspondence with the proof state.
The overall effort to complete verification tasks decreases because less mental translation between representations is required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same source-level mapping could be applied to other verifiers to test whether the benefit generalizes beyond the described prototype.
Over time this style of interface might reduce the annotation burden indirectly by giving faster, more actionable feedback during verification.
Hybrid tools that switch between autoactive and interactive modes at source level could become a standard pattern for practical verification workflows.

Load-bearing premise

The primary usability barrier in existing verifiers is the mismatch between source-level specifications and low-level proof representations.

What would settle it

A user study in which participants show no measurable improvement in understanding proof states or locating defects when using the source-level interaction compared with traditional low-level interfaces would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.06972 by Daniel Drodt, Mattias Ulbrich, Wolfram Pfeifer.

**Figure 1.** Figure 1: The KeY GUI. On the left is the proof tree, in the center the sequent view, and on the right, the code under verification. from non-trivial properties of non-trivial programs. This combination of interaction patterns enabled verification of complex, real-world Java programs such as TimSort [12], LinkedList [16], IdentityHashMap [10] and ips4o [7]. Nevertheless, manual interactions require a great deal of … view at source ↗

**Figure 2.** Figure 2: Screenshot of the prototype implementation. Section 3.1, the generated JML assumptions and assertions that represent the current proof goal are shown inline in the source view. These insertions for the current goal can be distinguished easily from the user-provided specification by the gray background and the missing line numbers. Note that the formulas exactly correspond to the user input, formula rewrit… view at source ↗

**Figure 3.** Figure 3: Example of a nested heap term for a program with heap assignments and a loop. For assertions, this works vice versa: The initial position is the last line of the method, and they are shifted upwards until they reach a heap update not contained in the formula or the most recently executed statement. Formulas referring to multiple states are inserted into the code as late as possible, referring to other stat… view at source ↗

**Figure 4.** Figure 4: Screenshot of the KeY GUI showing an open goal state of the first task of the user study using the new source code level representation. view closer to the original representation in the user input can be beneficial for understanding the sequent. Since the new interactive view is not exclusive and can be used in combination with the established forms of interaction, users can work with what they find more … view at source ↗

read the original abstract

Fully functional program verification is an undecidable$\unicode{x2014}$and, hence, inherently difficult$\unicode{x2014}$task, that is not automatically solvable but typically requires user interaction and guidance. Existing verifiers either work autoactively, requiring the user to write annotations in source code, without the possibility to inspect the proof state or intervene in case of an unsuccessful attempt, or allow interactions on a logical encoding that is on a lower level than the user-provided specifications. We present a novel interaction concept which allows the user to inspect and interact with the proof state on source code and specification level. This minimizes the mental gap between the representations. We provide an implementation of the concept as a plugin for the Java verification engine KeY, and show with a user study that this prototype can be beneficial for users to understand the proof state and find defects in source code or specifications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

KeY gets a source-level proof inspector via plugin, but the user study is too thinly described to confirm the claimed gains.

read the letter

The punchline is that this paper offers a KeY plugin letting users inspect and intervene in proofs at the Java source and specification level instead of the logical encoding. They built the thing and ran a user study claiming it helps with understanding the state and spotting defects in code or specs. What is new is the specific integration of source-level interaction into an autoactive verifier. Most existing setups either stay fully automatic with no visibility or drop the user into low-level logic. This tries to close that translation step. The implementation as a plugin is concrete and the decision to include a user study fits the topic. It moves the work past pure description into some evidence of effect. The soft spots sit with the study. The abstract reports benefits but supplies no participant count, task details, conditions, or analysis method. That leaves open whether the gains come from the interaction change or from other differences in how the tasks were run. The core assumption that the representation gap is the main usability barrier also sits lightly supported. Annotation effort or solver performance could easily be larger issues for users, and nothing in the visible material isolates the mechanism. This is for tool builders and researchers working on verification usability. Someone designing interfaces for formal methods tools would pick up the prototype idea and the basic study approach. The work is coherent, cites the right KeY background, and rests on a real implementation plus empirical data. It deserves a serious referee to examine the study design and check whether the results actually back the mechanism claim. I would send it for peer review. Reviewers can request the missing experimental details and push the authors to address whether the gap really dominates other barriers.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a novel interaction concept for program verifiers that permits users to inspect and intervene in the proof state directly at the source-code and specification level (rather than the logical encoding). The authors implement the concept as a plugin for the KeY Java verifier and report a user study indicating that the prototype helps users understand proof states and locate defects in code or specifications.

Significance. If the empirical results are robust, the work could improve the practical usability of interactive and autoactive verifiers by reducing the cognitive distance between user-written annotations and internal proof representations. The concrete implementation and the attempt at user evaluation are positive elements that align with the needs of the program-verification community.

major comments (2)

User study section: the manuscript provides no information on participant count, recruitment, background, task selection, control conditions, or statistical analysis. Because the central claim is that source-level interaction reduces the mental gap and yields measurable benefits, the absence of these details leaves open the possibility that observed gains are due to annotation effort, task simplicity, or other uncontrolled factors rather than the interaction mechanism itself. This is load-bearing for the interpretation of the results.
Introduction and motivation: the paper asserts that the primary usability barrier in existing tools is the mismatch between source-level specifications and low-level proof representations. No comparative baseline (e.g., a version with reduced annotation burden or improved solver feedback) is described that would isolate this factor from annotation writing effort or solver performance, weakening the causal link between the new interaction concept and the reported benefits.

minor comments (1)

Abstract: the term 'autoactively' is used without a brief parenthetical gloss or citation; a short clarification would aid readers outside the immediate subfield.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive criticism of our manuscript. We address each of the major comments below and have made revisions to the manuscript to improve clarity and address the concerns where possible.

read point-by-point responses

Referee: User study section: the manuscript provides no information on participant count, recruitment, background, task selection, control conditions, or statistical analysis. Because the central claim is that source-level interaction reduces the mental gap and yields measurable benefits, the absence of these details leaves open the possibility that observed gains are due to annotation effort, task simplicity, or other uncontrolled factors rather than the interaction mechanism itself. This is load-bearing for the interpretation of the results.

Authors: We agree that the user study section in the submitted manuscript was insufficiently detailed, which weakens the support for our claims. The study was designed as a preliminary evaluation to gather initial feedback on the interaction concept. In the revised manuscript, we have substantially expanded this section to provide the missing information: we now report the participant count, recruitment process, their backgrounds in programming and verification, the specific tasks chosen, the control condition using the standard KeY interface, and the statistical methods applied to the quantitative measures. We have also included a discussion of potential limitations such as sample size and the possibility of confounding factors. These additions should allow readers to better assess the robustness of the results. revision: yes
Referee: Introduction and motivation: the paper asserts that the primary usability barrier in existing tools is the mismatch between source-level specifications and low-level proof representations. No comparative baseline (e.g., a version with reduced annotation burden or improved solver feedback) is described that would isolate this factor from annotation writing effort or solver performance, weakening the causal link between the new interaction concept and the reported benefits.

Authors: The introduction draws on established observations in the program verification literature regarding the cognitive challenges posed by the abstraction gap between user-level annotations and internal proof states. We do not claim this is the only barrier, but it is the one our interaction concept directly targets. Nevertheless, we acknowledge that without a baseline that controls for annotation effort or solver improvements, the causal attribution is not fully isolated. In the revised version, we have modified the introduction to more carefully phrase the motivation, explicitly noting other contributing factors to usability issues, and we have added a dedicated limitations paragraph discussing the study design and the need for future controlled experiments that isolate variables more precisely. revision: partial

Circularity Check

0 steps flagged

No circularity in derivation or claims

full rationale

The paper presents a design for source-level interaction in program verification, describes its KeY plugin implementation, and reports a user study on usability. No equations, fitted parameters, predictions, or first-principles derivations exist that could reduce to inputs by construction. The contribution is a prototype plus empirical evaluation; claims rest on the described system and study results rather than self-referential logic or self-citation chains. No instances of self-definitional, fitted-input, or load-bearing self-citation patterns are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a systems and HCI contribution in program verification; it introduces no free parameters, mathematical axioms, or new postulated entities. The central claim rests on the design of the interaction model and the outcomes of the user study.

pith-pipeline@v0.9.0 · 5448 in / 1013 out tokens · 20004 ms · 2026-05-11T00:55:48.723166+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 23 canonical work pages

[1]

Schmitt, and Mattias Ulbrich (Eds.)

Wolfgang Ahrendt, Bernhard Beckert, Richard Bubel, Reiner Hähnle, Peter H. Schmitt, and Mattias Ulbrich (Eds.). 2016.Deductive Software Verification – the KeY Book: From Theory to Practice. Number 10001 in LNCS. Springer. doi:10.1007/978-3-319-49812-6

work page doi:10.1007/978-3-319-49812-6 2016
[2]

Bernhard Beckert, Richard Bubel, Daniel Drodt, Reiner Hähnle, Florian Lanzinger, Wolfram Pfeifer, Mattias Ulbrich, and Alexander Weigl
[3]

InFormal Methods (LNCS, 14934), Andre Platzer, Kristin Yvonne Rozier, Matteo Pradella, and Matteo Rossi (Eds.)

The Java Verification Tool KeY: A Tutorial. InFormal Methods (LNCS, 14934), Andre Platzer, Kristin Yvonne Rozier, Matteo Pradella, and Matteo Rossi (Eds.). Springer Nature Switzerland, Cham, 597–623. doi:10.1007/978-3-031-71177-0_32

work page doi:10.1007/978-3-031-71177-0_32
[4]

Bernhard Beckert, Sarah Grebing, and Mattias Ulbrich. 2017. An Inter- action Concept for Program Verification Systems with Explicit Proof Object. InHardware and Software: Verification and Testing, Ofer Strich- man and Rachel Tzoref-Brill (Eds.). Springer International Publishing, Cham, 163–178. doi:10.1007/978-3-319-70389-3_11

work page doi:10.1007/978-3-319-70389-3_11 2017
[5]

Bernhard Beckert, Jonas Klamroth, Wolfram Pfeifer, Patrick Röper, and Samuel Teuber. 2024. Towards Combining the Cognitive Abilities of Large Language Models with the Rigor of Deductive Progam Veri- fication. In12th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2024). Software Engineering Methodol...

work page doi:10.1007/978-3-031-75387-9_15 2024
[6]

Bernhard Beckert and Vladimir Klebanov. 2004. Proof Reuse for Deductive Program Verification. InProceedings of the Second Inter- national Conference on Software Engineering and Formal Methods, Jorge Cuellar and Zhiming Liu (Eds.). IEEE Computer Society, 77–

2004
[7]

doi:10.1109/SEFM.2004.10013

work page doi:10.1109/sefm.2004.10013 2004
[8]

Bernhard Beckert, Vladimir Klebanov, and Mattias Ulbrich. 2015. Re- gression Verification for Java Using a Secure Information Flow Calcu- lus. InProc. 17th FTfJP (FTfJP ’15), Rosemary Monahan (Ed.). ACM, Prague, Czech Republic, 6:1–6:6. doi:10.1145/2786536.2786544

work page doi:10.1145/2786536.2786544 2015
[9]

Bernhard Beckert, Peter Sanders, Mattias Ulbrich, Julian Wiesler, and Sascha Witt. 2024. Formally Verifying an Efficient Sorter. InTACAS Proc., Part I (LNCS, 14570), Bernd Finkbeiner and Laura Kovács (Eds.). Springer, 268–287. doi:10.1007/978-3-031-57246-3_15

work page doi:10.1007/978-3-031-57246-3_15 2024
[10]

ter Beek, Rod Chapman, Rance Cleaveland, Hubert Garavel, Rong Gu, Ivo ter Horst, Jeroen J

Maurice H. ter Beek, Rod Chapman, Rance Cleaveland, Hubert Garavel, Rong Gu, Ivo ter Horst, Jeroen J. A. Keiren, Thierry Lecomte, Michael Leuschel, Kristin Yvonne Rozier, Augusto Sampaio, Cristina Seceleanu, Martyn Thomas, Tim A. C. Willemse, and Lijun Zhang. 2024. Formal Methods in Industry.Form. Asp. Comput.37, 1 (Dec. 2024), 7:1–7:38. doi:10.1145/36893...

work page doi:10.1145/3689374 2024
[11]

Stefan Blom and Marieke Huisman. 2014. The VerCors Tool for Verifi- cation of Concurrent Programs. InFM 2014: Formal Methods, 19th Intl. Symposium, Singapore (LNCS, 8442), Cliff B. Jones, Pekka Pihlajasaari, and Jun Sun (Eds.). Springer, 127–131. doi:10.1007/978-3-319-06410- 9_9

work page doi:10.1007/978-3-319-06410- 2014
[12]

Martin de Boer, Stijn de Gouw, Jonas Klamroth, Christian Jung, Mat- tias Ulbrich, and Alexander Weigl. 2023. Formal Specification and Verification of JDK’s Identity Hash Map Implementation.Formal Aspects Comput.35, 3 (Sept. 2023), 18:1–18:26. doi:10.1145/3594729

work page doi:10.1145/3594729 2023
[13]

David R. Cok. 2011. OpenJML: JML for Java 7 by Extending Open- JDK. InNASA Formal Methods (LNCS, 6617), Mihaela Bobaru, Klaus Havelund, Gerard J. Holzmann, and Rajeev Joshi (Eds.). Springer, Berlin, Heidelberg, 472–479. doi:10.1007/978-3-642-20398-5_35

work page doi:10.1007/978-3-642-20398-5_35 2011
[14]

De Boer, Richard Bubel, and Reiner Hähnle

Stijn De Gouw, Jurriaan Rot, Frank S. De Boer, Richard Bubel, and Reiner Hähnle. 2015. OpenJDK’s java.utils.Collection.sort() Is Broken: The Good, the Bad and the Worst Case. InProc. 27th Intl. Conf. on Computer Aided Verification (CA V), San Francisco (LNCS, 9206), Daniel Kroening and Corina Pasareanu (Eds.). Springer, 273–289. doi:10.1007/ 978-3-319-21690-4_16

2015
[15]

Xavier Denis, Jacques-Henri Jourdan, and Claude Marché. 2022. Creusot: A Foundry for the Deductive Verification of Rust Programs. InICFEM 2022, Proceedings (LNCS, 13478), Adrián Riesco and Min Zhang (Eds.). Springer, 90–105. doi:10.1007/978-3-031-17244-1_6

work page doi:10.1007/978-3-031-17244-1_6 2022
[16]

Jean-Christophe Filliâtre and Andrei Paskevich. 2013. Why3 — Where Programs Meet Provers. InProceedings of the 22nd European Sympo- sium on Programming (LNCS, 7792), Matthias Felleisen and Philippa Gardner (Eds.). Springer, 125–128. doi:10.1007/978-3-642-37036-6_8

work page doi:10.1007/978-3-642-37036-6_8 2013
[18]

Hiep, Olaf Maathuis, Jinting Bian, Frank S

Hans-Dieter A. Hiep, Olaf Maathuis, Jinting Bian, Frank S. de Boer, Marko van Eekelen, and Stijn de Gouw. 2020. Verifying OpenJDK’s LinkedList Using KeY. InTACAS (LNCS, 12079), Armin Biere and David Parker (Eds.). Springer, Cham, 217–234. doi:10.1007/978-3-030-45237- 7_13

work page doi:10.1007/978-3-030-45237- 2020
[19]

2008.The VeriFast Program Veri- fier

Bart Jacobs and Frank Piessens. 2008.The VeriFast Program Veri- fier. Technical Report CW-520. Department of Computer Science, Katholieke Universiteit Leuven.http://www.cs.kuleuven.be/~bartj/ verifast/verifast.pdf

2008
[20]

Leavens, Albert L

Gary T. Leavens, Albert L. Baker, and Clyde Ruby. 1999. JML: A Notation for Detailed Design. InBehavioral Specifications of Businesses and Systems, Haim Kilov, Bernhard Rumpe, and Ian Simmonds (Eds.). Number 523 in The Kluwer International Series in Engineering and Computer Science. Springer, 175–188. doi:10.1007/978-1-4615-5229- 1_12

work page doi:10.1007/978-1-4615-5229- 1999
[21]

Leavens, David R

Gary T. Leavens, David R. Cok, and Amirfarhad Nilizadeh. 2022. Further Lessons from the JML Project. InThe Logic of Software. A Tasting Menu of Formal Methods: Essays Dedicated to Reiner Hähnle on the Occasion of His 60th Birthday, Wolfgang Ahrendt, Bernhard Beckert, Richard Bubel, and Einar Broch Johnsen (Eds.). Number 13360 in LNCS. Springer Internation...

work page doi:10.1007/978-3-031-08166-8_15 2022
[22]

Rustan M

K. Rustan M. Leino. 2010. Dafny: An Automatic Program Verifier for Functional Correctness. InLPAR-16, Revised Selected Papers (LNCS, 6355), Edmund M. Clarke and Andrei Voronkov (Eds.). Springer, 348–

2010
[23]

doi:10.1007/978-3-642-17511-4_20

work page doi:10.1007/978-3-642-17511-4_20
[24]

John McCarthy. 1963. A Basis for a Mathematical Theory of Compu- tation. InComputer Programming and Formal Systems, P. Braffort and D. Hirschberg (Eds.). North Holland, 33–69

1963
[25]

Design by Contract

Bertrand Meyer. 1992. Applying "Design by Contract".Computer25, 10 (1992), 40–51. doi:10.1109/2.161279

work page doi:10.1109/2.161279 1992
[26]

Wytse Oortwijn, Marieke Huisman, Sebastiaan J. C. Joosten, and Jaco van de Pol. 2020. Automated Verification of Parallel Nested DFS. In TACAS, Armin Biere and David Parker (Eds.). Springer International Publishing, Cham, 247–265. doi:10.1007/978-3-030-45190-5_14

work page doi:10.1007/978-3-030-45190-5_14 2020
[27]

João Pereira, Tobias Klenze, Sofia Giampietro, Markus Limbeck, Diony- sios Spiliopoulos, Felix Wolf, Marco Eilers, Christoph Sprenger, David Basin, Peter Müller, and Adrian Perrig. 2025. Protocols to Code: For- mal Verification of a Secure Next-Generation Internet Router. InProc. of the 2025 ACM SIGSAC Conference on Computer and Communications Security (C...

work page doi:10.1145/3719027.3765104 2025
[28]

Philipp Rümmer and Mattias Ulbrich. 2016. Proof Search with Taclets. See [1], 107–147. doi:10.1007/978-3-319-49812-6_4

work page doi:10.1007/978-3-319-49812-6_4 2016
[29]

Mohsen Safari, Wytse Oortwijn, and Marieke Huisman. 2021. Auto- mated Verification of the Parallel Bellman–Ford Algorithm. InStatic Analysis, Cezara Drăgoi, Suvam Mukherjee, and Kedar Namjoshi (Eds.). Springer International Publishing, Cham, 346–358. doi:10.1007/978-3- 030-88806-0_17

work page doi:10.1007/978-3- 2021
[30]

2023.Verification of Red-Black Trees in KeY – A Case Study in Deductive Java Verification

Johanna Stuber. 2023.Verification of Red-Black Trees in KeY – A Case Study in Deductive Java Verification. Bachelor Thesis. Karlsruhe Insti- tute of Technology (KIT). doi:10.5445/IR/1000162878 Preprint

work page doi:10.5445/ir/1000162878 2023