pith. machine review for the scientific record.

cs.PL

Programming Languages

Covers programming language semantics, language features, programming approaches (such as object-oriented programming, functional programming, logic programming). Also includes material on compilers oriented towards programming languages; other material on compilers may be more appropriate in Architecture (AR). Roughly includes material in ACM Subject Classes D.1 and D.3.

cs.PL 2026-05-13 Recognition

Categorical triple formalizes LLM agent harness design

Harness Engineering as Categorical Architecture

The (G, Know, Phi) structure maps memory, skills, and protocols to components whose guarantees are preserved by compiler identity and replay

The agent harness, the system layer comprising prompts, tools, memory, and orchestration logic that surrounds the model, has emerged as the central engineering abstraction for LLM-based agents. Yet harness design remains ad hoc, with no formal theory governing composition, preservation of properties under compilation, or systematic comparison across frameworks. We show that the categorical Architecture triple (G, Know, Phi) from the ArchAgents framework provides exactly this formalization. The four pillars of agent externalization (Memory, Skills, Protocols, Harness Engineering) map onto the triple's components: Memory as coalgebraic state, Skills as operad-composed objects, Protocols as syntactic wiring G, and the full Harness as the Architecture itself. Structural guarantees (integrity gates, quality-based escalation, supported convergence checks) are Know-level certificates whose preservation is structural replay: our compiler checks identity and verifier replay, not output-layer correctness or model behavior. We validate this correspondence with a reference implementation featuring compiler functors targeting Swarms, DeerFlow, Ralph, Scion, and LangGraph: the four configuration compilers preserve three named certificate types by identity or replay, and LangGraph preserves the same certificates through its shared per-stage execution path. The LangGraph compiler creates one node per stage using the same per-stage method as the native runtime, providing LangGraph-native observability without reimplementing harness logic. An end-to-end escalation experiment with real LLM agents confirms that the quality-based escalation control path is model-parametric in this two-model, one-task experiment. The result positions categorical architecture as the formal theory behind harness engineering.
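To make the correspondence concrete, here is a minimal Python sketch of the triple as data, with certificate preservation checked by identity or replay as the abstract describes; every class and field name beyond G, Know, and Phi is our own illustration, not the ArchAgents API.

```python
# Minimal sketch: the (G, Know, Phi) triple as data, with certificate
# preservation checked by identity or replay. Names other than
# G, Know, and Phi are hypothetical illustrations.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Certificate:
    name: str                        # e.g. "integrity-gate"
    payload: tuple                   # evidence recorded by the harness
    verify: Callable[[tuple], bool]  # replayable verifier

@dataclass
class Architecture:
    G: dict          # syntactic wiring: protocol graph between stages
    Know: list       # Know-level certificates carried by the harness
    Phi: Callable    # interpretation of the wiring into behavior

def compile_preserves(src: "Architecture", dst: "Architecture") -> bool:
    """A compiler functor preserves a certificate either by carrying it
    over unchanged (identity) or by re-running its verifier (replay)."""
    by_name = {c.name: c for c in dst.Know}
    for cert in src.Know:
        out = by_name.get(cert.name)
        if out is None:
            return False
        if out is not cert and not out.verify(out.payload):
            return False  # neither identity nor successful replay
    return True
```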
cs.PL 2026-05-13 2 theorems

Generic type system automates heap cost analysis

Automated Amortised Analysis of Skew Heaps and Leftist Heaps (Extended Version)

Enhanced inference supports all known potential functions and confirms established bounds for skew and leftist heaps.

We study the fully automated amortised analysis of purely functional data structures like skew heaps, as well as weight- and rank-biased leftist heaps. For that we generalise earlier works on automated amortised resource analysis by developing a type inference based approach with a generic type system. This allows for modular reasoning and the inference of precise and optimal cost bounds. More specifically, we extend the work on the ATLAS system by Leutgeb et al., which was developed to cover the analysis of splay trees and some closely related data structures. To enable the analysis of skew heaps, however, and the even more challenging (amortised) analysis of leftist heaps, we have developed a range of new techniques for type-based automated analysis. By introducing a generic type system, we allow for arbitrary (classes of) potential functions, in contrast to the hard-coded potential functions of ATLAS; our system is implemented in Haskell in an entirely modular way. We have also greatly enhanced the existing type inference algorithm by extensions in multiple directions, including path-sensitive reasoning, data structure invariants, and template parameters for piecewise defined potential functions. We show how our newly developed system supports the use of all known potential functions for analysing skew heaps and leftist heaps, confirming the known bounds.
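As background for the potential functions the abstract refers to, the standard potential-method accounting (textbook material, not the paper's generic type system): the amortised cost of the $i$-th operation is its actual cost plus the change in potential, so costs telescope:

$$a_i = c_i + \Phi(D_i) - \Phi(D_{i-1}), \qquad \sum_{i=1}^{n} c_i = \sum_{i=1}^{n} a_i + \Phi(D_0) - \Phi(D_n) \le \sum_{i=1}^{n} a_i$$

whenever $\Phi(D_n) \ge \Phi(D_0) = 0$; a potential function tailored to skew or leftist heaps makes each $a_i$ small even when some individual $c_i$ are large.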
cs.PL 2026-05-12 Recognition

Bounded-preemption SC verification easy only for single writers

Verifying Sequential Consistency under Bounded Preemptions

The problem is polynomial-time solvable for single-writer code but NP-hard for two writers and ETH-hard for three.

Gibbons and Korach studied a fundamental problem in 1997: given an observed sequence of reads and writes of a multi-threaded program, does there exist an interleaving which is sequentially consistent? Apart from applications in testing shared memory implementations, a procedure for this problem is employed in Dynamic Partial-Order-Reduction (DPOR) algorithms. The problem is known to be NP-hard even when different syntactic parameters are kept bounded. In this paper, we consider a restriction on the kind of interleaving required: does there exist a sequentially-consistent interleaving with at most $\pi$ preemptions? Empirical evidence suggests that several bugs manifest within a few preemptive switches. This motivates us to investigate the problem under bounded preemptions. Our results exhibit a trichotomy: the problem admits a polynomial-time algorithm for the class of single-writer programs where for each variable, there is a single thread writing to it; it becomes NP-hard for two-writer programs; and finally, for three-writer programs, we get a conditional lower bound under the Exponential Time Hypothesis. When the number of preemptions $\pi$ is not bounded, we show the problem to be W[1]-hard, and hence unlikely to be fixed-parameter-tractable with parameter $\pi$.
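For readers unfamiliar with the decision problem, a brute-force sketch (ours, not the paper's algorithms) that searches for an SC-consistent interleaving with at most a given number of preemptions, counting a switch away from a thread that still has pending events as a preemption:

```python
# Brute-force search: is there a sequentially consistent interleaving
# of the per-thread event sequences using at most `bound` preemptions?
def sc_with_bounded_preemptions(threads, bound, init=0):
    n = len(threads)

    def search(pcs, mem, cur, used):
        if all(pcs[t] == len(threads[t]) for t in range(n)):
            return True
        for t in range(n):
            if pcs[t] == len(threads[t]):
                continue
            # switching away from an unfinished thread is a preemption
            cost = 1 if (cur is not None and t != cur
                         and pcs[cur] < len(threads[cur])) else 0
            if used + cost > bound:
                continue
            op, var, val = threads[t][pcs[t]]
            if op == 'r' and mem.get(var, init) != val:
                continue  # this read would violate SC
            new_mem = dict(mem)
            if op == 'w':
                new_mem[var] = val
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if search(new_pcs, new_mem, t, used + cost):
                return True
        return False

    return search(tuple(0 for _ in threads), {}, None, 0)

# Thread 0 writes x=1 then reads x=2; thread 1 writes x=2. Explaining
# the read requires preempting thread 0 exactly once.
t0 = (('w', 'x', 1), ('r', 'x', 2))
t1 = (('w', 'x', 2),)
print(sc_with_bounded_preemptions((t0, t1), bound=0))  # False
print(sc_with_bounded_preemptions((t0, t1), bound=1))  # True
```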
cs.PL 2026-05-12 2 theorems

Move Prover verifies first-class imperative functions

Formal Verification of Imperative First-Class Functions in Move

Behavioral predicates and state labels let SMT check dynamic dispatch in Move without separation logic.

The Move Prover (MVP) is a formal verifier for smart contracts written in the Move programming language. Recently, Move on Aptos was extended with higher-order functions: imperative functions as first-class values that can be passed around, stored in data structs, and kept in persistent storage, enabling dynamic dispatch. This paper describes the representation of function values in the Move specification language and their implementation in MVP. We introduce behavioral predicates which characterize Move functions (aborts and pre/post conditions) by single-state or two-state predicates. We also introduce state labels for naming intermediate memory states in which expressions are evaluated and which allow composing behavioral predicates to describe sequences of state transitions. At the SMT level, function values are encoded by discriminating over the possible function values reaching a call site: when the concrete function is known, its effect is accounted for directly; when it is unknown (for example, a function parameter, or a closure loaded from storage), its behavioral predicates describe the effect. Our approach goes beyond, for example, Dafny, by supporting imperative first-class functions which can modify state via Rust-style references and global variables, and leads to more efficient SMT encodings than separation logic because of the static separation of memory enabled by Move. We further extend MVP's specification inference tool to work with function values: given arbitrary higher-order Move code, weakest-precondition analysis semi-automatically derives behavioral-predicate-based specifications, reducing the annotation burden and providing a validation pipeline for the new specification constructs.
cs.PL 2026-05-12 Recognition

Tool pairs weakest-precondition analysis with AI to infer Move specs

Combining Mechanical and Agentic Specification Inference for Move

The prover validates candidates for loop invariants and high-level properties until verification succeeds on code with loops and dynamic dispatch.

In this paper, we describe early work on a specification inference tool for the Move Prover that combines a weakest-precondition (WP) analysis over Move bytecode with an agentic coding CLI such as Claude Code. Specification inference reduces the boilerplate of writing specifications in Move: in order to verify a high-level property such as a global state invariant, pre- and post-conditions for the supporting functions typically have to be written by hand, which is tedious. In our setting, a Model Context Protocol (MCP) service exposes the WP analysis and the prover itself to the coding agent. The WP analysis provides a sound, mechanical baseline for inference; the AI is used precisely where WP is weakest -- for loop invariants and high-level idiomatic specifications such as monotonicity, conservation, and structural invariants. The Move Prover serves as the oracle that decides whether the generated specs are valid, and the agent is equipped to generate proof hints and to refine the inferred specification until verification succeeds. The tool has been applied to a corpus of canonical Move code, including code that uses higher-order functions, dynamic dispatch, global state, references, and various forms of loops.
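The workflow reduces to a propose/verify/refine loop. A hedged sketch, where wp_baseline, agent_propose, and prover_verify are hypothetical stand-ins for the MCP-exposed WP analysis, the coding agent, and the Move Prover oracle; none of these names come from the paper:

```python
# Propose/verify/refine loop: mechanical WP baseline first, then the
# agent fills in what WP is weakest at, with the prover as oracle.
def infer_specs(function, wp_baseline, agent_propose, prover_verify,
                max_rounds=5):
    spec = wp_baseline(function)       # sound, mechanical baseline
    for _ in range(max_rounds):
        ok, feedback = prover_verify(function, spec)
        if ok:
            return spec                # prover accepts: spec is valid
        # AI proposes loop invariants and high-level idiomatic
        # properties (monotonicity, conservation, structural invariants)
        spec = agent_propose(function, spec, feedback)
    raise TimeoutError("no verified specification within budget")
```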
cs.PL 2026-05-11 Recognition

CaMPL type system blocks deadlocks in concurrent code

Categorical Message Passing Language (CaMPL) for programmers

Category-theory derived types keep non-recursive message-passing programs free of deadlock and livelock while allowing races and custom channel types.

Categorical Message Passing Language (CaMPL) is a functional-style concurrent programming language whose semantics is given in category theory, more specifically in linear actegories. Its core programming feature is message passing along typed communication channels between concurrent processes. CaMPL also supports controlled non-determinism via 'races' which allow processes to adapt dynamically while they are running, higher-order processes which pass other processes as messages, and custom channel datatypes called protocols and coprotocols which allow one to define infinite channel types or implement session types. The type system of CaMPL arises from a Curry-Howard-Lambek-like correspondence for concurrent programming, established by Cockett and Pastro in their paper titled "The logic of message passing". This type system ensures that a formal CaMPL program, i.e., one which does not allow general recursion, will never become deadlocked or livelocked. In this article, we explore the type system of CaMPL, custom channel types, and controlled non-determinism using code examples after briefly introducing its mathematical underpinnings.
cs.PL 2026-05-11 2 theorems

Credits on thunks make amortised analysis sound for persistent structures

Persistent Amortised Analysis, Operationally

An operational semantics reveals that restricting credits to thunks suffices for correct bounds, clarifying why the debit method works.

Amortised analysis is a technique for proving a combined time bound for a batch of operations on a data structure, even if some of those operations are expensive. But the traditional method of amortised analysis yields incorrect time bounds when the data structure is used persistently. Persistence allows operations to be performed on previous versions of the data structure, which prevents us from amortising expensive restructuring work. In his seminal book, Chris Okasaki showed how to extend amortised analysis to persistent usage. His method works by extending the data structure with thunks and performing the analysis with debits rather than credits. His argument, that credits are unsound for analysing persistent usage, has become folklore. In this paper, we provide a new perspective on the role of debits in Okasaki's work. First, we set up an operational semantics of call-by-value lambda calculus with thunks, and show formally that traditional amortised analysis does not work in a persistent setting. Then we show that, contrary to the folklore, credit-based amortised analysis can be sound in a persistent setting as long as credits are only stored on thunks. Finally, we provide a formal semantics for Okasaki's debit-based approach. Our paper clarifies the formal foundation of Okasaki's work and makes it accessible to a wider audience.
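The paper's key observation, that credits are sound if stored only on thunks, can be illustrated with a memoized thunk whose first force spends credits attached at creation; every later force of the same shared, persistent thunk is constant-time. A toy sketch (all names ours):

```python
# Toy illustration: amortised credits stored on a memoized thunk.
# The expensive computation runs once; persistent reuse is O(1), so
# the credits attached at creation pay for the single evaluation.
class Thunk:
    def __init__(self, compute, credits):
        self.compute = compute   # suspended expensive work
        self.credits = credits   # amortised budget attached at creation
        self.value = None
        self.forced = False

    def force(self):
        if not self.forced:
            # the first force spends the stored credits on real work
            assert self.credits > 0, "unpaid expensive step"
            self.value = self.compute()
            self.forced = True
        return self.value        # every subsequent force is cheap

expensive = Thunk(lambda: sum(range(10**6)), credits=1)
v1 = expensive.force()   # pays once
v2 = expensive.force()   # persistent reuse: no further cost
assert v1 == v2
```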
cs.PL 2026-05-11 2 theorems

Verification takes ten times more effort than credible compilation

Quantitative Comparison of Credible Compilation and Verification In Coding Agent Compiler Development

A coding agent comparison of compiler optimizations shows higher session counts and simpler algorithms for the verified versions.

Formal program verification is a longstanding goal in the field. We present the first quantitative comparison of the two primary compiler verification approaches, credible compilation/translation validation and full verification. Working with the first verified compiler developed by a coding agent (operating under human supervision), we present quantitative results from a coding agent implementing several optimizations using these two approaches. The results indicate that 1) verification requires roughly an order of magnitude more development effort than credible compilation, 2) to enhance provability, the coding agent chooses less efficient algorithms and data structures for verified optimizations, 3) in an attempt to minimize proof effort, the coding agent repeatedly implemented optimization scope reductions for verified optimizations, and 4) certificate checking time dominates optimization and certificate generation time for the considered optimizations. Because of the increased proof overhead, verified optimizations required substantially more supervision and coding sessions than credible compilation optimizations. Given the capabilities of a modern coding agent working in this context, implementation efforts for both credible compilation and verified versions remained feasible for the considered optimizations (unreachable code elimination, dead assignment elimination, and constant propagation/folding).
cs.PL 2026-05-11 2 theorems

Refinement types act as ordinary types in Scala 3

First-Class Refinement Types for Scala

Logical predicates participate in subtyping, inference, and pattern matching without separate specifications

Refinement types -- types qualified with logical predicates -- have proven effective for lightweight verification in languages like Liquid Haskell, F*, and Dafny. However, in these systems refinements are either written in a separate specification language or treated as second-class annotations, disconnected from the host language's type system. This disconnect creates usability barriers: programmers must maintain two mental models, and refinements cannot interact with features like type inference, subtyping, or overloading. We present the design of first-class refinement types for Scala 3, where refinements are ordinary types that participate in subtyping, inference, and pattern matching alongside existing language features. We prove type soundness of a core calculus mechanized in Rocq, combining dependent function types, bounded polymorphism, positive equi-recursive types, union and intersection types, and refinement types under a partial-correctness semantics using a fuel-bounded definitional interpreter and semantic typing. Finally, we implement our design as a prototype extension of the Scala 3 compiler with a lightweight e-graph-based solver for predicate entailment.
cs.PL 2026-05-08 Recognition

Proof states become inspectable at source code level

A New Interaction Concept for Interactive and Autoactive Program Verification

The interface maps logical proofs back to original code and specs, letting users understand failures and locate defects without low-level representations.

Fully functional program verification is an undecidable, and hence inherently difficult, task that is not automatically solvable but typically requires user interaction and guidance. Existing verifiers either work autoactively, requiring the user to write annotations in source code, without the possibility to inspect the proof state or intervene in case of an unsuccessful attempt, or allow interactions on a logical encoding that is on a lower level than the user-provided specifications. We present a novel interaction concept which allows the user to inspect and interact with the proof state on source code and specification level. This minimizes the mental gap between the representations. We provide an implementation of the concept as a plugin for the Java verification engine KeY, and show with a user study that this prototype can be beneficial for users to understand the proof state and find defects in source code or specifications.
cs.PL 2026-05-08 2 theorems

Controlled unsoundness yields sound bounds for non-monotone fixed points

Bounding Fixed Points of Non-Monotone Processes: Theory to Practice

A sound anytime algorithm tightens AFT approximations, terminates on finite lattices, and accelerates answer set programming and speculative program analysis.

Many modern solvers and program analyzers rely on non-monotone reasoning (e.g. negation-as-failure, speculative updates, backtracking) for which classical monotone fixed-point methods do not apply. The general problem of finding the fixed points of these processes is a difficult one. For this reason, there have been theoretical efforts, notably in Approximation Fixpoint Theory (AFT) from the domain of logic programming, to approximate fixed points of non-monotone operators. Tight approximations of these fixed points are highly useful for accelerating non-monotonic computations by restricting the search space. In practice, however, even the best approximations obtained through AFT can be coarse and computationally expensive. We aim to address both issues to make AFT approximation methods practical for use in programming languages (PL) settings. To mitigate inefficiency, we prove the soundness of an abstract interpretation for approximating operators. To improve upon coarse approximations, we carefully introduce controlled unsoundness to design an effective yet practical algorithm for partitioning and tightening AFT's best approximations. This algorithm is sound, anytime, and guarantees termination on finite-height lattices. We further present a modification that ensures polynomial-time complexity. We instantiate these methods in two settings: (1) answer set programming, where it serves as a convergence-accelerating pre-processor, and (2) speculative program analysis, where it reduces rollback while preserving soundness. In both settings, we focus on implementation-level details to demonstrate the practical applicability of our methods.
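The flavor of approximating a non-monotone operator's fixed point can be seen in a tiny numeric analogue: a monotone approximator on pairs of bounds tightens an enclosure of the fixed point even though the underlying operator is non-monotone. The operator and bounds below are our own illustration, not the paper's algorithm:

```python
# f(x) = (1 - x) / 2 is non-monotone (decreasing) on [0, 1]; its exact
# fixed point is 1/3. Evaluating f at the opposite endpoints gives an
# approximator that is monotone on the lattice of intervals, so
# iteration only tightens the enclosure (the AFT-style pair view).
def approximate(lo, hi, steps=60):
    for _ in range(steps):
        lo, hi = (1 - hi) / 2, (1 - lo) / 2
    return lo, hi

print(approximate(0.0, 1.0))  # bounds contract toward (1/3, 1/3)
```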
cs.PL 2026-05-08 Recognition

14B model translates GIMPLE to LLVM IR better than 1000B models

LLM Translation of Compiler Intermediate Representation

Fine-tuned on paired examples from real C code, it beats general models by up to 44 points and enables cross-toolchain reuse.

GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create significant barriers for cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have attempted to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. In this context, Large Language Models (LLMs) appear to be an emerging technology that offers a data-driven alternative, capable of learning complex mappings between heterogeneous compiler IRs directly from sufficiently representative examples. To explore this approach, this paper presents IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate GIMPLE (as emitted by GCC) to LLVM IR (as emitted by LLVM). The model is trained on paired IRs extracted from C sources and evaluated on the GIMPLE-to-LLVM IR transformation applied to IRs derived from real-world C code and competitive programming problems. To the best of our knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms the accuracy of widely used models, including the largest state-of-the-art open models available today, ranging from 13 to 1,000 billion parameters, by up to 44 percentage points. The proposed transformation supports the integration of LLMs as complementary components within hybrid neuro-symbolic compiler architectures, where models such as IRIS-14B act as interoperability layers enabling cross-toolchain workflows without modifying existing compiler passes, while traditional compiler infrastructure continues to perform deterministic compilation and optimization.
cs.PL 2026-05-08 Recognition

Cache-free GPU enumeration outperforms priors on MBA synthesis

GPU-Accelerated Synthesis of Mixed-Boolean Arithmetic: Beyond Caching

Bottom-up design without caches scales to larger specifications and bigger expressions for deobfuscation tasks.

Synthesizing Mixed-Boolean Arithmetic (MBA) expressions from input-output examples is central to program deobfuscation and also useful for compiler optimization, reverse engineering, and cryptanalysis. Existing MBA synthesizers are typically CPU-based and scale poorly on large specifications or complex targets. Recent GPU-accelerated synthesis methods achieve large speedups in qualitative settings, but they depend on caching observationally equivalent candidates; this strategy breaks down for MBA because candidate outputs are quantitative bitvectors and the behavioral space is enormous. We present SIMBA (Synthesis of Mixed-Boolean Arithmetic), a GPU-accelerated MBA synthesizer built around cache-free bottom-up enumeration. SIMBA avoids language caches entirely and uses a GPU-oriented enumeration design that keeps work local and highly parallel. In experiments, SIMBA is substantially faster than prior MBA synthesis tools, handles larger specifications, and reaches expression sizes that existing methods fail to solve. These results establish cache-free GPU synthesis as a practical and scalable approach for quantitative domains, and identify it as a strong alternative to cache-centric designs.
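Cache-free bottom-up enumeration, in a sequential toy version (the real system runs this massively in parallel on a GPU; the grammar and encoding below are ours): grow expressions by size, test each against the input-output examples, and keep every candidate rather than deduplicating by observed behavior:

```python
# Bottom-up enumeration over 8-bit words: no observational-equivalence
# cache, just grow-by-size and test against the examples.
MASK = 0xFF
UNOPS = {'~': lambda a: ~a & MASK}
BINOPS = {
    '+': lambda a, b: (a + b) & MASK,
    '&': lambda a, b: a & b,
    '^': lambda a, b: a ^ b,
}

def synthesize(examples, max_size=4):
    # examples: list of (env, output) pairs, env maps variable -> value
    def matches(fn):
        return all(fn(env) == out for env, out in examples)

    by_size = {1: [(v, (lambda name: lambda env: env[name])(v))
                   for v in examples[0][0]]}
    for size in range(1, max_size + 1):
        for expr, fn in by_size.get(size, []):
            if matches(fn):
                return expr
        grown = []
        for e, f in by_size.get(size, []):
            for name, op in UNOPS.items():
                g = (lambda o, h: lambda env: o(h(env)))(op, f)
                grown.append((f"{name}{e}", g))
        for name, op in BINOPS.items():
            for ls in range(1, size):
                for le, lf in by_size.get(ls, []):
                    for re_, rf in by_size.get(size - ls, []):
                        g = (lambda o, a, b:
                             lambda env: o(a(env), b(env)))(op, lf, rf)
                        grown.append((f"({le} {name} {re_})", g))
        by_size[size + 1] = grown
    return None

# target behavior: x ^ y, recovered from two input-output examples
exs = [({'x': 3, 'y': 5}, 3 ^ 5), ({'x': 200, 'y': 17}, 200 ^ 17)]
print(synthesize(exs))  # (x ^ y)
```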
cs.PL 2026-05-07

Execution tests reveal BLEU fails to predict code translation correctness

Beyond BLEU: A Semantic Evaluation Method for Code Translation

LLM decompilers pass far more runtime checks than heuristics, with correlation to string overlap near zero.

Code translation is one of the core capabilities of LLMs. However, evaluating the correctness of translations remains difficult, as commonly used metrics such as BLEU measure only syntactic similarity, disregarding program semantics. We propose a novel evaluation methodology for code translation tasks, emphasizing semantic equivalence over surface-level string similarity. Our approach applies established compiler testing methodology to a new domain, allowing the assessment of an LLM fine-tuned for binary lifting tasks (i.e. decompiling binaries to higher-level representations). We introduce a semantic correctness score, defined as the proportion of translations that produce correct execution outcomes, and demonstrate its application by evaluating LLM-based and heuristic decompilers. Our findings show that LLM-based approaches significantly outperform heuristic ones, while BLEU scores show negligible correlation with semantic correctness (r = -0.127 to 0.354), demonstrating that syntactic metrics fail to predict functional accuracy.
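The proposed metric in miniature, with a hypothetical harness (ours): semantic correctness is the fraction of translations whose execution matches the reference on a test suite, regardless of textual similarity:

```python
# Semantic correctness = share of candidates that agree with the
# reference on every test input; crashes count as wrong translations.
def semantic_correctness(reference, candidates, test_inputs):
    def equivalent(cand):
        for x in test_inputs:
            try:
                if cand(x) != reference(x):
                    return False
            except Exception:
                return False
        return True
    passed = sum(equivalent(c) for c in candidates)
    return passed / len(candidates)

# A textually dissimilar but semantically correct translation scores
# full marks, which is exactly what BLEU-style overlap fails to capture.
ref = lambda x: x * 2
cands = [lambda x: x + x, lambda x: x * 3]
print(semantic_correctness(ref, cands, range(10)))  # 0.5
```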
cs.PL 2026-05-07

RISC-V semantics in Rocq links compilers to hardware

Interaction Tree Semantics for RISC-V: Bridging Compiler and Hardware Verification

Interaction Trees support machine-checked bisimulation proofs of equivalence and correctness, validated by passing all standard test suites.

The Instruction Set Architecture (ISA) is the contract between compilers and processors; proving this contract formally demands cross-level connection to existing mechanized compilers and hardware implementations. As an open, modular ISA gaining adoption across embedded, mobile, and cloud platforms, RISC-V makes a formally verified ISA specification particularly valuable. However, existing formal RISC-V specifications focus on hardware tooling rather than cross-level verification: they provide no machine-checked instruction-level properties and lack support for verifying this contract across levels. We address these limitations with a formal semantics of the RISC-V ISA in Rocq, built on Interaction Trees (ITrees). By leveraging ITree bisimulation and refinement, our semantics enables cross-level verification from compiler IR to hardware within a single framework. Our formalization covers a wide spectrum of RISC-V extensions. The correctness of individual instruction semantics is backed by machine-checked lemmas in Rocq. We further validate it by extracting an executable simulator that passes all standard RISC-V test suites. Three case studies demonstrate the effectiveness of our semantics for cross-level verification: first, we prove semantic equivalence via bisimulation between LLVM IR and RISC-V code on an array access pattern via Vellvm (LLVM ITree semantics); second, we apply translation validation to a specific instruction reordering for macro-operation fusion, distinguishing safe reorderings from those that break program-counter-relative addressing; third, we prove that a Kôika hardware ALU correctly implements all R-type integer operations (e.g., ADD, SUB, AND) against our ISA contract.
cs.PL 2026-05-07

Refinement types verify hybrid synchronous programs

Towards Formal Verification of Hybrid Synchronous Programs with Refinement Types

Formal semantics for initial value problems and zero crossings support sound safety proofs on executable code.

Cyber-physical systems (CPS) such as autonomous cars, aircraft, and robots are often also safety-critical; thus it is imperative that they operate as intended with a high degree of certainty. Formal verification has been employed to verify the software controlling these systems, but due to their complexity, is usually performed on an abstract model rather than the executable code. Synchronous programming languages extended with differential equations promise both rigorous modeling and sufficient expressiveness to implement executable controller code, and recent developments have introduced formal verification of strictly discrete-time programs. Extending these verification techniques to hybrid systems enables precise modeling of the environment for a wider variety of programs to be both verified and executed. We formalize the operational semantics of initial value problems and zero-crossing detection expressed in a synchronous programming language, extend its type system for verification thereof, and prove its soundness.
cs.PL 2026-05-06 3 theorems

Reclassifying eval as a governed effect for AI code synthesis

Governed Metaprogramming for Intelligent Systems: Reclassifying Eval as a Governed Effect

Dynamic program generation in intelligent systems requires treating the move from form to execution as an inspected authority step.

AI systems increasingly synthesize executable structure at runtime: LLMs generate programs, agents construct workflows, self-improving systems modify their own behavior. In classical homoiconic and staged languages, the transition from code representation to execution is unrestricted. eval is a language primitive, not a governed operation. We argue that in governed intelligent systems, this transition is an authority amplification: it converts symbolic structure into executable authority and must be mediated like any other effect. We present governed metaprogramming, a language design where program representations (machine forms) are first-class values, form manipulation is pure computation, and materialization (the transition from form to executable machine) is a governed effect subject to structural inspection. The governance system analyzes the proposed program's capability requirements, policy compliance, and resource estimates before permitting execution. We formalize two judgments: pure form evaluation (which emits no directives) and governed materialization (which emits exactly one governed directive). We prove three properties: purity of form manipulation, the no-bypass theorem, and boundary preservation. We implement the design in MashinTalk, a DSL for AI workflows compiling to BEAM bytecode, and report on integration with 454 existing machine-checked Rocq theorems. The central contribution is reclassifying eval from a language primitive into a governed effect.
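The design can be illustrated with Python's ast module: forms are data, analyzing them is pure, and the single authority-amplifying step (compile-and-eval) runs only after a structural inspection. The policy below is our sketch, not MashinTalk's governance system:

```python
# Materialization as a governed effect: inspect a machine form's
# capability requirements before granting execution authority.
import ast

def referenced_names(form: ast.Expression):
    """Pure form analysis: which capabilities would this form need?"""
    return {n.id for n in ast.walk(form) if isinstance(n, ast.Name)}

def materialize(form: ast.Expression, granted: dict):
    needed = referenced_names(form)
    if not needed <= granted.keys():  # structural inspection gate
        raise PermissionError(
            f"ungranted capabilities: {needed - granted.keys()}")
    # exactly one governed directive: the transition form -> machine
    return eval(compile(form, "<form>", "eval"),
                {"__builtins__": {}}, granted)

form = ast.parse("add(2, 3)", mode="eval")  # a first-class machine form
print(materialize(form, {"add": lambda a, b: a + b}))  # 5
# materialize(form, {})  # raises PermissionError: {'add'}
```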
cs.PL 2026-05-05

Pact maps choreographic protocols to formal games

Pact: A Choreographic Language for Agentic Ecosystems

Designers can solve for the decision policies that self-interested agents will follow in multi-party settings.

Recent advances in large language models have led to the rise of software systems (i.e. agents) that execute with increasing autonomy on behalf of users in open, multi-party settings, interacting with untrusted counterparts and managing private information. Choreographic programming offers correct-by-construction protocol design for such settings, but assumes cooperative participants -- it has no notion of agent self-interest, that is, why an agent will follow a protocol. In this talk we introduce Pact, a choreographic language extended with operations to describe agent choices and preferences, drawing from the rich literature of game theory. Every Pact protocol maps to a formal game, allowing protocol designers to reason about game-theoretic properties of their protocols, such as solving for decision policies. We present Pact's design and a preliminary implementation -- a bounded-rational solver that computes decision policies over Pact protocols -- and findings from applying this language to multi-party coordination with self-interested agentic participants.
cs.PL 2026-05-05

Neural CPS verified in any chosen theorem prover

Compositional Neural-Cyber-Physical System Verification in the Interactive Theorem Prover of Your Choice

Transpiling functional neural specifications enables infinite-horizon safety proofs for discrete and continuous systems across Rocq, Isabelle/HOL, Agda, and Imandra.

Formal verification of neuro-symbolic cyber-physical systems, such as drones, medical devices and robots, is complicated. Neural components must be trained to be optimal with respect to the available data as well as the safety specifications, and then verified using specialised solvers. Symbolic models of the "cyber" and "physical" behaviour of the system must be constructed and verified in interactive theorem provers (ITPs), often requiring mature mathematical libraries to reason about the interplay of discrete and continuous dynamics, preferably obtaining infinite time-horizon guarantees. Finally, the results of the two already challenging verification tasks need to be integrated into a single proof in a coherent and consistent way, whilst preserving deployability of the resulting model. In this paper we present a compositional methodology for constructing such proofs. The Vehicle framework provides a functional, domain-specific language for specifying, training, and verifying neural components. We extend Vehicle to allow integration with any ITP with minimal effort. First, we describe how Vehicle's standard bidirectional type checker can be reused to transpile neural specifications into an intermediate representation targeting multiple theorem provers. Second, we integrate Vehicle with Rocq, Isabelle/HOL, Agda and the industrial prover Imandra; and showcase a generic infinite time-horizon safety proof of a discrete cyber-physical system with a neural network controller in each ITP. Finally, we use the Mathematical Components libraries in Rocq to verify infinite time-horizon safety of a medical device, modelled as a continuous cyber-physical system with a neural controller. To our knowledge, this is the first result of this kind in a general purpose ITP; and a result that was only feasible thanks to the compositionality provided by Vehicle's functional interface.
cs.PL 2026-05-05

Java streams fused at compile time via single mapMulti call

Compile-Time Java Stream Fusion via mapMulti

Merges map and filter stages to remove intermediate objects without converting pipelines to loops.

The Java Stream API, introduced in Java 8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures.
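The shape of the rewrite, shown as a Python generator analogue (the tool itself rewrites Java bytecode; in Java the fused stage is a single mapMulti body that pushes zero or one elements to its consumer):

```python
# Before: two pipeline stages, each producing an intermediate stream.
def before(xs, f, p):
    return filter(p, map(f, xs))

# After: one fused stage; for each input it pushes zero or one outputs
# downstream, mirroring a (x, consumer) -> {...} mapMulti body.
def after(xs, f, p):
    def fused(x):
        y = f(x)
        if p(y):
            yield y
    for x in xs:
        yield from fused(x)

assert list(before(range(10), lambda x: x * x, lambda y: y % 2 == 0)) \
    == list(after(range(10), lambda x: x * x, lambda y: y % 2 == 0))
```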
cs.PL 2026-05-05

Four-step loop produces reliable research benchmarks

How to benchmark: the Measure-Explain-Test-Improve loop

Measure, explain, test, and improve steps let novices create solid performance evaluations in programming language work.

I would like to share recommendations on how to do performance benchmarks for the purpose of computer science research evaluation. Research in my field (programming language research) often involves performance considerations, but performance is typically not the main tool used to evaluate our research (usually we evaluate via formal statements and their proofs, experience writing large or interesting examples, or systematic comparison of expressivity, feature set, etc.). My impression is that, as a result, we tend not to do our performance evaluation very well. In the present document I will try to explain a methodology to do benchmarking correctly (I hope!). People with no prior benchmarking experience should be able to build solid performance evaluation as part of their research. I explain the justification for each aspect along the way.
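One concrete instantiation of the Measure step, with the hygiene such a methodology typically recommends: warm up, repeat, and report a robust statistic with spread rather than a single best time. The specific choices below (30 repetitions, median and interquartile range) are ours, not the document's prescriptions:

```python
# Measure step sketch: warmup runs, repeated timed runs, robust summary.
import statistics
import time

def measure(workload, warmup=5, repeats=30):
    for _ in range(warmup):            # let caches and JITs settle
        workload()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - t0)
    q1, med, q3 = statistics.quantiles(samples, n=4)
    return med, q3 - q1                # median and interquartile range

med, iqr = measure(lambda: sorted(range(100_000, 0, -1)))
print(f"median {med*1e3:.2f} ms, IQR {iqr*1e3:.2f} ms")
```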
cs.PL 2026-05-04

Hoare logic derivations can define interpreters

Towards Definitional Interpreters for Hoare Logics

Operating on proof structure rather than code lets correctness rules guide semantics for heaps, recursion, and subtyping.

Intrinsic definitional interpreters, definitional interpreters that operate on typing derivations instead of abstract syntax trees, have recently been studied as a promising methodology for defining dynamic semantics of programming languages. A key benefit is that type safety interactively guides and constrains the interpreter's construction. Analogously to typing relations, Hoare logic is widely used to guarantee program correctness. Can intrinsic definitional interpreters be realized to operate over Hoare-logic derivations? We explore this question in depth by developing definitional interpreters in Rocq for (i) a basic Hoare logic, and (ii) a realistic logic featuring heaps, dynamic-frame-based local reasoning, well-founded functions, and behavioral subtyping. Central to our approach is a novel technique we call entry-indexing, which we use to interpret total-correctness derivations and well-founded functions. Our second development yields, to our knowledge, the first formalization of a dynamic-frame-based Hoare logic with well-founded functions, behavioral subtyping, and total correctness, as well as the first fully mechanized Hoare logic with dynamic frames.
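The idea of interpreting derivations rather than syntax can be shown in miniature: each node of a (drastically simplified, partial-correctness) Hoare derivation carries its rule and assertions, and the interpreter both executes the program and checks the annotations it passes through. The encoding is ours, far from the paper's Rocq developments:

```python
# A definitional interpreter over Hoare derivations, not syntax trees:
# derivations are tagged tuples, assertions are predicates on states.
def interp(deriv, state):
    rule = deriv[0]
    if rule == "skip":                 # ("skip", P): {P} skip {P}
        _, P = deriv
        assert P(state)
        return state
    if rule == "assign":               # ("assign", P, x, e, Q)
        _, P, x, e, Q = deriv
        assert P(state)
        state = {**state, x: e(state)}
        assert Q(state)
        return state
    if rule == "seq":                  # ("seq", d1, d2)
        _, d1, d2 = deriv
        return interp(d2, interp(d1, state))
    raise ValueError(rule)

# {x = 1} y := x + 1; skip {y = 2}
d = ("seq",
     ("assign", lambda s: s["x"] == 1, "y",
      lambda s: s["x"] + 1, lambda s: s["y"] == 2),
     ("skip", lambda s: s["y"] == 2))
print(interp(d, {"x": 1}))             # {'x': 1, 'y': 2}
```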
cs.PL 2026-05-04 Recognition

AI-generated Lean proofs certify Axon compiler output without audits

Testing, Credible Compilation, and Verification in the Axon Verified Compiler in Lean and Claude Code

Machine-checked guarantees from Claude Code remove the need to inspect the generated code itself.

This paper presents the use of testing, credible compilation/translation validation, verification, and audits in the Axon compiler. Axon comes with fully machine checked proofs that guarantee the correctness of the generated code. All code and proofs were written in Lean by Claude Code, with the correctness proofs eliminating any need to audit or examine any verified code. I present a development process for using these validation techniques, evaluate the use of this process during the development of the compiler, and discuss implications for other development efforts.
cs.PL 2026-05-04

FlowBook enforces reproducibility in computational notebooks by dynamically tracking read and write sets

FlowBook: Enforcing Reproducibility in Computational Notebooks

FlowBook enforces notebook reproducibility by checking whether top-to-bottom execution from an empty state matches recorded outputs, using read and write sets tracked at cell boundaries.

Computational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from clean top-to-bottom runs. Prior approaches either employ dependency analyses or enforce reactive dataflow models that face fundamental tradeoffs among expressiveness, precision, and performance. This paper exploits the insight that reproducibility can be enforced without precise dependency tracking: a notebook is reproducible if and only if executing its cells in top-to-bottom order from an empty store produces exactly the outputs currently recorded. We formalize this notion of reproducibility and present FlowBook, which implements a dynamic analysis that enforces reproducibility by tracking read and write sets at cell boundaries. FlowBook detects stale cells whose recorded outputs may no longer reflect the current notebook state and prevents operations that would violate reproducibility. FlowBook incurs near-imperceptible latency overhead (median: 70 ms).
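A minimal sketch of the dynamic check (details ours; transitive staleness propagation omitted): record each cell's read and write sets when it runs, and mark a later cell stale when a cell above it rewrites a name that the later cell read:

```python
# Staleness via read/write sets at cell boundaries: a later cell's
# recorded output may no longer match a clean top-to-bottom replay.
class Notebook:
    def __init__(self, n_cells):
        self.meta = [{"reads": set(), "writes": set(), "stale": True}
                     for _ in range(n_cells)]

    def ran(self, i, reads, writes):
        self.meta[i].update(reads=set(reads), writes=set(writes),
                            stale=False)
        # cells below i whose recorded reads overlap i's writes may
        # diverge from a clean top-to-bottom run
        for j in range(i + 1, len(self.meta)):
            if self.meta[j]["reads"] & set(writes):
                self.meta[j]["stale"] = True

    def stale_cells(self):
        return [i for i, m in enumerate(self.meta) if m["stale"]]

nb = Notebook(3)
nb.ran(0, reads=[], writes=["df"])
nb.ran(1, reads=["df"], writes=["summary"])
nb.ran(2, reads=["summary"], writes=[])
nb.ran(0, reads=[], writes=["df"])   # re-run cell 0 out of order
print(nb.stale_cells())              # [1]: cell 1 read a rewritten name
```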
cs.PL 2026-05-04

Compiler abstraction matches expert CUDA on distributed AI tasks

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

DITRON's three-level hierarchy enables 6-30% kernel speedups and over 10% MFU gains in LLM training and inference without hand tuning.

abstract click to expand
The scaling of large language models (LLMs) is currently bottlenecked by the rigidity of distributed programming. While high-performance libraries like CuBLAS and NCCL provide optimized primitives, they lack the flexibility required for rapidly evolving model architectures. Conversely, existing tensor compilers fail to address the complex memory hierarchy of distributed clusters effectively. To bridge this gap, we propose DITRON, a scalable tile-level compiler that democratizes high-performance distributed kernel development. DITRON introduces a novel hierarchical programming abstraction spanning Core, Device, and Task levels to map tensor programs efficiently onto heterogeneous distributed hardware. This abstraction allows DITRON to support diverse parallelism strategies while abstracting away the complexity of inter-node and intra-node communication. Evaluated across large-scale clusters, DITRON matches or exceeds the performance of expert-tuned CUDA libraries, delivering speedups of 6%-30% on isolated kernels and 5%-30% on end-to-end inference in vLLM. Furthermore, DITRON demonstrates strong portability, achieving significant speedups on both NVIDIA and AMD platforms. DITRON has been deployed at the enterprise level for both training and inference. It achieves an MFU improvement of over 10% in training tasks, saving approximately 500,000 GPU hours of training cost per month. For inference tasks, it delivers an end-to-end gain of over 20% and has been applied to cloud service inference and edge inference scenarios.
cs.PL 2026-05-04

Compiler splits recursive datatypes into separate field buffers

SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes

SoCal and Colobus turn interleaved trees into factored buffers, producing 1.46x speedup on tree benchmarks without manual rewriting.

Array-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.
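Interleaved versus factored serialization in miniature, for data Tree = Leaf | Node Int Tree Tree serialized in preorder; the encoding choices are ours. The factored layout keeps tags and Int payloads in separate buffers, so a fold over the Int field scans one dense buffer:

```python
# Interleaved (AoS-like) vs. factored (SoA-like) serialization of a
# small binary tree; trees are (value, left, right) or None for Leaf.
LEAF, NODE = 0, 1

def serialize_interleaved(t, buf):
    if t is None:
        buf.append(LEAF)
    else:
        v, l, r = t
        buf.append(NODE); buf.append(v)
        serialize_interleaved(l, buf); serialize_interleaved(r, buf)
    return buf

def serialize_factored(t, tags, vals):
    if t is None:
        tags.append(LEAF)
    else:
        v, l, r = t
        tags.append(NODE); vals.append(v)
        serialize_factored(l, tags, vals)
        serialize_factored(r, tags, vals)
    return tags, vals

tree = (1, (2, None, None), (3, None, None))
print(serialize_interleaved(tree, []))  # [1, 1, 1, 2, 0, 0, 1, 3, 0, 0]
tags, vals = serialize_factored(tree, [], [])
print(tags, vals)                       # [1, 1, 0, 0, 1, 0, 0] [1, 2, 3]
print(sum(vals))                        # dense scan over one buffer: 6
```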
cs.PL 2026-05-04

Verifier checks MLIR program equivalence in linear time

Practical Formal Verification for MLIR Programs

Hybrid interpretation verifies optimizations on toolchains and hundreds of variants without depending on implementation details

Optimizing compilers have become a cornerstone for high-performance program generation in research and industry. Optimizations, whether implemented manually by a user or applied automatically, and whether target-specific or not, transform programs to achieve good performance. Although these optimizations are necessary for performance, assessing their correctness has remained a major challenge; the risk of incorrect code being deployed increases with unproven optimization flows. In this work, we target the formal verification of correctness of a transformed program by computing whether a pair of programs are semantically equivalent, one being a transformed version of the other. We restrict the class of programs supported to enable a hybrid concrete-symbolic interpretation approach to equivalence, which in turn is mostly agnostic to how the programs are implemented (syntax, schedule, storage, etc.). This approach can show equivalence in linear time with respect to the operations executed by the programs. We develop a verifier for a meaningful subset of MLIR, and report on the verification of the AMD MLIR-AIR and MLIR-AIE toolchains, as well as the standard mlir-opt, on hundreds of benchmark variants.
cs.PL 2026-05-04

Erasure modeled as context proposition conserves MLTT in both phases

Type Theory With Erasure

Structural SOGAT formulation yields correct extraction to untyped lambda calculus via presheaf gluing.

abstract click to expand
Erasure enriches type theory with a distinction between runtime relevant and irrelevant data, allowing the compilation step to safely erase the latter. Versions of this feature are implemented by many systems, including Agda, Idris, and Rocq. We present a structural version of type theory with erasure, formulated as a second-order generalised algebraic theory (SOGAT). Erasure is encoded as a phase distinction between runtime and erased terms, in the form of a proposition that can appear in a context. This formulation has several advantages: it has models based on categories with families, is compatible with other structural features such as staging, and provides a better guideline for implementation. Through the model theory of SOGATs, we study the semantics of type theory with erasure in families of sets, which generalises to any Grothendieck topos equipped with a tiny proposition. We establish conservativity over Martin-Löf type theory (MLTT) in both phases. For code extraction, we construct a presheaf model that produces untyped lambda calculus programs and prove its correctness through gluing. Our results are formalised in Agda and we provide a toy elaborator implementation.
cs.PL 2026-05-01

Functional logic programs run faster in monadic Haskell than in Curry

A Monadic Implementation of Functional Logic Programs

A pure monadic encoding supports memoization of non-deterministic choices and advanced features such as functional patterns, outperforming current Curry compilers.

Functional logic languages are a high-level approach to programming by combining the most important declarative features. They abstract from small-step operational details so that programmers can concentrate on the logical aspects of an application. This is supported by appropriate evaluation strategies. Demand-driven evaluation from functional programming is amalgamated with non-determinism from logic programming so that solutions or values are computed whenever they exist. This frees the programmer from considering the influence of an operational strategy on the success of a computation, but it is a challenge to the language implementer. A non-deterministic demand-driven strategy might duplicate unevaluated choices of an expression, which could duplicate the computational effort. In recent implementations, this problem has been tackled by adding a kind of memoization of non-deterministic choices to the expression under evaluation. Since this has been implemented in imperative target languages, it was unclear whether this could also be supported in a functional programming environment like Haskell. This paper presents a solution to this challenge by transforming functional logic programs into a monadic representation. Although this transformation is not new, we present an implementation of the monadic interface which supports memoization in non-deterministic branches. Additionally, we include more advanced features of functional logic languages, namely functional patterns and encapsulated search, in our approach. By optimizing our implementation for purely functional computations with both a static and dynamic approach, we are able to achieve a promising performance that outperforms current compilers for Curry.
cs.PL 2026-04-29

Finitely supported functions unify logic and functional programming

Finite Functional Programming

Predicates become input-output tables; graded effects and relevance types keep support finite while handling Datalog and aggregation.

We unify functional and logic programming by treating predicates as functions equipped with their support: the set of inputs whose output is nonzero. Datalog, for instance, is a language of finitely supported boolean functions. Finite support allows representing functions as input-output tables. Generalizing from boolean functions to other pointed sets neatly handles aggregation and weighted logic programming. We refer to the combination of finitely supported functions, represented as data, with higher order functions, represented as code, as finite functional programming. We give a simple type system to check finite support, using graded effects to check variable grounding and relevance types to model pointed sets.
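A finitely supported function is just a table of nonzero entries plus the pointed set's zero; a sketch (classes and names ours, not the paper's syntax):

```python
# Finitely supported function as data: pointwise defaulting to the
# pointed set's zero, with aggregation over the finite support.
class FinFun:
    def __init__(self, table, zero=0):
        self.zero = zero
        self.table = {k: v for k, v in table.items() if v != zero}

    def __call__(self, x):
        return self.table.get(x, self.zero)

    def support(self):
        return set(self.table)

    def aggregate(self, combine, unit):
        total = unit
        for v in self.table.values():
            total = combine(total, v)
        return total

# A Datalog-ish boolean predicate: edge(a, b), pointed set ({F, T}, F).
edge = FinFun({("a", "b"): True, ("b", "c"): True}, zero=False)
print(edge(("a", "b")), edge(("a", "c")))     # True False
# Weighted variant: path costs with zero 0, aggregated by summation.
cost = FinFun({("a", "b"): 3, ("b", "c"): 4})
print(cost.aggregate(lambda x, y: x + y, 0))  # 7
```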
cs.PL 2026-04-28

Hybrid Path-Sums verify mixed quantum-classical programs

Hybrid Path-Sums for Hybrid Quantum Programs

A new symbolic form and rewriting rules let tools check equivalence and probabilities in programs that combine quantum and classical control

As quantum computing becomes an emerging reality, designing efficient quantum programming capabilities is becoming more and more important. Particularly, the debugging and validation of quantum programs is of paramount importance, as these programs are by definition hard to test. Static analysis and formal verification methods for quantum programs started to emerge a few years ago, yet they often lack hybrid quantum/classical reasoning facilities covering, e.g., generic quantum control, classical control, and classical computation instructions. In this paper, we lay out the foundations of a framework for the automated formal verification of (full) hybrid quantum programs featuring both classical and quantum control, measurement, and hybrid data structures. In particular, we propose: (1) a novel symbolic representation for describing and manipulating sets of hybrid quantum/classical states called Hybrid Path-Sums (HPS); (2) a set of rewriting rules providing a rich mechanism for simplifying and reasoning on these symbolic hybrid states; and (3) a core assertion language to specify equivalence of hybrid quantum programs, the satisfaction of properties on (parts of) hybrid states, and the extraction of probabilistic statements about the program behavior. We prove the correctness of the novel symbolic representation, of its rewriting system, and of the specification system. Finally, we propose a full implementation of this framework as a dedicated symbolic execution engine for hybrid programs. We present an evaluation of a set of representative hybrid case-studies from the literature, showcasing the advantage of our approach and its efficiency compared to state-of-the-art solutions.
cs.PL 2026-04-27

Staging verifies tensor shapes at compile time via assertions

Compile-Time Tensor Shape Checking via Staged Shape-Dependent Types

Shape consistency conditions are checked before code generation, so successfully compiled tensor programs are guaranteed to be free of shape inconsistencies.

When writing programs involving matrices or tensors in general, it is desirable to rule out the inconsistency of tensor shapes (i.e., the generalization of matrix sizes) before actual computation. For this purpose, some languages provide dependent types such as Mat m n, and others offer refinement types to track predicates for shapes. Despite the theoretical maturity, however, such methods are often impractical for continuous software development because they require proofs for judging type equality or subtyping; even automated proving is often unsuitable due to its unpredictable time consumption. To remedy this, our study provides an alternative formalization by using staging. Based on the observation that conditions for the shape consistency can be extracted before running the actual tensor computations in many typical cases, we ensure such consistency by assertions evaluated as compile-time computations, not by proofs. Under this formalization, we can verify the consistency virtually statically, in the sense that inconsistencies as to tensor shapes will be immediately detected as assertion failures during compile-time computation. Our work achieves a mathematical guarantee that successfully generated code is always consistent with respect to tensor shapes. Furthermore, to vastly lessen the burden of adding shape- or stage-related descriptions, we (1) allow shape-related arguments to be implicit and infer them in a best-effort manner, and (2) offer a non-staged surface language that seemingly resembles ordinary dependently-typed languages and translate its programs into the staged core language. By a prototype implementation, we confirm that our language is expressive enough to verify a number of programs, including several examples offered by ocaml-torch.
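The staging idea in miniature (the tiny matmul "DSL" below is ours, not the paper's surface language): a first stage runs the shape assertions, and only on success returns the runnable second stage, so generated code can no longer fail on shapes:

```python
# Stage 1 checks shape consistency; stage 2 is the generated, shape-safe
# numeric code. A mismatch fails before any data exists.
def matmul(a_shape, b_shape):
    (m, k1), (k2, n) = a_shape, b_shape
    assert k1 == k2, f"shape mismatch: ({m},{k1}) x ({k2},{n})"

    def run(a, b):                     # generated stage: shape-safe
        return [[sum(a[i][t] * b[t][j] for t in range(k1))
                 for j in range(n)] for i in range(m)]
    return run

mm = matmul((2, 3), (3, 2))            # passes the static stage
print(mm([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]]))
# matmul((2, 3), (4, 2))               # fails at compile-time stage
```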
cs.PL 2026-04-27

Concolic pruning cuts state explosion in multiverse microcontroller debugging

Remote Concolic Multiverse Debugging -- Extended Version with Additional Appendices

By selecting only input values that create distinct paths during live sessions, the debugger shrinks the explored state space while preserving full code coverage.

Debugging nondeterministic programs is inherently difficult, particularly in microcontroller environments where execution paths can diverge unpredictably due to external sensor inputs. Traditional debugging techniques often fail to capture or reproduce this nondeterministic behavior effectively. Multiverse debugging has emerged as a compelling technique to debug nondeterministic programs, allowing developers to systematically explore all possible execution paths. Unfortunately, current multiverse debuggers are snapshot-based and most operate over a model of the program, limiting their use for debugging resource-constrained microcontrollers. Additionally, current multiverse debuggers, even ones specifically designed for microcontrollers, suffer from state explosion, making the state space overwhelming during debugging. To address these challenges, we introduce a trace-based multiverse debugger with a novel state-space reduction technique based on concolic execution. Our approach interleaves concolic analysis with live debugging to identify input values that define unique program paths. This hybrid technique efficiently prunes redundant paths from the state space while ensuring full code coverage. Unlike MIO, a recently published multiverse debugger for microcontrollers that focuses on IO consistency, our approach directly targets state explosion by leveraging concolic execution and uses a trace-based approach, significantly reducing the memory and communication overhead. We implemented a prototype using the WARDuino WebAssembly VM, demonstrating the feasibility and efficiency of our approach in real-world scenarios. Our results highlight substantial reductions in the state space compared to traditional multiverse debugging. This makes multiverse debugging more accessible and efficient for developers working with complex, nondeterministic programs running on microcontrollers.
0
0
cs.PL 2026-04-27

Ownership generalization verifies nested arrays

Ownership Refinement Types for Pointer Arithmetic and Nested Arrays

Allowing ownership to reference outer indices enables sound checking of programs with matrices and pointer arithmetic.

abstract click to expand
Tanaka et al. proposed a type system for verifying functional correctness properties of programs that use arrays and pointer arithmetic. Their system extends ConSORT -- a type system combining fractional ownership and refinement types for imperative program verification -- with support for pointer arithmetic. Their idea was to extend fractional ownership so that it can depend on an array index. Their formulation, however, does not handle nested arrays, which are essential for representing practical data structures such as matrices. We extend Tanaka et al.'s type system to support nested arrays by generalizing the notion of ownership to be able to refer to the indices of the outer arrays and prove the soundness of the extended type system. We have implemented a verifier based on the proposed type system and demonstrated that it can verify the correctness of programs that manipulate nested arrays, which were beyond the reach of Tanaka et al.
0
0
cs.PL 2026-04-27

Crystality semantics decomposes into parts while staying equivalent

From Monolithic to Compositional: A Compositional Operational Semantics for Crystality

Local engines plus global coordinator yield simple proofs of isolation and commutativity without behavioral change.

Figure from the paper full image
abstract click to expand
Parallel execution has become a key approach to improving blockchain scalability, but the lack of formal semantics for smart contract languages in such settings makes rigorous reasoning difficult. Crystality is a smart contract language designed for parallel EVMs, supporting scoped state and asynchronous relay across execution engines. This paper introduces a compositional operational semantics for Crystality. Unlike the original monolithic semantics, the new semantics decomposes the system into engine components and a global component, making the structure of parallel execution explicit. The compositional formulation enables simple proofs of key structural properties, including locality, global isolation, and strong commutativity of independent local steps. Furthermore, we prove that the compositional semantics is semantically equivalent to the original one via a transaction-level bisimulation theorem based on encoding and decoding functions between configurations, and two code-level bisimulation theorems for local and global execution.
0
0
cs.PL 2026-04-24

Causal graph test matches noninterference semantics

Causality and Semantic Separation

Mechanized result shows d-separation for confounders equals a security-style semantic property, without probabilities.

Figure from the paper full image
abstract click to expand
The design of scientific experiments deserves its own variation of formal verification to catch cases where scientists made important mistakes, such as forgetting to take confounding variables into account. One of the most fundamental underpinnings of science is causality, or what it means for interventions in the world to cause other outcomes, as formalized by computer scientists like Judea Pearl. However, these ideas had not previously been made rigorous to the standards of the programming-languages community, where one expects a (syntactic) program analysis to be proved sound with respect to a natural semantics. In the domain of causality, as the relevant "program analysis," we focus on $d$-separation, a classic condition on graphs that can be used to decide when the design of an experiment controls for sufficiently many confounding variables, even though the reason that this condition works is often unintuitive. Our central result (mechanized in Rocq) is that $d$-separation exactly coincides with a novel semantic definition inspired by noninterference from the theory of security. This characterization provides a structural semantic foundation for $d$-separation and helps explain why the graph-theoretic condition is correct, independently of probabilistic assumptions. For each given automated test on the quality of an experiment design, our theorem justifies an associated method for falsifying the world-modeling hypothesis behind the experiment.
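For readers who have not met d-separation, the graph-theoretic test itself is short. Here is a hedged Python sketch of the standard moralized-ancestral-graph formulation — textbook material, not the paper's Rocq development: X and Y are d-separated given Z iff they are disconnected once we restrict to ancestors of X, Y, and Z, moralize, and delete Z.

```python
def ancestors(dag, nodes):
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in dag.get(stack.pop(), ()):   # dag maps node -> parents
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, xs, ys, zs):
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    # moralize: undirect edges and marry co-parents of every kept node
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in dag.get(v, ()) if p in keep]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for a in ps:                         # marry parents
            for b in ps:
                if a != b:
                    adj[a].add(b)
    # delete conditioned nodes, then check reachability from xs to ys
    frontier, seen = [x for x in xs if x not in zs], set(zs)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        if v not in seen:
            seen.add(v)
            frontier.extend(adj[v] - seen)
    return True

# confounder C of both X and Y: blocked only once we condition on C
dag = {"X": ["C"], "Y": ["C"], "C": []}
print(d_separated(dag, {"X"}, {"Y"}, set()))   # False: open backdoor path
print(d_separated(dag, {"X"}, {"Y"}, {"C"}))   # True: confounder controlled
```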
0
0
cs.PL 2026-04-24

Session types generate network monitors that enforce protocols in the data plane

NEST: Network Enforced Session Types (Technical Report)

Algorithms translate session types into P4 programs that detect violations even with packet loss and reordering.

Figure from the paper full image
abstract click to expand
This paper introduces NEST (Network-Enforced Session Types), a runtime verification framework that moves application-level protocol monitoring into the network fabric. Unlike prior work that instruments or wraps application code, we synthesize packet-level monitors that enforce protocols directly in the data plane. We develop algorithms to generate network-level monitors from session types and extend them to handle packet loss and reordering. We implement NEST in P4 and evaluate it on applications including microservice and network-function models, showing that network-level monitors can enforce realistic non-trivial protocols.
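The synthesis step is easiest to picture as a table-driven state machine. Below is a hedged Python toy; the protocol, labels, and states are invented, and real NEST emits P4 and additionally copes with packet loss and reordering, which this does not.

```python
# Compile a (very small) session type into a table-driven monitor that
# checks one message label at a time, the way a synthesized data-plane
# program would check packet headers.

# session type:  !login . &{ ok: !query . ?result . end,  deny: end }
TRANSITIONS = {
    ("start", "login"):  "auth",
    ("auth",  "ok"):     "ready",
    ("auth",  "deny"):   "end",
    ("ready", "query"):  "await",
    ("await", "result"): "end",
}

def monitor(packets):
    state = "start"
    for label in packets:
        nxt = TRANSITIONS.get((state, label))
        if nxt is None:
            return f"VIOLATION: {label!r} not allowed in state {state!r}"
        state = nxt
    return "ok" if state == "end" else f"incomplete: stuck in {state!r}"

print(monitor(["login", "ok", "query", "result"]))  # ok
print(monitor(["login", "query"]))                  # VIOLATION: skipped auth
```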
0
0
cs.PL 2026-04-24

Linear constraints automate linear arguments in Haskell

Linear Constraints

They serve as implicit linear arguments resolved by an extended GHC solver, with safety ensured by desugaring to Linear Haskell.

Figure from the paper full image
abstract click to expand
Linear constraints are the linear counterpart of Haskell's class constraints. Linearly typed parameters allow the programmer to control resources such as file handles and manually managed memory as linear arguments. Indeed, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. Linear constraints address this shortcoming: a linear constraint acts as an implicit linear argument that can be filled in automatically by the compiler. We present this new feature as a qualified type system, together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell. This paper is a revised and extended version of a previous paper by the same authors (arXiv:2103.06127). The formal system and the constraint solver have been significantly simplified and numerous additional applications are described.
0
0
cs.PL 2026-04-23

Coordination models generate correct Solidity smart contracts

Automatic Code and Test Generation of Smart Contracts from Coordination Models

A toolchain validates high-level specs, emits code, and produces tests for dynamic decentralized workflows on blockchain.

abstract click to expand
We propose a formal approach for specifying and implementing decentralised coordination in distributed systems, with a focus on smart contracts. Our model captures dynamic roles, data-driven transitions, and external coordination interfaces, enabling high-level reasoning about decentralised workflows. We implement a toolchain that supports formal model validation, code generation for Solidity (our framework is extendable to other smart contract languages), and automated test synthesis. Although our implementation targets blockchain platforms, the methodology is platform-agnostic and may generalise to other service-oriented and distributed architectures. We demonstrate the expressiveness and practicality of the approach by modelling and realising some coordination patterns in smart contracts.
0
0
cs.PL 2026-04-22

NumPy-like DSL builds full frontend inside MLIR

Demonstrating a Future for MLIR-native DSL Compilers on a NumPy-like Example

Dialect-agnostic type checker and parallel lowering enable native analyses and offloading for weather and CFD workloads.

Figure from the paper full image
abstract click to expand
Compilers for general-purpose languages have been shown to be at a disadvantage when it comes to specialized application domains as opposed to their Domain-Specific Language (DSL) counterparts. However, the field of DSL compilers features little consolidation in terms of compiler frameworks and adjacent software ecosystems. As a result, considerable work is duplicated, lost to maintenance issues, or remains undiscovered, and most DSLs are never considered "production-ready". One notable development is the introduction of the Multi-Level Intermediate Representation (MLIR), which promises a similar impact on DSL compilers as LLVM had on general-purpose tooling. In this work, we present a NumPy-like DSL made for offloading numeric tensor kernels that is entirely MLIR-native. In a first for open-source, it implements all frontend actions and semantic analyses directly within MLIR. Most notably, this is made possible by our new dialect-agnostic MLIR type checker, created for the future of DSLs in MLIR. We implement a simple, yet effective, parallel-first lowering scheme that connects our language to another MLIR dataflow dialect for seamless offloading. We show that our approach performs well in real-world use cases from the domain of weather modeling and Computational Fluid Dynamics (CFD) in Fortran.
0
0
cs.PL 2026-04-21

Logical relations model where-declassification for higher-order code

Compositional security definitions for higher-order where declassification

Halting indistinguishability checks after relevant releases produces a compositional definition stronger than prior lower-order work.

Figure from the paper full image
abstract click to expand
To ensure programs do not leak private data, we often want to be able to provide formal guarantees ensuring such data is handled correctly. Often, we cannot keep such data secret entirely; instead programmers specify how private data may be declassified. While security definitions for declassification exist, they mostly do not handle higher-order programs. In fact, in the higher-order setting no compositional security definition exists for intensional information-flow properties such as where declassification, which allows declassification in specific parts of a program. We use logical relations to build a model (and thus security definition) of where declassification. The key insight required for our model is that we must stop enforcing indistinguishability once a \emph{relevant declassification} has occurred. We show that the resulting security definition provides more security than the most related previous definition, which is for the lower-order setting. This paper is an extended version of the paper of the same name published at OOPSLA 2023 ([21]).
0
0
cs.PL 2026-04-21

DSL reduces CPS data sanitization to a few lines of code

Implementing CPSLint: A Data Validation and Sanitisation Tool for Industrial Cyber-Physical Systems

CPSLint replaces repetitive ad-hoc scripts with reusable expressions for time-series validation and preparation in industrial systems.

Figure from the paper full image
abstract click to expand
Raw datasets are often too large and unstructured to work with directly, and require a data preparation phase. The domain of industrial Cyber-Physical Systems (CPSs) is no exception, as raw data typically consists of large time-series data collections that log the system's status at regular time intervals. The processing of such raw data is often carried out using ad hoc, case-specific, one-off Python scripts, often neglecting aspects of readability, reusability, and maintainability. In practice, this can cause professionals such as data scientists to write similar data preparation scripts for each case, requiring them to do much repetitive work. We introduce CPSLint, a Domain-Specific Language (DSL) designed to support the data preparation process for industrial CPS. CPSLint raises the level of abstraction to the point where both data scientists and domain experts can perform the data preparation task. We leverage the fact that many raw data collections in the industrial CPS domain require similar actions to render them suitable for data-centric workflows. In our DSL one can express the data preparation process in just a few lines of code. CPSLint is a publicly available tool applicable for any case involving time-series data collections in need of sanitisation.
0
0
cs.PL 2026-04-20

MSCs project global specs to deadlock-free LLM agent programs

Provable Coordination for LLM Agents via Message Sequence Charts

A syntax-directed method separates coordination from unpredictable LLM actions and guarantees no deadlocks.

Figure from the paper full image
abstract click to expand
Multi-agent systems built on large language models (LLMs) are difficult to reason about. Coordination errors such as deadlocks or type-mismatched messages are often hard to detect through testing. We introduce a domain-specific language for specifying agent coordination based on message sequence charts (MSCs). The language separates message-passing structure from LLM actions, whose outputs remain unpredictable. We define the syntax and semantics of the language and present a syntax-directed projection that generates deadlock-free local agent programs from global coordination specifications. We illustrate the approach with a diagnosis consensus protocol and show how coordination properties can be established independently of LLM nondeterminism. We also describe a runtime planning extension in which an LLM dynamically generates a coordination workflow for which the same structural guarantees apply. An open-source Python implementation of our framework is available as ZipperGen.
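The projection step can be sketched in a few lines. This hedged Python toy invents its own agents and message labels; the real DSL also handles choice and the runtime-planning extension, neither of which appears here.

```python
# Syntax-directed projection: a global MSC, given as an ordered list of
# (sender, receiver, label) messages, is projected to a local program per
# agent, keeping only that agent's sends and receives in global order.

GLOBAL_MSC = [
    ("user",    "triage",  "symptoms"),
    ("triage",  "expert",  "case"),
    ("expert",  "triage",  "opinion"),
    ("triage",  "user",    "diagnosis"),
]

def project(msc, agent):
    prog = []
    for src, dst, label in msc:
        if src == agent:
            prog.append(("send", dst, label))   # LLM fills in the payload
        elif dst == agent:
            prog.append(("recv", src, label))
    return prog

for agent in ("user", "triage", "expert"):
    print(agent, project(GLOBAL_MSC, agent))
# Every send has a matching recv in the peer's program, in a compatible
# order -- the structural property behind deadlock freedom.
```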
0
0
cs.PL 2026-04-20

First correct precise translator turns C interfaces into safe Rust

&inator: Correct, Precise C-to-Rust Interface Translation

Constraint solver yields declarations that admit semantics-preserving safe-Rust implementations

Figure from the paper full image
abstract click to expand
Automatically translating system software from C to Rust is an appealing but challenging problem, as it requires whole-program reasoning to satisfy Rust's ownership and borrowing discipline. A key enabling step in whole-program translation is interface translation, which produces Rust declarations for the C program's top-level declarations (i.e., structs and function signatures), enabling modular and incremental code translation. This paper introduces correct, precise C-to-Rust interface translation, called &inator. &inator employs a novel constraint-based formulation of semantic equivalence and type correctness including borrow-checking rules to produce a Rust interface that is correct (i.e., the interface admits a semantics-preserving implementation in safe Rust) and precise (i.e., it uses the simplest, least costly types). Our results show &inator produces correct, precise Rust interfaces for real C programs, but support for certain C features and scaling to large programs are challenges left for future work. This work advances the state of the art by being the first correct, precise approach to C-to-Rust interface translation.
0
0
cs.PL 2026-04-20

Algorithm provably load-balances any sparse tensor algebra

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

Generalizes merging techniques to multi-operand hierarchical structures and generates competitive parallel kernels for CPUs and GPUs.

Figure from the paper full image
abstract click to expand
Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the computation of any sparse tensor algebra expression across parallel execution units. Our algorithm generalizes parallel merging algorithms to any number of operands, and to multi-dimensional, hierarchical sparse data structures. We implement our algorithm within an existing sparse tensor algebra compilation framework to automatically generate parallel sparse tensor algebra kernels that target multi-core CPUs and GPUs. We show that our generated code is competitive with hand-implemented parallelization strategies used by vendor libraries like Intel MKL and NVIDIA cuSPARSE (geo-means of $0.73$--$3.4\times$) and \textsc{Taco} (geo-means of $1.0$--$2.4\times$), and significantly outperforms general-purpose strategies for sparse tensor expressions where specialized algorithms have not been developed (geo-means of $2.0$--$6.4\times$).
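The two-operand base case the paper generalizes is the classic merge-path split: a binary search along each diagonal of the merge grid gives every worker an equal share of total merge work regardless of skew. A hedged scalar-Python sketch follows; the paper's contribution is the generalization to many operands and hierarchical sparse structures, which this does not attempt.

```python
def split_point(a, b, diag):
    """Find (i, j) with i + j == diag on the merge path of sorted a, b."""
    lo, hi = max(0, diag - len(b)), min(diag, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        # the merge path crosses this diagonal where a[i] > b[diag - i - 1]
        if a[i] > b[diag - i - 1]:
            hi = i
        else:
            lo = i + 1
    return lo, diag - lo

def balanced_partitions(a, b, p):
    total = len(a) + len(b)
    cuts = [split_point(a, b, (total * k) // p) for k in range(p + 1)]
    return [(cuts[k], cuts[k + 1]) for k in range(p)]

a = [0, 1, 2, 3, 4, 5, 6, 7]          # dense run of coordinates
b = [3, 100]                          # short, skewed operand
for (i0, j0), (i1, j1) in balanced_partitions(a, b, 2):
    print(f"worker merges a[{i0}:{i1}] with b[{j0}:{j1}]")
# Each worker gets 5 elements of combined work despite the skew.
```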
0
0
cs.PL 2026-04-20

Scala 3 contracts catch schema drift at compile time

Shift schema drift left: policy-aware compile-time contracts for typed JVM and Spark pipelines

Framework proves producer compatibility before runtime and adds checks Spark omits at the sink.

Figure from the paper full image
abstract click to expand
Schema drift in data pipelines is often caught only when a job touches real data. Typed-Dataset layers close part of this gap but require wholesale adoption; table-level enforcement systems close another part but operate at write time against a stored schema. We present a small Scala 3 framework that occupies the seam: it proves producer-to-contract structural compatibility under explicit policies at compile time, derives Spark schemas from the same contract types, and re-checks the actual DataFrame schema at the sink boundary before write. The artifact fuses the compile-time witness with a policy-aware runtime comparator that adds a nested-collection-optionality check Spark's built-in comparators omit and implements structural subset semantics for backward- and forward-compatible field sets. Evaluation covers compile-time proofs, runtime policy tests, builder-path end-to-end tests, and reproducible benchmarks on two environments. This is a narrow, honest mechanism artifact; the broader claim that compile-time structural contracts deliver measurable productivity or reliability in practice is stated as motivation and left for future work.
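The runtime half of the check is essentially structural-subset reasoning over field maps. Below is a hypothetical Python analogue; the artifact itself does this in Scala 3, with the producer side proved at compile time, and the field names here are invented.

```python
# Structural subset semantics: every contract field must be present with
# the same type; extra producer fields are tolerated (backward-compatible
# reading). The forward-compatible direction swaps the roles.

CONTRACT = {"id": "long", "name": "string", "tags": "array<string>"}

def backward_compatible(producer, contract):
    return all(producer.get(f) == t for f, t in contract.items())

ok_schema    = {"id": "long", "name": "string",
                "tags": "array<string>", "extra": "int"}
drift_schema = {"id": "long", "tags": "array<string>"}   # lost "name"

print(backward_compatible(ok_schema, CONTRACT))      # True
print(backward_compatible(drift_schema, CONTRACT))   # False: caught at sink
```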
0
0
cs.PL 2026-04-20

jMT tests Java memory models via causality on execution graphs

jMT: Testing Correctness of Java Memory Models (Extended Version)

The tool constructs per-program multi-execution semantics to check new JMM proposals against compiler behaviors and optimizations.

Figure from the paper full image
abstract click to expand
Folklore is often saying "The Java memory model is broken." Therefore, several approaches have proposed repairs, only to find new programs exhibiting unexpected, unintuitive behavior or the model forbidding standard compiler optimizations. The complexity of defining a memory model for concurrent Java lies in the fact that it requires a multi-execution model. Multi-execution models need to inspect the many potential executions of a program in order to find the valid ones. Tools automatically validating novel proposals of Java memory models are, however, largely lacking. To alleviate this problem, we introduce jMT, a novel tool for constructing multi-execution semantics for concurrent Java programs. jMT relies on single-execution models defining well-formed execution graphs, based on which it builds valid multi-execution semantics via causality checking. Thereby, jMT supports evaluating new proposals of Java memory models (JMMs) on a per-program basis. jMT can furthermore be employed for testing the conformance of JMMs to existing compilation schemes and compilers. Our evaluation of jMT on 169 litmus tests reveals a number of interesting insights into existing JMMs.
0
0
cs.PL 2026-04-20

Documentation becomes a live, computed view of program execution

Literate Execution

Literate execution uses provenance tracking in Fluid to let readers explore how inputs shape results through computed explanations and views

Figure from the paper full image
abstract click to expand
\emph{Literate programming}, introduced by Knuth, interleaves code and prose so that a program can be read as both executable and explanatory text. We propose \emph{literate execution}, which inverts this relationship: rather than embedding code within a static narrative, we treat documentation -- and other expository elements such as visualisations -- as first-class artefacts that can be computed alongside a running program and then integrated into a view of its execution. We explore this idea through Fluid, a programming language with a provenance-tracking runtime that records fine-grained dependencies between inputs and outputs. These provenance relationships can be surfaced as interactions that allow readers to explore how intermediate values contribute to a result. By integrating visualisation, provenance, and exposition, literate execution aims to make programs more explorable and self-explanatory, and explorable explanations easier to program.
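The mechanism is easy to approximate: values that remember which inputs they came from. A toy Python sketch follows (Fluid's runtime tracks far finer-grained dependencies; all names here are invented).

```python
# Every value carries the set of named inputs it was computed from, so a
# rendered result can be linked back to the data that shaped it.

class V:
    def __init__(self, val, sources):
        self.val = val
        self.sources = frozenset(sources)   # provenance: input names
    def __add__(self, other):
        return V(self.val + other.val, self.sources | other.sources)
    def __mul__(self, other):
        return V(self.val * other.val, self.sources | other.sources)

def inp(name, val):
    return V(val, {name})

rainfall = inp("rainfall_mm", 12.0)
area     = inp("roof_m2", 30.0)
factor   = inp("runoff_coeff", 0.9)
harvest  = rainfall * area * factor

# The documentation layer can now *compute* its own explanation:
print(f"{harvest.val} litres, derived from {sorted(harvest.sources)}")
# 324.0 litres, derived from ['rainfall_mm', 'roof_m2', 'runoff_coeff']
```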
0
0
cs.PL 2026-04-20

Category framework automates backward error bounds for floating-point code

Synthesizing Backward Error Bounds, Backward

Shel morphisms and an automated search yield stability proofs for programs with variable reuse that were previously beyond the reach of automated analysis.

Figure from the paper full image
abstract click to expand
Backward stability is a desirable property for a well-designed numerical algorithm: given an input, a backward stable floating-point program produces the exact output for a nearby input. While automated tools for bounding the forward error of a numerical program are well-established, few existing tools target backward error analysis. We present a formal framework that enables sound, automated backward error analysis for a broad class of numerical programs. First, we propose a novel generalization of the definition of backward stability that is both compositional and flexible, satisfied by a wide range of floating-point operations. Second, based on this generalization, we develop the category Shel where morphisms model stable numerical programs, and show that structures in Shel support a rich variety of backward error analyses. Third, we implement a tool, eggshel, that automatically searches within a syntactic subcategory of Shel to prove backward stability for a given program. Our algorithm handles many programs with variable reuse, a known challenge in backward error analysis. We prove soundness of our algorithm and use our tool to synthesize backward error bounds for a suite of programs that were previously beyond the reach of automated analysis.
0
0
cs.PL 2026-04-17 Recognition

Synthesized contracts let clients be verified against tested libraries

Verification Modulo Tested Library Contracts

A learning loop refines modular and contextual contracts until they prove the client correct and pass library tests.

Figure from the paper full image
abstract click to expand
We consider the problem of verification modulo tested library contracts as a step towards automating the verification of client programs that use complex libraries. We formulate this problem as the synthesis of modular contracts for the library methods used by the client that are adequate to prove the client correct, and that also pass the scrutiny of a testing engine that tests the library against these contracts. We also consider a new form of method contracts called contextual contracts that arise in this setting that hold in the context of the client program, and can often be simpler and easier to infer than classical modular contracts. We provide a counterexample-guided learning framework to solve this problem, in which the synthesizer interacts with a constraint solver as well as the testing engine in order to infer adequate modular/contextual method contracts and inductive invariants for the client. The main synthesis engines we use are generalizing CHC solvers that are realized using ICE learning algorithms. We realize this framework in a tool called DUALIS and show its efficacy on benchmarks where clients call large libraries.
0
0
cs.PL 2026-04-17

Pure Borrow brings Rust-style borrowing to pure Linear Haskell

Pure Borrow: Linear Haskell Meets Rust-Style Borrowing

A library enables safe parallel mutation inside pure functions while retaining laziness and leak freedom.

Figure from the paper full image
abstract click to expand
A promising approach to unifying functional and imperative programming paradigms is to localize mutation using linear or affine types. Haskell, a purely functional language, was recently extended with linear types by Bernardy et al., in the name of Linear Haskell. However, it remained unknown whether such a pure language could safely support non-local borrowing in the style of Rust, where each borrower can be freely split and dropped without direct communication of ownership back to the lender. We answer this question affirmatively with Pure Borrow, a novel framework that realizes Rust-style borrowing in Linear Haskell with purity. Notably, it features parallel state mutation with affine mutable references inside pure computation, unlike the IO and ST monads and existing Linear Haskell APIs. It also enjoys purity, lazy evaluation, first-class polymorphism and leak freedom, unlike Rust. We implement Pure Borrow simply as a library in Linear Haskell and demonstrate its power with a case study in parallel computing. We formalize the core of Pure Borrow and build a metatheory that works toward establishing safety, leak freedom and confluence, with a new, history-based model of borrowing.
0
0
cs.PL 2026-04-17

Prism introduces a symbolic superoptimizer for tensor programs using a hierarchical symbolic representation

Prism: Symbolic Superoptimization of Tensor Programs

Prism is the first symbolic superoptimizer for tensor programs; it uses sGraph for compact representation of program families, a two-level search, and symbolic pruning of provably suboptimal candidates.

Figure from the paper full image
abstract click to expand
This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constructs symbolic graphs that represent families of programs, and then instantiates them into concrete implementations. This formulation enables structured pruning of provably suboptimal regions of the search space using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. We develop techniques for efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning. Together, these components allow Prism to bridge the rigor of exhaustive search with the scalability required for modern ML workloads. Evaluation on five commonly used LLM workloads shows that Prism achieves up to $2.2\times$ speedup over best superoptimizers and $4.9\times$ over best compiler-based approaches, while reducing end-to-end optimization time by up to $3.4\times$.
0
0
cs.PL 2026-04-17

Concatenative language teaches recursion and Gödel-encoding in 90 minutes

What if we have 90 minutes only to teach programming?

CON-CAT removes accidental complexity so novices reach core computing ideas through direct puzzles.

Figure from the paper full image
abstract click to expand
Programming is about automation in a wide variety of domains. Software development itself is one of those domains. As a side-effect, progress in automated coding may make people less willing to learn computer programming. This could become an issue if the skill of computational problem solving is not merely of immediate economic benefit but also an important part of our knowledge about the world. We suggest that weakened incentives can be countered by lowering the entry barrier. We plan to shorten learning time by reducing the accidental complexity of the programming language and its runtime system. We describe a session plan that introduces programming and computing fundamentals for novices, assuming only basic mathematical background. This requires a non-mainstream, functional and concatenative language. This language, CON-CAT, is a by-product of research in category theory. It provides direct access to fundamental ideas like recursion and advanced ones like G\"odel-encoding in an entertaining puzzle-like manner.
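Why a concatenative language lowers the entry barrier is visible in a toy evaluator: the whole semantics is a stack plus a word table, and program composition is list concatenation. A hedged Python sketch follows (CON-CAT itself is a separate, category-theory-derived language; these words are invented).

```python
def run(program, stack=()):
    stack = list(stack)
    words = {
        "dup":  lambda s: s.append(s[-1]),
        "add":  lambda s: s.append(s.pop() + s.pop()),
        "mul":  lambda s: s.append(s.pop() * s.pop()),
    }
    for w in program:
        if isinstance(w, int):
            stack.append(w)          # literals push themselves
        else:
            words[w](stack)          # words transform the stack
    return stack

square = ["dup", "mul"]
print(run([7] + square))             # [49]
print(run([3, 4, "add"] + square))   # [49]: programs compose by appending
```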
0
0
cs.PL 2026-04-17

Tensor compiler auto-generates FlashAttention kernels from math specs

Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels

Nautilus discovers efficient tiled GPU code automatically, delivering up to 42% higher throughput than prior compilers on modern hardware

Figure from the paper full image
abstract click to expand
We present Nautilus, a novel tensor compiler that moves toward fully automated math-to-kernel optimization. Nautilus compiles a high-level algebraic specification of tensor operators into efficient tiled GPU kernels. Nautilus's successive lowering design allows high-level optimizations, expression rewrites, and tile optimizations to be jointly applied in a single end-to-end system. Nautilus presents a novel auto-scheduler that discovers sequences of high-level optimizations, while preserving the regular program structure needed by tile optimizers. Nautilus's auto-scheduler captures complex interactions and trade-offs in the high-level optimizations, including aggressive global transformations like advanced reduction fusion. Nautilus is the first end-to-end tensor compiler capable of starting from a math-like description of attention and automatically discovering FlashAttention-3-like kernels, offloading the entire burden of optimization from the programmer to the compiler. Across five transformer-based models and 150 evaluation configurations on NVIDIA GH200 and RTX 5090 GPUs, Nautilus achieves up to 23% higher throughput than state-of-the-art compilers on GH200 and up to 42% on RTX 5090, while matching or exceeding manually written cuDNN kernels on many long-sequence configurations.
0
0
cs.PL 2026-04-16

Filament adds information flow control to Rust without compiler changes

Filament: Denning-Style Information Flow Control for Rust

Macros and type inference track explicit and implicit flows with low overhead and fewer restrictions than prior library approaches.

Figure from the paper full image
abstract click to expand
Existing language-based information-flow control (IFC) tools face a fundamental tension: Denning-style systems that track explicit and implicit flows at the variable level typically require compiler modifications, while more coarse-grained approaches, including recent work Cocoon, avoid compiler changes but impose more restrictive programming models. We present Filament, a Denning-style static IFC library for Rust that requires no compiler modifications. Filament addresses three key challenges in building a practical IFC library for Rust. First, it enables fine-grained explicit-flow checking with minimal annotation overhead by leveraging Rust's type inference. Second, it introduces pc_block!, a lightweight construct for enforcing implicit flows via a compile-time program counter label, without requiring compiler support. Third, it provides fcall! and mcall! macros to support seamless and safe interoperability with standard and third-party libraries. Our evaluation shows that Filament incurs negligible compile-time overhead and requires only modest annotations. Moreover, compared to Cocoon, Filament offers a more permissive programming model, reducing the need for frequent escape hatches that bypass security checks.
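The Denning-style discipline itself fits in a few lines. Below is a hypothetical Python toy of the label lattice and program-counter tracking; Filament enforces this statically in Rust via macros and type inference, and nothing here is its API.

```python
# Two-point lattice LOW < HIGH; every assignment must not move data below
# the join of its operands' labels and the current pc label.

LOW, HIGH = 0, 1

class IFC:
    def __init__(self):
        self.labels = {}             # variable -> label
        self.pc = [LOW]              # stack of program-counter labels

    def assign(self, var, value_label):
        lbl = max(value_label, self.pc[-1])     # explicit + implicit flows
        if var in self.labels and lbl > self.labels[var]:
            raise RuntimeError(f"illegal flow into {var!r}")
        self.labels.setdefault(var, lbl)

    def branch_on(self, guard_label):           # enter a pc_block!
        self.pc.append(max(self.pc[-1], guard_label))
    def end_branch(self):
        self.pc.pop()

ifc = IFC()
ifc.assign("secret", HIGH)
ifc.assign("public", LOW)
ifc.branch_on(ifc.labels["secret"])      # if secret { ... }
ifc.assign("leak", LOW)                  # fresh var silently becomes HIGH
ifc.end_branch()
print(ifc.labels["leak"])                # 1: the implicit flow was tracked
```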
0
0
cs.PL 2026-04-16

Persistent iterators snapshot containers to deliver value semantics

Persistent Iterators with Value Semantics

Local copies of each snapshot eliminate invalidation and aliasing while preserving ordinary iterator-based code style.

Figure from the paper full image
abstract click to expand
Iterators are a fundamental programming abstraction for traversing and modifying elements in containers in mainstream imperative languages such as C++. Iterators provide a uniform access mechanism that hides low-level implementation details of the underlying data structure. However, iterators over mutable containers suffer from well-known hazards including invalidation, aliasing, data races, and subtle side effects. Immutable data structures, as used in functional programming languages, avoid the pitfalls of mutation but rely on a very different programming model based on recursion and higher-order combinators rather than iteration. However, these combinators are not always well-suited to expressing certain algorithms, and recursion can expose implementation details of the underlying data structure. In this paper, we propose persistent iterators -- a new abstraction that reconciles the familiar iterator-based programming style of imperative languages with the semantics of persistent data structures. A persistent iterator snapshots the version of its underlying container at creation, ensuring safety against invalidation and aliasing. Iterator operations operate on the iterator-local copy of the container, giving true value semantics: variables can be rebound to new persistent values while previous versions remain accessible. We implement our approach in the form of LibFPP -- a C++ container library providing persistent vectors, maps, sets, strings, and other abstractions as persistent counterparts to the Standard Template Library (STL). Our evaluation shows that LibFPP retains the expressiveness of iterator-based programming, eliminates iterator-invalidation, and achieves asymptotic complexities comparable to STL implementations. Our design targets use cases where persistence and safety are desired, while allowing developers to retain familiar iterator-based programming patterns.
0
0
cs.PL 2026-04-16

Weighted NetKAT automates quantitative network safety and reachability checks

Weighted NetKAT: A Programming Language For Quantitative Network Verification

Semiring-parametric language and automata give automatic decision procedures even when programs contain unbounded iteration.

Figure from the paper full image
abstract click to expand
We introduce weighted NetKAT, a domain-specific language for modeling and verifying quantitative network properties. The language is parametric on a semiring, enabling the treatment of a wide range of quantities in a uniform way. We provide a denotational semantics and an equivalent operational semantics, the latter based on a novel model of weighted NetKAT automata (WNKA) capturing the stateful behavior of our language. With WNKA, we obtain a class of generic decision procedures for reasoning about quantitative safety and reachability in a fully automatic way, even in the presence of possibly unbounded iteration. We demonstrate the applicability of our framework in a case study using Internet2's Abilene network as the underlying topology.
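The semiring-parametric flavor can be tasted with ordinary algebraic path-finding: one generic closure routine, many quantitative readings. The hedged sketch below is plain shortest-path algebra over a 3-switch toy topology, not the paper's WNKA construction.

```python
def closure(adj, plus, times, one):
    """Reflexive-transitive closure over an idempotent semiring."""
    n = len(adj)
    d = [[plus(adj[i][j], one) if i == j else adj[i][j]
          for j in range(n)] for i in range(n)]
    for k in range(n):                    # Floyd-Warshall, generically
        for i in range(n):
            for j in range(n):
                d[i][j] = plus(d[i][j], times(d[i][k], d[k][j]))
    return d

INF = float("inf")
hops = [[INF, 1, INF],          # link latencies of a 3-switch topology
        [INF, INF, 2],
        [INF, INF, INF]]

# min-plus semiring: "what is the best end-to-end latency?"
print(closure(hops, min, lambda a, b: a + b, 0)[0][2])        # 3

# boolean semiring: "is the destination reachable at all?"
reach = [[c != INF for c in row] for row in hops]
print(closure(reach, lambda a, b: a or b,
              lambda a, b: a and b, True)[0][2])              # True
```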
0
0
cs.PL 2026-04-16

Precise compiler remarks boost AI code optimization success 3.3x

AI Coding Agents Need Better Compiler Remarks

Ambiguous feedback triggers hallucinations while structured analysis unlocks small models for safe refactoring.

Figure from the paper full image
abstract click to expand
Modern AI agents optimize programs by refactoring source code to trigger trusted compiler transformations. This preserves program semantics and reduces source code pollution, making the program easier to maintain and portable across architectures. However, this collaborative workflow is limited by legacy compiler interfaces, which obscure analysis behind unstructured, lossy optimization remarks that have been designed for human intuition rather than machine logic. Using the TSVC benchmark, we evaluate the efficacy of existing optimization feedback. We find that while precise remarks provide actionable feedback (3.3x success rate), ambiguous remarks are actively detrimental, triggering semantic-breaking hallucinations. By replacing ambiguous remarks with precise ones, we show that structured, precise analysis information unlocks the capabilities of small models, proving that the bottleneck is the interface, not the agent. We conclude that future compilers must expose structured, actionable feedback designed specifically for the future of autonomous performance engineering.
0
0
cs.PL 2026-04-16

Reachability undecidable under Release/Acquire without RMWs

On the Decidability of Verification under Release/Acquire

The result settles a seven-year open question and shows decidability returns only when both context switches and RMWs are bounded.

Figure from the paper full image
abstract click to expand
The verification of concurrent programs under weak-memory models is a burgeoning effort, owing to the increasing adoption of weak memory in concurrent software and hardware. Release/Acquire has become the standard model for high-performance concurrent programming, adopted by common mainstream languages and computer architectures. In a surprising result, Abdulla et al. (PLDI'19) proved that reachability in this model is undecidable when programs have access to atomic Read-Modify-Write (RMW) operations. Moreover, undecidability holds even for executions with just 4 contexts, and is thus immune to underapproximations based on bounded context switching. The canonical, RMW-free case was left as a challenging question, proving a non-primitive recursive lower bound as a first step, and has remained open for the past seven years. In this paper, we settle the above open question by proving that reachability is undecidable even in the RMW-free fragment of Release/Acquire, thereby characterizing the simplest set of communication primitives that gives rise to undecidability. Moreover, we prove that bounding both the number of context switches and the number of RMWs recovers decidability, thus fully characterizing the boundary of decidability along the dimensions of RMW-bounding and context-bounding.
0
0
cs.PL 2026-04-16

Erlang obfuscation exploits gaps between semantics and BEAM execution

Erlang Binary and Source Code Obfuscation

Targeted transformations preserve runtime behavior while blocking standard decompilers and reverse engineering.

Figure from the paper full image
abstract click to expand
This paper studies obfuscation techniques for Erlang programs at the source, abstract syntax tree, BEAM assembly, and BEAM bytecode levels. We focus on transformations that complicate reverse engineering, decompilation, and recompilation while remaining grounded in the actual behavior of the Erlang compiler, validator, loader, and virtual machine. The paper categorizes opcode-level dependency tricks, receive-based loop encodings, irregular control-flow constructions, mutability-oriented performance obfuscation, and self-modifying code enabled by dynamic module loading. A recurring theme is that effective obfuscation in BEAM often arises not from arbitrary corruption, but from exploiting representational gaps between high-level Erlang semantics and the lower-level execution model accepted by the toolchain and runtime.
0
0
cs.PL 2026-04-16

Cerisier gives the first logic for modular attestation proofs on capability machines

Cerisier: A Program Logic for Attestation in a Capability Machine

It supplies a universal contract that lets developers prove end-to-end security for systems mixing trusted enclaves, untrusted code, and attested components.

Figure from the paper full image
abstract click to expand
A key feature in trusted computing is attestation, which allows encapsulated components (enclaves) to prove their identity to (local or remote) distrusting components. Reasoning about software that uses the technique requires tracking how trust evolves after successful attestation. This process is security-critical and non-trivial, but no existing formal verification technique supports modular reasoning about attestation of enclaves and their clients, or proving end-to-end properties for systems combining trusted, untrusted and attested code. We contribute Cerisier, the first program logic for modular reasoning about trusted, untrusted and attested code, fully mechanized in the Iris separation logic and the Rocq Prover. We formalize a recent proposal, CHERI-TrEE, to extend capability machines with enclave primitives, as an extension to the Cerise capability machine and program logic. Our program logic comes with a universal contract for untrusted code, which captures both capability safety and local enclave attestation. Like Cerise, this universal contract is phrased in terms of a logical relation defining capabilities' authority. We demonstrate Cerisier by proving end-to-end properties for three representative applications of trusted computing: secure outsourced computation, mutual attestation and a modeled trusted sensor component.
1
0
cs.PL 2026-04-15

Bisimulation synthesis finds optimal filters before UDF folds

Optimal Predicate Pushdown Synthesis

Framework produces strongest pre-filters and weakest post-filters for stateful computations, delivering 2.4× average speedup across 150 real-world pipelines.

Figure from the paper full image
abstract click to expand
Predicate pushdown is a long-standing performance optimization that filters data as early as possible in a computational workflow. In modern data pipelines, this transformation is especially important because much of the computation occurs inside user-defined functions (UDFs) written in general-purpose languages such as Python and Scala. These UDFs capture rich domain logic and complex aggregations and are among the most expensive operations in a pipeline. Moving filters ahead of such UDFs can yield substantial performance gains, but doing so requires semantic reasoning. This paper introduces a general semantic foundation for predicate pushdown over stateful fold-based computations. We view pushdown as a correspondence between two programs that process different subsets of input data, with correctness witnessed by a bisimulation invariant relating their internal states. Building on this foundation, we develop a sound and relatively complete framework for verification, alongside a synthesis algorithm that automatically constructs optimal pushdown decompositions by finding the strongest admissible pre-filters and weakest residual post-filters. We implement this approach in a tool called Pusharoo and evaluate it on 150 real-world pandas and Spark data-processing pipelines. Our evaluation shows that Pusharoo is significantly more expressive than prior work, producing optimal pushdown transformations with a median synthesis time of 1.6 seconds per benchmark. Furthermore, our experiments demonstrate that the discovered pushdown optimizations speed up end-to-end execution by an average of 2.4$\times$ and up to two orders of magnitude.
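The need for semantic reasoning shows up even in a tiny fold. The hypothetical sketch below (not Pusharoo output) shows why: the strongest admissible pre-filter is only the key predicate, while the value predicate must stay behind the fold as the residual post-filter, because it inspects folded state.

```python
rows = [("a", 5), ("b", -2), ("a", 7), ("b", 9)]

def fold_running_total(rows):
    """UDF: per-key running totals -- a stateful fold over the input."""
    state, out = {}, []
    for key, val in rows:
        state[key] = state.get(key, 0) + val
        out.append((key, state[key]))
    return out

post = lambda r: r[0] == "a" and r[1] > 10      # user's final filter

# Naive: fold everything, then filter.
slow = [r for r in fold_running_total(rows) if post(r)]

# Pushed down: rows with key != "a" can never affect an "a" total, so
# key == "a" may move ahead of the UDF; the value test cannot.
pre = lambda r: r[0] == "a"
fast = [r for r in fold_running_total([r for r in rows if pre(r)])
        if post(r)]

assert slow == fast == [("a", 12)]   # the bisimulation witness, in spirit
print(slow)
```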
0
0
cs.PL 2026-04-15

Offline presynthesis lets finer abstract semantics speed up synthesis

Presynthesis: Towards Scaling Up Program Synthesis with Finer-Grained Abstract Semantics

A tree automaton and oracle built once enable fast, sound pruning across SQL, string, and matrix tasks.

Figure from the paper full image
abstract click to expand
Abstract semantics has proven to be instrumental for accelerating search-based program synthesis, by enabling the sound pruning of a set of incorrect programs (without enumerating them). One may expect faster synthesis with increasingly finer-grained abstract semantics. Unfortunately, to the best of our knowledge, this is not the case, yet. The reason is because, as abstraction granularity increases -- while fewer programs are enumerated -- pruning becomes more costly. This imposes a fundamental limit on the overall synthesis performance, which we aim to address in this work. Our key idea is to introduce an offline presynthesis phase, which consists of two steps. Given a DSL with abstract semantics, the first semantics modeling step constructs a tree automaton A for a space of inputs -- such that, for any program P and for any considered input I, A has a run that corresponds to P's execution on I under abstract semantics. Then, the second step builds an oracle O for A. This O enables fast pruning during synthesis, by allowing us to efficiently find exactly those DSL programs that satisfy a given input-output example under abstract semantics. We have implemented this presynthesis-based synthesis paradigm in a framework, Foresighter. On top of it, we have developed three instantiations for SQL, string transformation, and matrix manipulation. All of them significantly outperform prior work in the respective domains.
0
0
cs.PL 2026-04-15

Virtual machine speeds array programs 147x on GPUs

Towards a Linear-Algebraic Hypervisor

A new parallel VM maps general-purpose code to concurrent GPU runs, addressing underused hardware in synthesis and optimization.

Figure from the paper full image
abstract click to expand
Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are traditionally underutilized by such workloads. Motivated by this opportunity, we introduce a pleasingly parallel virtual machine and benchmark its performance by evaluating millions of concurrent array programs, observing speedups up to $147\times$ relative to serial evaluation.
0
0
cs.PL 2026-04-15

New logic verifies hyperproperties with quantifier alternation on heaps

Hyper Separation Logic (extended version)

It lifts separating conjunction to sets of states so a frame rule stays sound for mixed quantifiers in imperative code.

Figure from the paper full image
abstract click to expand
Many important functional and security properties--including non-interference, determinism, and generalized non-interference (GNI)--are hyperproperties, i.e., properties relating multiple executions of a program. Existing separation logics allow one to reason about specific classes of hyperproperties, e.g., $\forall\forall$-hyperproperties such as non-interference and $\exists\exists$-properties such as non-determinism. However, they do not support quantifier alternation, which is for instance needed to express GNI. The only existing logic that can reason about such properties is Hyper Hoare Logic, but it does not support heap-manipulating programs and, thus, is not applicable to common imperative programs. This paper introduces Hyper Separation Logic (HSL), the first program logic that supports modular reasoning about hyperproperties with arbitrary quantifier alternation over programs that manipulate the heap. HSL generalizes Hyper Hoare Logic with a novel hyper separating conjunction that lifts the standard separating conjunction to sets of states, enabling a generalized frame rule for hyperproperties. We prove HSL sound in Isabelle/HOL and demonstrate its expressiveness for hyperproperties that lie beyond the reach of existing separation logics.
0
0
cs.PL 2026-04-15

Separation logic verifies differential privacy in higher-order code

Modular Verification of Differential Privacy in Probabilistic Higher-Order Separation Logic (Extended Version)

Privacy budgets are treated as resources so that complex libraries with filters, caching, and interactive algorithms can be checked modularly.

Figure from the paper full image
abstract click to expand
Differential privacy is the standard method for privacy-preserving data analysis. The importance of having strong guarantees on the reliability of implementations of differentially private algorithms is widely recognized and has sparked fruitful research on formal methods. However, the design patterns and language features used in modern DP libraries as well as the classes of guarantees that the library designers wish to establish often fall outside of the scope of previous verification approaches. We introduce a program logic suitable for verifying differentially private implementations written in complex, general-purpose programming languages. Our logic has first-class support for reasoning about privacy budgets as a separation logic resource. The expressiveness of the logic and the target language allow our approach to handle common programming patterns used in the implementation of libraries for differential privacy, such as privacy filters and caching. While previous work has focused on developing guarantees for programs written in domain-specific languages or for privacy mechanisms in isolation, our logic can reason modularly about primitives, higher-order combinators, and interactive algorithms. We demonstrate the applicability of our approach by implementing a verified library of differential privacy mechanisms, including an online version of the Sparse Vector Technique, as well as a privacy filter inspired by the popular Python library OpenDP, which crucially relies on our ability to handle the combination of randomization, local state, and higher-order functions. We demonstrate that our specifications are general and reusable by instantiating them to verify clients of our library. All of our results have been foundationally verified in the Rocq Prover.
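The "budget as a separation-logic resource" reading has a simple operational shadow: an accountant that deducts epsilon per query and refuses once the resource is spent. Below is a hedged Python toy inspired by, but not extracted from, the paper's Rocq-verified library.

```python
import random

class PrivacyFilter:
    def __init__(self, budget):
        self.remaining = budget              # the consumable "resource"

    def laplace_query(self, true_value, sensitivity, epsilon):
        if epsilon > self.remaining:
            raise PermissionError("privacy budget exhausted")
        self.remaining -= epsilon            # consume before releasing
        # Laplace(0, b) as a difference of two exponentials, b = sens/eps
        b = sensitivity / epsilon
        noise = random.expovariate(1 / b) - random.expovariate(1 / b)
        return true_value + noise

f = PrivacyFilter(budget=1.0)
print(f.laplace_query(42, sensitivity=1, epsilon=0.5))   # noisy release
print(f.laplace_query(42, sensitivity=1, epsilon=0.5))   # budget just holds
# a third query at epsilon=0.5 would raise PermissionError
```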
0
0
cs.PL 2026-04-14

Higher-order polyregular equivalence is undecidable

Polyregular equivalence is undecidable in higher-order types

Reduction from tiling shows no algorithm can check if two such functions match

abstract click to expand
It is open whether equivalence $f = g$ is decidable for string-to-string polyregular functions. We consider their higher-order extension based on the $\lambda$-calculus definition of polyregular functions from Boja\'nczyk (2018). In this setting, equivalence is undecidable by reduction from the tiling problem.
0
0
cs.PL 2026-04-14

Typed lambda calculus gives formal semantics to LLM agents

λ_A: A Typed Lambda Calculus for LLM Agent Composition

Extends simply-typed lambda calculus with oracle calls and bounded fixpoints, proves safety, and shows 94 percent of real GitHub configs are structurally incomplete.

abstract click to expand
Existing LLM agent frameworks lack formal semantics: there is no principled way to determine whether an agent configuration is well-formed or will terminate. We present $\lambda_A$, a typed lambda calculus for agent composition that extends the simply-typed lambda calculus with oracle calls, bounded fixpoints (the ReAct loop), probabilistic choice, and mutable environments. We prove type safety, termination of bounded fixpoints, and soundness of derived lint rules, with full Coq mechanization (1,519 lines, 42 theorems, 0 Admitted). As a practical application, we derive a lint tool that detects structural configuration errors directly from the operational semantics. An evaluation on 835 real-world GitHub agent configurations shows that 94.1% are structurally incomplete under $\lambda_A$, with YAML-only lint precision at 54%, rising to 96--100% under joint YAML+Python AST analysis on 175 samples. This gap quantifies, for the first time, the degree of semantic entanglement between declarative configuration and imperative code in the agent ecosystem. We further show that five mainstream paradigms (LangGraph, CrewAI, AutoGen, OpenAI SDK, Dify) embed as typed $\lambda_A$ fragments, establishing $\lambda_A$ as a unifying calculus for LLM agent composition.
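The bounded fixpoint is the load-bearing construct: recursion is only available with an explicit unfolding budget, so ReAct-style loops terminate by construction. Here is a toy Python reading with the oracle stubbed out; the calculus itself is typed and Coq-mechanized, none of which this sketch attempts.

```python
def bounded_fix(step, state, n):
    """Unfold `step` at most n times; termination is by construction."""
    for _ in range(n):
        done, state = step(state)
        if done:
            break
    return state

def react_step(state):
    thought = oracle(state)              # LLM call, modeled as a stub
    return thought == "final", state + [thought]

oracle = lambda st: "final" if len(st) >= 2 else f"step{len(st)}"
print(bounded_fix(react_step, [], n=5))   # ['step0', 'step1', 'final']
```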
1
0
cs.PL 2026-04-14

AtomSanitizer checks atomicity violations in quadratic time

Fast Atomicity Monitoring

Streaming algorithm runs faster than cubic predecessors and adds low overhead inside TSAN.

Figure from the paper full image
abstract click to expand
Atomicity is a fundamental abstraction in concurrency, specifying that program behavior can be understood by considering specific code blocks executing atomically. However, atomicity invariants are tricky to maintain while also optimizing for code efficiency, and atomicity violations are a common root cause of many concurrency bugs. To address this problem, several dynamic techniques have been developed for testing whether a program execution adheres to an atomicity specification, most often instantiated as \emph{conflict serializability}. The efficiency of the analysis has been targeted in various papers, with the state-of-the-art algorithms \textsc{RegionTrack} and \textsc{Aerodrome} achieving a time complexity $O(nk^3)$ and $O(nk(k + v + \ell))$, respectively, for a trace $\sigma$ of $n$ events, $k$ threads, $v$ locations, and $\ell$ locks. In this paper we introduce \textsc{AtomSanitizer}, a new algorithm for testing conflict serializability, with time complexity $O(nk^2)$. \textsc{AtomSanitizer} operates in an efficient streaming style, is theoretically faster than all existing algorithms, and also has a smaller memory footprint. Moreover, \textsc{AtomSanitizer} is the first algorithm designed to incur minimal locking when deployed in a concurrent monitoring setting. Experiments on standard benchmarks indicate that \textsc{AtomSanitizer} is always faster in practice than all existing conflict-serializability testers. Finally, we also implement \textsc{AtomSanitizer} inside the TSAN framework, for monitoring atomicity in real time. Our experiments reveal that \textsc{AtomSanitizer} incurs minimal time and space overhead compared to the data-race detection engine of TSAN, and thus is the first algorithm for conflict serializability demonstrated to be suitable for a runtime monitoring setting.
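For orientation, conflict serializability reduces to acyclicity of a conflict graph over atomic blocks. Here is a hedged Python toy of that textbook reduction; AtomSanitizer's contribution is doing this in an $O(nk^2)$ streaming fashion, which this is not.

```python
def conflicts(e1, e2):
    (_, op1, loc1), (_, op2, loc2) = e1, e2
    return loc1 == loc2 and "w" in (op1, op2)

def serializable(trace):                    # trace: [(txn, "r"/"w", loc)]
    edges = {t: set() for t, _, _ in trace}
    for i, e1 in enumerate(trace):
        for e2 in trace[i + 1:]:
            if e1[0] != e2[0] and conflicts(e1, e2):
                edges[e1[0]].add(e2[0])     # e1 happens first
    # cycle detection by depth-first search with colors
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in edges}
    def dfs(t):
        color[t] = GREY
        for u in edges[t]:
            if color[u] == GREY or (color[u] == WHITE and dfs(u)):
                return True                 # back edge: cycle
        color[t] = BLACK
        return False
    return not any(dfs(t) for t in edges if color[t] == WHITE)

# t1 reads x before t2 writes it, but writes y after t2 read it
trace = [("t1", "r", "x"), ("t2", "r", "y"),
         ("t2", "w", "x"), ("t1", "w", "y")]
print(serializable(trace))   # False: the blocks were not atomic
```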
0
0
cs.PL 2026-04-14

Categorical framework makes program analyses robust to refactoring

A Categorical Basis for Robust Program Analysis

Lifting from restricted models yields robust analyses for full programs that behave predictably under changes.

Figure from the paper full image
abstract click to expand
Users of program analyses expect that results change predictably in response to changes in their programs, but many analyses fail to provide such robustness. This paper introduces a theoretical framework that provides a unified language to articulate robustness properties. By modeling programs and their properties as objects in a category, diverse notions of robustness, from variable renaming to semantic refinement and structural transformation, can be characterized as structure-preserving functors. Beyond formulating the meaning of robustness, this paper provides methods for achieving it. The first is a general recipe for designing robust analyses, by lifting a sound and robust analysis from a restricted (sub-Turing) model of computation to a sound and robust analysis for general programs. This recipe demystifies the design of several existing loop summarization and termination analyses by showing they are instantiations of this general recipe, and furthermore elucidates their robustness properties. The second is a characterization of a sense in which an algebraic program analysis is robust, provided that it is comprised of robust operators. In particular, we show that such analyses behave predictably under common refactoring patterns, such as variable renaming and loop unrolling.
0
0
cs.PL 2026-04-14

Function merging with branch reordering is FPT in b and d

Parameterized Algorithms and Complexity for Function Merging with Branch Reordering

Two algorithms show the problem stays tractable even with unbounded nesting in one function; hardness holds for constant d1, b2, d2.

Figure from the paper full image
abstract click to expand
Binary size reduction is an increasingly important optimization objective for compilers. One emerging technique is function merging, where multiple similar functions are merged into one, thereby eliminating redundancy. The SOTA approach to perform the merging is based on sequence alignment, where functions are viewed as linear sequences of instructions that are then matched in a way maximizing their alignment. In this paper, we consider a significantly generalized formulation of the problem by allowing reordering of branches within each function, subsequently allowing for more flexible matching and better merging. We show that this makes the problem NP-hard, and thus we study it through the lens of parameterized algorithms and complexity, where we identify certain parameters of the input that govern its complexity. We look at two natural parameters: the branching factor and nesting depth of input functions. Concretely, our input consists of two functions $F_1, F_2,$ where each $F_i$ has size $n_i,$ branching factor $b_i,$ and nesting depth $d_i.$ Our task is to reorder the branches of $F_1$ and $F_2$ in a way that yields linearizations achieving the maximum sequence alignment. Let $n=\max(n_1, n_2),$ and define $b, d$ similarly. Our results are as follows: - A simple algorithm running in time $2^{O(bd)} n^2,$ establishing that the problem is fixed-parameter tractable (FPT) with respect to all four parameters $b_1,d_1, b_2, d_2.$ - An algorithm running in time $2^{O(bd_2)} n^7,$ showing that even when one of the functions has an unbounded nesting depth, the problem remains in FPT. - A hardness result showing that the problem is NP-hard even when constrained to constant $d_1, b_2, d_2.$ To the best of our knowledge, this is the first systematic study of function merging with branch reordering from an algorithmic or complexity-theoretic perspective.
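The sequence-alignment core that the paper's generalization builds on can be shown as a hedged toy: maximize matched instructions between two linearized functions, which is classic longest-common-subsequence. The paper's harder problem additionally reorders branches before aligning; this sketch does not.

```python
from functools import lru_cache

def best_alignment(f1, f2):
    """Count the longest run of matchable instructions (LCS)."""
    @lru_cache(None)
    def lcs(i, j):
        if i == len(f1) or j == len(f2):
            return 0
        if f1[i] == f2[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))
    return lcs(0, 0)

f1 = ["load", "add", "store", "ret"]
f2 = ["load", "mul", "store", "ret"]
print(best_alignment(f1, f2))   # 3 shared instructions can be merged
```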
cs.PL 2026-04-14

Self-emulation needs full guest state

Emulation-Completeness of Programming Languages

Turing-completeness leaves out control flow, exceptions, timing, and metadata that real programs observe.

We study when a programming language can emulate programs written in that same language without delegating the guest program back to the host evaluator or compiler. We call this property emulation-completeness. The central observation is that Turing-completeness by itself is not enough: a self-emulator must not only compute the guest program's result, but must also account for the guest-visible state on which realistic programs depend, including control flow, exceptions, callbacks, timing, memory usage, and runtime metadata such as stack traces or line numbers. This is a systematization paper: its contribution is not a new emulator implementation, but a precise vocabulary and a structured taxonomy for reasoning about self-emulation. We distinguish source-level evaluation from compiled-code emulation, define syntactic and compiled-code emulation-completeness, and separate weak from strong emulation-completeness according to how much observable runtime behavior must be preserved. We then organize the requirements into two classes: language-side requirements, which determine whether the guest semantics can be represented explicitly inside the language, and emulator-side requirements, which determine whether the resulting emulator can faithfully mask or reproduce relevant observations. The discussion is grounded by concrete examples, including publicly documented details from Erlang, where argument limits, bitstring pattern matching, and message reception expose subtle mismatches between direct execution and self-emulation. The resulting framework is intended as guidance for language designers, implementers of evaluators and emulators, and researchers interested in secure sandboxing, decompilation, and reflective execution.
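The gap between weak and strong emulation-completeness fits in a few lines. The Python sketch below (an invented toy, not from the paper) runs a two-line guest program through two evaluators: both compute the same values, but only the second reproduces a guest-visible observation, the failing guest line number.

```python
# Toy illustration (invented, not from the paper): both evaluators compute
# the same guest values, but only strong_eval preserves the guest-visible
# metadata, here the guest line number of the failure.
import traceback

GUEST = [("x", "10 // 2"), ("y", "x // 0")]  # guest line 1, guest line 2

def weak_eval(prog):
    env = {}
    for name, src in prog:
        env[name] = eval(src, {}, env)  # errors surface with host frames only
    return env

def strong_eval(prog):
    env = {}
    for lineno, (name, src) in enumerate(prog, start=1):
        try:
            env[name] = eval(src, {}, env)
        except ZeroDivisionError as e:
            # Reproduce the guest-visible observation explicitly.
            raise RuntimeError(f"guest line {lineno}: {e}") from None
    return env

try:
    weak_eval(GUEST)
except ZeroDivisionError:
    traceback.print_exc()  # trace points into weak_eval, not the guest

try:
    strong_eval(GUEST)
except RuntimeError as e:
    print(e)  # guest line 2: integer division or modulo by zero
```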
cs.PL 2026-04-13

Graded effects give denotational semantics for async multiparty sessions

Denotational reasoning for asynchronous multiparty session types

The model handles non-blocking sends and message reordering while ensuring safety, deadlock-freedom and liveness.

We provide the first denotational semantics for asynchronous multiparty session types with precise asynchronous subtyping. Our semantics enables us to reason about asynchronous message-passing, in which message-sending is non-blocking. It enables us to prove the correctness of communication optimisations, in particular, those involving reordering of messages. Our development crucially relies on modelling message-passing as a computational effect. We apply grading, a paradigm for tracking computational effects, to asynchronous message-passing, demonstrating that multiparty session typing can be viewed as an instance of grading. We demonstrate the utility of our model by showing that it forms an adequate denotational semantics for a call-by-value asynchronous message-passing calculus, that ensures communication safety, deadlock-freedom and liveness in the presence of communication optimisations.
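The "message-passing as a graded effect" idea has a simple operational shadow. A minimal sketch (illustrative only; the paper's model is denotational and typed, and these names are invented): pair each computation with a grade drawn from the free monoid of communication actions, and let sequencing multiply grades by concatenation, so the accumulated grade records the session trace.

```python
# Minimal sketch (not the paper's model): grading as an effect annotation.
# A graded-writer-style structure pairs a value with a grade from the free
# monoid of communication actions; bind multiplies grades by concatenation,
# so a computation's grade is its (non-blocking) communication trace.

class Graded:
    def __init__(self, value, grade=()):
        self.value, self.grade = value, tuple(grade)

    def bind(self, f):
        out = f(self.value)
        return Graded(out.value, self.grade + out.grade)  # monoid multiply

def send(role, msg):
    return Graded(None, [("send", role, msg)])  # non-blocking send

def recv(role, value):
    return Graded(value, [("recv", role)])

# p sends 42 to q, then receives an ack; the grade records the order.
session = send("q", 42).bind(lambda _: recv("q", "ack"))
print(session.grade)  # (('send', 'q', 42), ('recv', 'q'))
```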
cs.PL 2026-04-13

Deduplication engine cuts pointer analysis memory up to 18x

Points-to Analysis Using MDE: A Multi-level Deduplication Engine for Repetitive Data and Operations

Recursively removing the more than 90% of set operations that are redundant yields 8x runtime gains that increase with program size.

Precise pointer analysis is a foundational component of many client analyses and optimizations. Scaling flow- and context-sensitive pointer analysis has been a long-standing challenge, suffering from combinatorial growth in both memory usage and runtime. Existing approaches address this primarily by reducing the amount of information tracked, often at the cost of precision and soundness. In our experience, a significant proportion of this cost comes from the propagation of duplicate data and from low-level data structure operations being repeated a large number of times. Our measurements on SPEC benchmarks show that more than 90% of all set-union operations performed can be redundant. We present the Multi-level Deduplication Engine (MDE), a mechanism that recursively augments the representation of data through de-duplication and the assignment of unique identifiers to values, eliminating redundancy. This allows MDE to trivialize many operations and to memoize others, enabling their future reuse. MDE's recursive structure allows it to represent de-duplicated values that are themselves constructed from other de-duplicated values, capturing structural redundancy not easily achievable with non-recursive techniques. We provide a full C++ implementation of MDE as a library and integrate it into an existing implementation of a flow- and context-sensitive pointer analysis. Evaluation on selected SPEC benchmarks shows reductions of up to 18.10x in peak memory usage and 8.15x in runtime. More notably, MDE's effectiveness trends upward as benchmark size increases. Beyond performance improvements, this work highlights the importance of representation design and suggests new opportunities for bringing efficiency to future analyses.
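The core mechanism is easy to prototype. A minimal sketch of the idea, not the MDE library: intern each distinct points-to set under a unique identifier, then memoize unions on identifier pairs, so a repeated union, the redundancy the paper measures at over 90%, costs one dictionary lookup instead of set work.

```python
# Minimal sketch of the deduplication idea (not the MDE library):
# hash-cons each distinct set to a unique id, then memoize set unions
# on id pairs so duplicate operations are trivialized to lookups.

interned = {}    # frozenset -> unique id
sets_by_id = []  # id -> frozenset
union_memo = {}  # (id, id) -> id of the union

def intern(s):
    s = frozenset(s)
    if s not in interned:
        interned[s] = len(sets_by_id)
        sets_by_id.append(s)
    return interned[s]

def union(a, b):
    key = (a, b) if a <= b else (b, a)  # union is commutative
    if key not in union_memo:
        union_memo[key] = intern(sets_by_id[a] | sets_by_id[b])
    return union_memo[key]

x = intern({"o1", "o2"})
y = intern({"o2", "o3"})
u1 = union(x, y)
u2 = union(y, x)  # duplicate operation: memo hit, no set traversal
assert u1 == u2 and sets_by_id[u1] == {"o1", "o2", "o3"}
```

MDE's recursive twist, which this sketch omits, is that interned values may themselves contain identifiers of other interned values, so structural redundancy deduplicates at every level.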
cs.PL 2026-04-13

Tcl objects from lists run 7-18x faster than TclOO

Vanilla Object Orientation (VOO): A Value-Semantics Approach to Classes in Tcl

Native lists and copy-on-write replace framework code, cutting creation time and memory use while keeping the same script API.

I present Vanilla Object Orientation (VOO), a framework that composes classes from Tcl's native data structures -- lists and dictionaries -- rather than introducing additional framework infrastructure. VOO objects are plain Tcl lists with automatic memory management through copy-on-write semantics, eliminating the destructor burden inherent in TclOO and Itcl. Benchmarks on Tcl 8.6.13 and Tcl 9.0 show VOO achieves 7--18x faster object creation and 4--6x superior memory efficiency compared to TclOO. A companion C++ migration path (VOO C++) further improves field-access speed (setter 2.3--2.6x faster) and memory (6.8--9.8x lighter than TclOO), while preserving an identical Tcl call-site API. Cross-version analysis confirms that VOO's compositional design scales better than framework-based approaches as the interpreter evolves.
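The value-semantics design translates outside Tcl. A rough Python analogue (VOO itself is Tcl; this only mirrors the shape): an object is a plain immutable tuple, so there is nothing to destruct, and a setter is a copy-on-write update that returns a new value.

```python
# Rough Python analogue of the VOO design (VOO itself is Tcl): an object
# is a plain immutable value with a fixed field layout, so lifetime is
# handled by ordinary value semantics and setters are copy-on-write.

FIELDS = {"x": 0, "y": 1}  # class layout: field name -> slot index

def new_point(x, y):
    return (x, y)                  # the object IS a plain value

def get(obj, field):
    return obj[FIELDS[field]]

def set_(obj, field, value):
    i = FIELDS[field]
    return obj[:i] + (value,) + obj[i + 1:]   # copy-on-write update

p = new_point(1, 2)
q = set_(p, "x", 10)   # q is a new value; p is untouched, no destructor
print(get(p, "x"), get(q, "x"))  # 1 10
```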
cs.PL 2026-04-13

Free variables replace dominance in SSA for higher-order code

SSA without Dominance for Higher-Order Programs

Tracking dependencies directly through phi-functions and parameters improves precision and removes the need for control-flow graphs.

Dominance is a fundamental concept in compilers based on static single assignment (SSA) form. It underpins a wide range of analyses and transformations and defines a core property of SSA: every use must be dominated by its definition. We argue that this reliance on dominance has become increasingly problematic -- both in terms of precision and applicability to modern higher-order languages. First, control flow overapproximates data flow, which makes dominance-based analyses inherently imprecise. Second, dominance is well-defined only for first-order control-flow graphs (CFGs). More critically, higher-order programs violate the assumptions underlying SSA and classic CFGs: without an explicit CFG, the very notion that all uses of a variable must be dominated by its definition loses meaning. We propose an alternative foundation based on free variables. In this view, $\phi$-functions and function parameters directly express data dependencies, enabling analyses traditionally built on dominance while improving precision and naturally extending to higher-order programs. We further present an efficient technique for maintaining free-variable sets in a mutable intermediate representation (IR). For analyses requiring additional structure, we introduce the nesting tree -- a relaxed analogue of the dominator tree constructed from variable dependencies rather than control flow. Our benchmarks demonstrate that the algorithms and data structures presented in this paper scale log-linearly with program size in practice.
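The free-variable view is straightforward to compute even where no CFG exists. A minimal sketch over an invented IR (not the paper's implementation): each definition lists its operands, nested functions appear as operands too, and a function's free variables are the operands it uses but does not bind.

```python
# Minimal sketch over an invented IR: data dependence via free variables
# rather than dominance. A function's free variables are the operands it
# uses but does not bind, a notion that still makes sense for nested,
# higher-order code where no first-order CFG (and hence no dominator
# tree) exists.

def free_vars(func):
    """func: {'params': [...], 'defs': [(name, [operands])], 'ret': name}"""
    bound = set(func["params"])
    free = set()
    for name, ops in func["defs"]:
        for op in ops:
            if isinstance(op, dict):            # nested function literal
                free |= free_vars(op) - bound   # its frees may escape upward
            elif op not in bound:
                free.add(op)
        bound.add(name)
    if func["ret"] not in bound:
        free.add(func["ret"])
    return free

inner = {"params": ["y"], "defs": [("t", ["x", "y"])], "ret": "t"}
outer = {"params": ["x"], "defs": [("f", [inner])], "ret": "f"}
print(free_vars(inner))  # {'x'}: inner depends on x, no dominance needed
print(free_vars(outer))  # set(): x is bound by outer's parameter
```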
cs.PL 2026-04-13

LLM creates alias-free model for Petri-net concurrency verification

CIR+CVN: Bridging LLM Semantic Understanding and Petri-Net Verification for Concurrent Programs

Natural language requirements turn into explicit structures that enable exhaustive checking, counterexample repair, and preservation of critical outcomes.

Recovering concurrency structure directly from source code is difficult because shared-resource identity and protection relations are often obscured by aliasing, ownership, and API-specific idioms. We therefore study a specification-driven, model-first verification architecture for LLM-assisted concurrent program construction. Instead of verifying arbitrary source code, a large language model first synthesizes a verification-oriented concurrency artifact from a natural-language requirement or system specification. The first formalism, the Concurrency Intermediate Representation (Cir), is a statement-level, alias-free model in which shared resources are globally named, protection relations are explicit, and each statement carries a stable identifier. The second formalism, the Concurrency Verification Net (Cvn), is a weighted place/transition Petri net with a finite global store and three-valued guards for data-dependent branching. A validated Cir artifact is translated mechanically to Cvn, explored exhaustively, and any counterexample is mapped back to statement identifiers to guide targeted repair. To reduce the risk of bug-free but behavior-dropping repairs, acceptance additionally applies a lightweight goal-reachability check over designated critical outcomes. We formalize both representations, prove translation-correspondence results for deadlock and signal-loss analysis, define a two-layer checking architecture with 61 static rules and 5 analysis predicates, and evaluate the pipeline on 9 representative bounded-concurrency patterns. The results show that the method supports iterative bug detection and repair on Cir artifacts and that goal reachability helps filter semantically incomplete repairs. The trust boundary of the present work is the generated Cir artifact rather than arbitrary source code.
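The Cvn side of the pipeline rests on standard place/transition exploration, which is small enough to sketch. The Python below is a generic toy, not Cvn (no weights, three-valued guards, or global store): it exhaustively explores a two-lock net and reports the classic cyclic-wait deadlock, excluding markings that satisfy a goal predicate, in the spirit of the paper's goal-reachability check.

```python
# Generic toy, not Cvn: exhaustive exploration of a place/transition net.
# A marking with no enabled transition that does not satisfy the goal
# predicate is reported as a deadlock.
from collections import Counter

def enabled(marking, need):
    have = dict(marking)
    return all(have.get(p, 0) >= n for p, n in need.items())

def fire(marking, need, make):
    m = Counter(dict(marking))
    m.subtract(need)   # consume input tokens
    m.update(make)     # produce output tokens
    return frozenset((p, n) for p, n in m.items() if n > 0)

def deadlocks(m0, transitions, is_goal):
    seen, stack, found = set(), [m0], []
    while stack:
        m = stack.pop()
        if m in seen:
            continue
        seen.add(m)
        succs = [fire(m, need, make) for need, make in transitions
                 if enabled(m, need)]
        if not succs and not is_goal(m):
            found.append(dict(m))
        stack.extend(succs)
    return found

# Two threads acquire locks A and B in opposite order (classic cycle).
T = [({"s1": 1, "A": 1}, {"m1": 1}),                  # thread 1 takes A
     ({"m1": 1, "B": 1}, {"d1": 1, "A": 1, "B": 1}),  # ...then B, done
     ({"s2": 1, "B": 1}, {"m2": 1}),                  # thread 2 takes B
     ({"m2": 1, "A": 1}, {"d2": 1, "A": 1, "B": 1})]  # ...then A, done
m0 = frozenset({"s1": 1, "s2": 1, "A": 1, "B": 1}.items())
print(deadlocks(m0, T, lambda m: dict(m).get("d1") and dict(m).get("d2")))
# [{'m1': 1, 'm2': 1}]: each thread holds one lock and waits forever
```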
