2 Pith papers cite this work.
Fields: cs.CR (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Formal Security Analysis of Agent Protocol Composition
  AgentThread analyzes five agent protocols with formal TLA+ invariants and SDK tests, reporting 35 specification findings, 80 implementation tests, 30 composition-only failures, and a cross-protocol responsibility gap in security enforcement.
- CrypFormBench: Benchmarking Formal Analysis Capability of Large Language Models for Cryptographic Schemes
  CrypFormBench is a new benchmark that covers both symbolic and computational security and evaluates LLMs on five formal analysis capabilities. In the reported results, the top model, Claude-3.5, scores 48.7/100, and most models struggle with generation, transformation, and correction.