Agent Guide: A Simple Agent Behavioral Watermarking Framework

Kaibo Huang; Linna Zhou; Zhongliang Yang; Zipei Zhang

arxiv: 2504.05871 · v3 · pith:XYRUMVXFnew · submitted 2025-04-08 · 💻 cs.AI

classification 💻 cs.AI

keywords agent · behavior · watermarking · agents · framework · guide · action · behavioral

The increasing deployment of intelligent agents in digital ecosystems, such as social media platforms, has raised significant concerns about traceability and accountability, particularly in cybersecurity and digital content protection. Traditional large language model (LLM) watermarking techniques, which rely on token-level manipulations, are ill-suited for agents due to the challenges of behavior tokenization and information loss during behavior-to-action translation. To address these issues, we propose Agent Guide, a novel behavioral watermarking framework that embeds watermarks by guiding the agent's high-level decisions (behavior) through probability biases, while preserving the naturalness of specific executions (action). Our approach decouples agent behavior into two levels, behavior (e.g., choosing to bookmark) and action (e.g., bookmarking with specific tags), and applies watermark-guided biases to the behavior probability distribution. We employ a z-statistic-based statistical analysis to detect the watermark, ensuring reliable extraction over multiple rounds. Experiments in a social media scenario with diverse agent profiles demonstrate that Agent Guide achieves effective watermark detection with a low false positive rate. Our framework provides a practical and robust solution for agent watermarking, with applications in identifying malicious agents and protecting proprietary agent systems.
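The biasing and detection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the watermark selects behaviors, so this example borrows a keyed "favored subset" rule in the style of green-list LLM watermarks. The behavior names, the key, GAMMA, delta, and every function name here are assumptions made for illustration.

```python
import hashlib
import math
import random

# Hypothetical behavior set for a social media agent (assumption).
BEHAVIORS = ["like", "bookmark", "repost", "comment", "follow", "browse"]
GAMMA = 0.5  # fraction of behaviors favored in each round (assumption)


def favored_set(key: str, round_idx: int) -> set:
    """Derive a keyed pseudo-random subset of behaviors for one round."""
    digest = hashlib.sha256(f"{key}:{round_idx}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return set(rng.sample(BEHAVIORS, int(len(BEHAVIORS) * GAMMA)))


def biased_distribution(probs: dict, favored: set, delta: float) -> dict:
    """Scale favored behaviors by exp(delta), then renormalize to sum to 1."""
    weights = {
        b: p * (math.exp(delta) if b in favored else 1.0)
        for b, p in probs.items()
    }
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def simulate(key: str, rounds: int, delta: float, seed: int) -> list:
    """Sample one behavior per round; delta=0.0 gives an unwatermarked agent."""
    rng = random.Random(seed)
    base = {b: 1.0 / len(BEHAVIORS) for b in BEHAVIORS}  # stand-in for the agent's own distribution
    chosen = []
    for t in range(rounds):
        probs = biased_distribution(base, favored_set(key, t), delta)
        chosen.append(rng.choices(list(probs), weights=list(probs.values()))[0])
    return chosen


def z_statistic(chosen: list, key: str) -> float:
    """One-proportion z-score: favored-behavior hits versus the GAMMA baseline."""
    n = len(chosen)
    hits = sum(1 for t, b in enumerate(chosen) if b in favored_set(key, t))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    key = "demo-key"  # hypothetical secret key
    print("watermarked z:", z_statistic(simulate(key, 400, 2.0, seed=0), key))
    print("unwatermarked z:", z_statistic(simulate(key, 400, 0.0, seed=1), key))
```

Only the high-level behavior choice is biased in this sketch. How the chosen behavior is carried out (the action, such as which tags a bookmark receives) is left to the agent, matching the two-level split in the abstract. A detector would compare the z-score against a threshold chosen for the desired false positive rate.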

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sequential Behavioral Watermarking for LLM Agents
cs.CR 2026-05 unverdicted novelty 7.0

SeqWM embeds watermarks into history-conditioned action transitions in LLM agent trajectories and verifies them position-agnostically, achieving robust detection under perturbations where prior per-step methods fail.

Federated Stream-Processing and Latency-Gated Response for Cross-Sector Threat Detection and Collaborative Containment
cs.CR 2026-05 unverdicted novelty 4.0

A federated stream-processing system with PFDS, in-memory sharded workers, and statistical watermarking achieves end-to-end cross-sector threat detection and containment in 12-20 seconds on a 500k events/sec prototype...