CTA framework detects 522 skill influence patterns in LLM agent traces across 49 tasks where average pass rate shifts only +0.3%, exposing evaluation gaps in behavioral effects like template copying and excess planning.
hub
Autogen: Enabling next-gen llm applications via multi-agent conversations
13 Pith papers cite this work. Polarity classification is still indexing.
hub tools
years
2026 13representative citing papers
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
AgentForesight trains a 7B model to perform online auditing of multi-agent LLM trajectories, detecting early decisive errors and outperforming larger models on custom and external benchmarks.
The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.
HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.
FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.
LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.
STAR is a failure-aware Markovian router that learns recovery transitions from both successful and unsuccessful execution traces to improve multi-agent performance on spatiotemporal benchmarks.
Agentic AI needs social theory as structural priors in the MASS framework to model emergent dynamics from multi-agent interactions.
citing papers explorer
-
Counterfactual Trace Auditing of LLM Agent Skills
CTA framework detects 522 skill influence patterns in LLM agent traces across 49 tasks where average pass rate shifts only +0.3%, exposing evaluation gaps in behavioral effects like template copying and excess planning.
-
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
-
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
-
AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
AgentForesight trains a 7B model to perform online auditing of multi-agent LLM trajectories, detecting early decisive errors and outperforming larger models on custom and external benchmarks.
-
SoK: Blockchain Agent-to-Agent Payments
The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.
-
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
-
Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.
-
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.
-
FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution
FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.
-
Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems
LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.
-
STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
STAR is a failure-aware Markovian router that learns recovery transitions from both successful and unsuccessful execution traces to improve multi-agent performance on spatiotemporal benchmarks.
-
Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems
Agentic AI needs social theory as structural priors in the MASS framework to model emergent dynamics from multi-agent interactions.
- SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning