FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
hub Canonical reference
Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
fields
cs.AI 8 cs.CE 3 cs.CL 2 cs.LG 2 cs.CR 1 cs.DC 1 cs.MA 1 physics.chem-ph 1 q-fin.PR 1 q-fin.ST 1roles
background 6polarities
background 6representative citing papers
AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.
Moira parameterizes hierarchical RL policies for pair trading with LLMs and adapts them via prompt updates based on trajectory and episode feedback, outperforming baselines on real market data.
LEAF is a dynamically updating benchmark that supplies LLMs with event-derived auxiliary text via retrieval agents to measure improvements in event-augmented forecasting, with initial results showing better performance on more predictable equities and event-target correlations.
ASR, a new trajectory-fidelity metric, detects that 10 of 18 LLMs skip confirmation steps in payment agents despite perfect scores on prior metrics, and ASR-guided refinements improve task success by up to 93.8 percentage points.
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.
TokenCake introduces agent-aware temporal and spatial schedulers for KV cache management in LLM multi-agent serving, claiming over 47% lower end-to-end latency and up to 16.9% better GPU memory utilization than vLLM on representative benchmarks.
Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.
CoRT achieves 95% average attack success rate on nine LLMs by using iterative risk-concealing prompts and a controller that scores concealment levels on a new 522-instruction financial risk benchmark.
StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.
Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
MadEvolve uses LLMs for evolutionary optimization of trading strategies and reports significant backtest improvements on Bitcoin tasks including signal feature evolution and joint strategy optimization.
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
AlphaQuanter introduces a single-agent tool-augmented RL framework for stock trading that learns dynamic policies over a transparent decision workflow and reports state-of-the-art financial metrics.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
FundaPod presents a multi-persona AI agent architecture with knowledge-graph memory to support human-adjudicated fundamental investment research through independent agent work and verifiable evidence links.
This review synthesizes LLM uses in stock forecasting and catalogs key practical pitfalls from a hedge-fund viewpoint.
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.
citing papers explorer
-
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
-
AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection
AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.
-
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
-
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
-
A Review of Large Language Models for Stock Price Forecasting from a Hedge-Fund Perspective
This review synthesizes LLM uses in stock forecasting and catalogs key practical pitfalls from a hedge-fund viewpoint.
-
A Survey of Scaling in Large Language Model Reasoning
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.