arXiv preprint arXiv:2011.09607v2 , year=

FinRL: A deep reinforcement learning library for automated stock trading in quantitative finance , author= · 2011 · arXiv 2011.09607

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

representative citing papers

Robust Adversarial Policy Optimization Under Dynamics Uncertainty

cs.LG · 2026-04-13 · unverdicted · novelty 7.0

RAPO uses a dual robust RL formulation with trajectory-level adversarial networks and model-level Boltzmann reweighting over dynamics ensembles to improve policy resilience and out-of-distribution generalization while keeping the problem tractable.

SBCA: Cross-Modal BERT-driven Actor-Critic for Multi-Asset Portfolio Optimization

q-fin.CP · 2026-05-02 · unverdicted · novelty 6.0

SBCA is a reinforcement learning framework using BERT cross-modal fusion and Actor-Critic to integrate price data with sentiment text for multi-asset portfolio optimization with practical trading constraints.

Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

SSAI maps news into four factors (sentiment, risk, confidence, volatility) for trading, but factor portfolios, ridge models, and RL agents show no reliable edge over baselines after coverage controls and costs.

EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation

cs.AI · 2026-04-13 · unverdicted · novelty 4.0

EvoNash-MARL achieves 19.6% annualized returns on equity allocation from 2014-2024 versus 11.7% for SPY, with evidence of robustness under constraints but no strong statistical superiority per WRC and SPA-lite tests.

citing papers explorer

Showing 4 of 4 citing papers.

Robust Adversarial Policy Optimization Under Dynamics Uncertainty cs.LG · 2026-04-13 · unverdicted · none · ref 8
RAPO uses a dual robust RL formulation with trajectory-level adversarial networks and model-level Boltzmann reweighting over dynamics ensembles to improve policy resilience and out-of-distribution generalization while keeping the problem tractable.
SBCA: Cross-Modal BERT-driven Actor-Critic for Multi-Asset Portfolio Optimization q-fin.CP · 2026-05-02 · unverdicted · none · ref 11
SBCA is a reinforcement learning framework using BERT cross-modal fusion and Actor-Critic to integrate price data with sentiment text for multi-asset portfolio optimization with practical trading constraints.
Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics cs.LG · 2026-05-07 · unverdicted · none · ref 4
SSAI maps news into four factors (sentiment, risk, confidence, volatility) for trading, but factor portfolios, ridge models, and RL agents show no reliable edge over baselines after coverage controls and costs.
EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation cs.AI · 2026-04-13 · unverdicted · none · ref 8
EvoNash-MARL achieves 19.6% annualized returns on equity allocation from 2014-2024 versus 11.7% for SPY, with evidence of robustness under constraints but no strong statistical superiority per WRC and SPA-lite tests.

arXiv preprint arXiv:2011.09607v2 , year=

fields

years

verdicts

representative citing papers

citing papers explorer