A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
13 Pith papers cite this work. Polarity classification is still indexing.
abstract
Financial portfolio management is the process of constant redistribution of a fund into different financial products. This paper presents a financial-model-free Reinforcement Learning framework to provide a deep machine learning solution to the portfolio management problem. The framework consists of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. This framework is realized in three instances in this work with a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM). They are, along with a number of recently reviewed or published portfolio-selection strategies, examined in three back-test experiments with a trading period of 30 minutes in a cryptocurrency market. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example. All three instances of the framework monopolize the top three positions in all experiments, outdistancing the other compared trading algorithms. Despite a high commission rate of 0.25% in the back-tests, the framework is able to achieve at least 4-fold returns in 50 days.
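As a rough illustration of the quantities the abstract mentions (30-minute trading periods, a 0.25% commission rate, and a 4-fold return in 50 days), the sketch below simulates a single rebalancing step and computes the average per-period log return that such a result implies. It rests on stated assumptions: commission is charged proportionally on turnover, which is a simplification and not necessarily the paper's exact cost model, and the names `rebalance_step` and `average_log_return` are hypothetical helpers introduced only for this example.

```python
import math

COMMISSION = 0.0025          # 0.25% commission rate, as quoted in the abstract
PERIODS_PER_DAY = 48         # 30-minute trading periods in a 24-hour market


def rebalance_step(value, weights, target, price_relatives, commission=COMMISSION):
    """Rebalance from `weights` to `target`, pay commission, then apply price moves.

    Assumption: cost is proportional to turnover (sum of absolute weight changes).
    `price_relatives[i]` is closing price / opening price of asset i for the period.
    Returns the new portfolio value and the weights after prices have drifted.
    """
    turnover = sum(abs(t - w) for t, w in zip(target, weights))
    value_after_cost = value * (1.0 - commission * turnover)
    growth = sum(t * y for t, y in zip(target, price_relatives))
    new_value = value_after_cost * growth
    drifted = [t * y / growth for t, y in zip(target, price_relatives)]
    return new_value, drifted


def average_log_return(values):
    """Average logarithmic return per period over a sequence of portfolio values."""
    return math.log(values[-1] / values[0]) / (len(values) - 1)


# Number of 30-minute periods in 50 days, and the average per-period
# log return needed to grow the portfolio 4-fold over that span.
periods = 50 * PERIODS_PER_DAY
required = math.log(4.0) / periods

if __name__ == "__main__":
    v, w = rebalance_step(1.0, [1.0, 0.0], [0.5, 0.5], [1.0, 1.1])
    print(f"value after one step: {v:.6f}, drifted weights: {w}")
    print(f"periods: {periods}, required avg log return per period: {required:.6e}")
```

Under these assumptions, a 4-fold return over 50 days corresponds to an average log return of about 0.058% per 30-minute period, which is small next to the 0.25% charged on each unit of turnover. This suggests why the abstract calls the commission rate high.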
years
2026 13
roles
background 1
polarities
background 1
representative citing papers
Frontier AI models lose 16-31% trading on Kalshi over 57 days but show better results on Polymarket, with platform design strongly affecting outcomes and prediction accuracy mattering more than research volume.
Counterfactual transport flows enable conservative, instance-specific trajectory refinement in offline RL by constructing local preference pairs in latent space from offline data and learning refinement directions controlled by a strength parameter.
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
SBCA is a reinforcement learning framework using BERT cross-modal fusion and Actor-Critic to integrate price data with sentiment text for multi-asset portfolio optimization with practical trading constraints.
KICL completes execution decisions in KOL financial discourse using offline RL, achieving top returns and Sharpe ratios with no unsupported trades or direction changes on YouTube and X data from 2022-2025.
FPQC-SAC adds a bounded parameterized quantum circuit to SAC to constrain representations in low-SNR financial environments, reporting 66.89% higher cumulative returns than standard SAC on real portfolio tasks.
BAVAR-BLED combines BAVAR for regime-aware priors and BLED with Student's t-distributions inside TD3, reporting Sharpe 1.72 and Sortino 2.70 on 29 DJIA stocks over 10 years.
LLM agents (hawkish, dovish, debate) outperform a deterministic z-score rule agent in Sharpe ratio for commodity ETF portfolios by 0.04-0.044, with advantage concentrated in the soft-landing sub-period and preserved up to 30bp trading costs.
A hybrid DRL system for multi-pair crypto trading with deterministic risk shielding outperforms a heuristic baseline at 10% significance on Binance futures data.
ReCAP segments markets into regimes, builds a policy library via continual learning, and uses a regime-gate to adapt trading policies, claiming superior returns and fast adaptation on five real datasets.
A semi-supervised teacher-student framework enables neural networks to proxy CVaR portfolio optimization using synthetic data augmentation for scarce labels and regime shifts.
A systematic review of physics-informed neural networks and mathematical modeling approaches for portfolio optimization and management in finance.
citing papers explorer
- A Three-Phase Foundation Model for Tax-Aware Personalized Portfolio Management
A three-phase DRL framework for personalized portfolio management using a ticker-free encoder pretrained with a time series foundation model, an objective-conditioned MoE actor-critic, and inference-time LoRA adaptation from brokerage data.