Financial Trading as a Game: A Deep Reinforcement Learning Approach

Chien Yi Huang

arXiv: 1807.02787 · v1 · pith:XDRWPV4Gnew · submitted 2018-07-08 · 💱 q-fin.TR · cs.LG · stat.ML

classification 💱 q-fin.TR · cs.LG · stat.ML

keywords learning, financial, trading, deep, market, agent, algorithm, reinforcement

An automatic program that generates constant profit from the financial market is lucrative for every market practitioner. Recent advances in deep reinforcement learning provide a framework for end-to-end training of such a trading agent. In this paper, we propose a Markov Decision Process (MDP) model suitable for the financial trading task and solve it with the state-of-the-art deep recurrent Q-network (DRQN) algorithm. We propose several modifications to the existing learning algorithm to make it more suitable for the financial trading setting, namely:

1. We employ a substantially smaller replay memory (only a few hundred in size) than those used in modern deep reinforcement learning algorithms (often millions in size).
2. We develop an action augmentation technique that mitigates the need for random exploration by providing the agent with extra feedback signals for all actions. This enables us to use a greedy policy over the course of learning, and it shows strong empirical performance compared to the more commonly used epsilon-greedy exploration. However, this technique is specific to financial trading under a few market assumptions.
3. We sample a longer sequence for recurrent neural network training. A side product of this mechanism is that we can now train the agent only once every T steps. This greatly reduces training time, since the overall computation is down by a factor of T.

We combine all of the above into a complete online learning algorithm and validate our approach on the spot foreign exchange market.
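The three modifications listed in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The action set (short, neutral, long), the reward definition (position times price change), the default capacity of 500, and all names (`augmented_rewards`, `SmallReplayMemory`, `run_online`, `update`) are assumptions made for illustration. The Q-network itself is omitted and represented by a placeholder callback.

```python
import random
from collections import deque

# Assumed action set: short, neutral, long. Not taken from the abstract.
ACTIONS = (-1, 0, 1)


def augmented_rewards(price_now, price_next):
    """Return one reward per action in ACTIONS for a single time step.

    Assumes the agent's trade does not move the price, so the outcome of
    every action can be computed from the same observed price change.
    """
    change = price_next - price_now
    return [a * change for a in ACTIONS]


class SmallReplayMemory:
    """FIFO memory that keeps only the most recent `capacity` steps."""

    def __init__(self, capacity=500):  # hypothetical "few hundred" default
        self.buffer = deque(maxlen=capacity)

    def __len__(self):
        return len(self.buffer)

    def push(self, step):
        self.buffer.append(step)

    def sample_sequence(self, length, rng=random):
        """Sample `length` consecutive steps for recurrent training."""
        if length > len(self.buffer):
            raise ValueError("not enough steps stored")
        start = rng.randrange(len(self.buffer) - length + 1)
        return [self.buffer[i] for i in range(start, start + length)]


def run_online(prices, T=4, capacity=500, update=lambda seq: None):
    """Store one augmented step per tick and call `update` every T ticks.

    `update` stands in for the Q-network training step, which is omitted.
    Returns the number of updates performed.
    """
    memory = SmallReplayMemory(capacity)
    updates = 0
    for t in range(len(prices) - 1):
        memory.push((prices[t], augmented_rewards(prices[t], prices[t + 1])))
        if (t + 1) % T == 0 and len(memory) >= T:
            update(memory.sample_sequence(T))
            updates += 1
    return updates
```

Under these assumptions, a series of nine prices with T = 4 triggers two updates instead of eight, which is the factor-of-T saving the abstract describes. Because every stored step carries a reward for each action, a greedy policy still receives feedback on the actions it did not take.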

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation
q-fin.CP · 2025-02 · unverdicted · novelty 3.0

CausalGAN + SAC RL pipeline generates synthetic bond yield data; fine-tuned Qwen2.5-7B LLM produces trading signals, with reported MAE 0.103, 60% profit rate, and LLM score 3.37/5.