arxiv: 2605.01384 · v1 · submitted 2026-05-02 · 💱 q-fin.CP

SBCA: Cross-Modal BERT-driven Actor-Critic for Multi-Asset Portfolio Optimization

Jiahao Chen, Jinfeng Pan

Pith reviewed 2026-05-10 15:26 UTC · model grok-4.3

keywords: portfolio optimization · actor-critic · BERT · cross-modal fusion · multi-asset portfolio · financial sentiment · reinforcement learning · quantitative trading

The pith

SBCA fuses BERT text features with price data in an actor-critic model to outperform benchmarks in multi-asset portfolio optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SBCA, a reinforcement learning framework that uses BERT to process financial text alongside price time series for portfolio decisions. It employs a gated fusion to combine these modalities and adds penalties for downside risk and trading turnover in the reward signal. Tests on 11 years of U.S. multi-asset stock data demonstrate better portfolio value, returns, Sharpe ratios, and lower drawdowns than equal weighting or market benchmarks. Sympathetic readers would care because it shows how natural language processing can enhance quantitative trading beyond price-only models. Ablation confirms the fusion and actor-critic components add value.

Core claim

SBCA is a cross-modal BERT-driven Actor-Critic framework that adaptively integrates price time-series features and text semantic features via a gated fusion mechanism, incorporates downside risk and turnover constraints into the reward function, and achieves superior performance in portfolio optimization tasks as validated through extensive experiments on long-term U.S. stock datasets.
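
The reward shaping described in the core claim can be illustrated with a minimal sketch. The review does not give the paper's functional form, so everything here is an assumption: the function name `shaped_reward`, the coefficients `lambda_risk` and `lambda_turn`, the use of downside semideviation over a trailing window as the risk term, and the L1 weight change as the turnover term.

```python
import math

def shaped_reward(portfolio_return, recent_returns, prev_weights, new_weights,
                  lambda_risk=0.5, lambda_turn=0.1, target=0.0):
    """Hypothetical reward: step return minus downside-risk and turnover penalties."""
    # Downside semideviation: only returns below `target` contribute.
    shortfalls = [min(r - target, 0.0) ** 2 for r in recent_returns]
    downside = math.sqrt(sum(shortfalls) / len(shortfalls)) if shortfalls else 0.0
    # Turnover: total absolute change in portfolio weights (L1 distance).
    turnover = sum(abs(n - p) for n, p in zip(new_weights, prev_weights))
    return portfolio_return - lambda_risk * downside - lambda_turn * turnover
```

Both penalties are soft: they lower the reward for risky or high-churn actions but do not by themselves rule any action out.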

What carries the argument

Cross-modal gated fusion mechanism within a BERT-driven Actor-Critic reinforcement learning framework, which adaptively merges multi-modal financial inputs to generate actions under explicit risk and turnover constraints.
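
As a rough illustration of what a gated fusion of two modalities can look like, the sketch below uses an element-wise sigmoid gate to blend a price feature vector with a text feature vector of the same length. This is a generic gating pattern, not the paper's architecture; the per-dimension parameters `w_price`, `w_text`, and `bias` are hypothetical stand-ins for whatever learned weights SBCA uses.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(price_feat, text_feat, w_price, w_text, bias):
    """Element-wise gate g in (0, 1); output is g * price + (1 - g) * text."""
    fused = []
    for p, t, wp, wt, b in zip(price_feat, text_feat, w_price, w_text, bias):
        g = sigmoid(wp * p + wt * t + b)  # how much to trust the price feature
        fused.append(g * p + (1.0 - g) * t)
    return fused

# With zero gate parameters the gate sits at 0.5 and the result is a plain average.
print(gated_fusion([1.0, 2.0], [3.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

Because the gate depends on the inputs, the blend can shift toward text on news-heavy days and toward prices otherwise, which is the "adaptive" behavior the claim refers to.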

If this is right

SBCA generates higher portfolio values and annual returns than equal-weight, buy-and-hold, and market benchmarks.
The model delivers improved Sharpe ratios indicating better risk-adjusted performance.
Maximum drawdowns are reduced, offering better downside protection during market stress.
Performance holds under varying transaction costs, confirming cost robustness.
Ablation removing the fusion module or actor-critic structure degrades results, showing both are necessary.
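
For readers who want to check such claims on their own series, the three headline metrics can be computed from a list of daily portfolio values as sketched below. The conventions are assumptions, since the review does not state the paper's: 252 trading days per year, a zero risk-free rate by default, and the sample standard deviation in the Sharpe ratio.

```python
import math
import statistics

def daily_returns(values):
    return [values[i] / values[i - 1] - 1.0 for i in range(1, len(values))]

def annual_return(values, periods_per_year=252):
    # Geometric annualization of the total return over len(values) - 1 periods.
    n = len(values) - 1
    return (values[-1] / values[0]) ** (periods_per_year / n) - 1.0

def sharpe_ratio(values, risk_free=0.0, periods_per_year=252):
    excess = [r - risk_free / periods_per_year for r in daily_returns(values)]
    return math.sqrt(periods_per_year) * statistics.mean(excess) / statistics.stdev(excess)

def max_drawdown(values):
    # Largest peak-to-trough loss, as a fraction of the running peak.
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst
```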

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could extend to real-time news feeds to respond faster to events outside the historical test window.
Similar fusion might incorporate additional data like earnings calls or regulatory filings for broader decision support.
Embedded constraints allow direct compliance with investor mandates without separate post-trade adjustments.
The method's end-to-end nature reduces reliance on separate prediction and optimization stages common in quant pipelines.

Load-bearing premise

The gated fusion of price time-series and BERT-derived text features produces meaningfully better portfolio actions than price data alone when risk and turnover penalties are applied.

What would settle it

Running the same 11-year experiments but replacing the cross-modal fusion with price-only inputs and finding no gain in Sharpe ratio or maximum drawdown would falsify the benefit of adding text features.

Figures

Figures reproduced from arXiv: 2605.01384 by Jiahao Chen, Jinfeng Pan.

**Figure 1.** SBCA Task. Deep reinforcement learning (DRL) provides an effective paradigm for dynamic portfolio optimization with its advantages in sequential decision-making and dynamic environment adaptation. Meanwhile, pre-trained language models represented by BERT can extract valuable semantic and sentiment features from unstructured financial texts. However, existing research still suffers from several critical l…

**Figure 2.** SBCA Framework. First, the framework starts with a temporal alignment module that bridges news data and stock trading days. Since news released after market closure only impacts the next trading session, this module stacks all news titles of a calendar day using " ||| " as a separator and aligns them to the subsequent trading day. This design strictly enforces causal information flow (no future data leakage…
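
The temporal alignment step described in the Figure 2 caption can be sketched as follows. The `" ||| "` separator and the "next trading day" rule come from the caption; the function name `align_news`, the input shapes, and the choice to concatenate several calendar days (for example a weekend) that map to the same trading day are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

SEPARATOR = " ||| "

def align_news(news_items, trading_days):
    """Map each calendar day's stacked titles to the next trading day.

    news_items: iterable of (date, title); trading_days: sorted list of dates.
    """
    by_day = defaultdict(list)
    for day, title in news_items:
        by_day[day].append(title)

    aligned = defaultdict(list)
    for day in sorted(by_day):
        idx = bisect_right(trading_days, day)  # first trading day strictly after `day`
        if idx == len(trading_days):
            continue  # no later trading day in the sample, so the news is unused
        aligned[trading_days[idx]].append(SEPARATOR.join(by_day[day]))
    return {d: SEPARATOR.join(texts) for d, texts in aligned.items()}
```

Using a strictly later trading day means the text available at decision time never includes news published on or after that day, which is the leakage guard the caption describes.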

**Figure 3.** Portfolio Value Comparison of Different Models.

**Figure 4.** Training Step Curves of the SBCA Model. … the convergence and stability of SBCA during the training process. In both asset portfolios, the training loss of SBCA decreases steadily as the number of training steps increases, and gradually converges to a stable value, indicating that the model can effectively learn the market rules and portfolio optimization strategies through training. At the same time, the va…

Original abstract

Portfolio optimization is constrained by linear assumptions and insufficient integration of multi-modal information in traditional models. This paper proposes SBCA, a cross-modal BERT-driven Actor-Critic framework for multi-asset portfolio optimization, to address the deficiencies of existing deep reinforcement learning (DRL) methods in fusing price data and financial text sentiment and in handling practical trading constraints. The framework adopts a cross-modal gated fusion mechanism to adaptively integrate price time-series features and text semantic features, embeds downside risk and turnover penalty constraints into the reward function, and constructs a complete empirical system for validation. Experiments on 11-year U.S. stock multi-asset datasets show that SBCA outperforms equal-weight, buy-and-hold, and market benchmark strategies in portfolio value, annual return, Sharpe ratio, and maximum drawdown. Ablation studies verify the complementary enhancement of the Actor-Critic mechanism and the cross-modal fusion module. Cost sensitivity analysis confirms the model's robustness under varying transaction costs. SBCA provides an effective and interpretable end-to-end solution for dynamic quantitative portfolio decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SBCA adds gated BERT fusion to actor-critic RL for portfolios with risk and turnover penalties in the reward, but the performance edge may rest on invalid weight outputs and thin experimental controls.

The paper combines a BERT-driven cross-modal gated fusion inside an actor-critic setup for multi-asset portfolio decisions. It folds downside risk and turnover penalties straight into the reward and reports better portfolio value, annual return, Sharpe ratio, and max drawdown than equal-weight, buy-and-hold, and market benchmarks on 11 years of U.S. stock data. Ablation checks and cost-sensitivity runs are included as well.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes SBCA, a cross-modal BERT-driven Actor-Critic framework for multi-asset portfolio optimization. It employs a gated fusion mechanism to integrate price time-series features with BERT-derived text semantic features, embeds downside risk and turnover penalties directly into the reward function, and reports outperformance versus equal-weight, buy-and-hold, and market benchmarks on an 11-year U.S. stock multi-asset dataset in terms of portfolio value, annual return, Sharpe ratio, and maximum drawdown. Ablation studies are said to confirm the value of the Actor-Critic structure and fusion module, while cost-sensitivity tests demonstrate robustness.

Significance. If the empirical claims are supported by verifiable experimental controls and feasible action spaces, the work would offer a practical advance in computational finance by showing how multi-modal DRL can incorporate sentiment alongside price data while respecting trading frictions. The explicit embedding of risk and turnover penalties in the reward is a constructive step beyond purely return-maximizing DRL formulations.

major comments (3)

[Abstract and framework description] The actor is stated to produce portfolio decisions under embedded risk and turnover penalties, yet no normalization (softmax, simplex projection, or post-processing) is described to enforce non-negative weights that sum to one. Because the central performance claims rest on dynamic allocations that are compared to feasible benchmarks, the absence of a hard feasibility step means the reported Sharpe, return, and drawdown advantages could be artifacts of invalid weight vectors that the soft penalties only discourage rather than forbid.
[Experiments section] The abstract asserts clear outperformance and robustness from ablation and cost-sensitivity tests, but supplies no information on data splits (train/validation/test periods), statistical significance testing of metric differences, baseline implementation details, or overfitting controls. These omissions leave the central performance claim unsupported by verifiable detail and prevent assessment of whether the 11-year results generalize.
[Ablation studies] The claim that the Actor-Critic mechanism and cross-modal fusion module provide complementary enhancement is presented without quantitative isolation of the gated fusion parameters or controls for the additional degrees of freedom they introduce. This weakens the ability to attribute gains specifically to the invented cross-modal component rather than to increased model capacity.
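
The feasibility concern in the first major comment is easy to make concrete. The sketch below shows a numerically stable softmax, one of the normalizations the referee lists, together with a check that a weight vector is long-only and fully invested. The helper names `softmax` and `is_feasible` are illustrative and not taken from the paper.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_feasible(weights, tol=1e-9):
    """Long-only, fully invested: every w_i >= 0 and sum(w) == 1 (within tol)."""
    return all(w >= -tol for w in weights) and abs(sum(weights) - 1.0) <= tol

raw_actor_output = [2.0, -1.0, 0.5]
print(is_feasible(raw_actor_output))           # False: raw outputs are not weights
print(is_feasible(softmax(raw_actor_output)))  # True: softmax lands on the simplex
```

A check like `is_feasible` applied to every logged allocation would be one way to confirm that reported results come from valid portfolios.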

minor comments (1)

[Title and abstract] The title and abstract introduce the acronym SBCA without a parenthetical expansion, which hurts readability for readers unfamiliar with the framework.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas for improving clarity and rigor. We address each major comment point by point below and will revise the manuscript to incorporate the necessary additions and clarifications.

Referee: [Abstract and framework description] The actor is stated to produce portfolio decisions under embedded risk and turnover penalties, yet no normalization (softmax, simplex projection, or post-processing) is described to enforce non-negative weights that sum to one. Because the central performance claims rest on dynamic allocations that are compared to feasible benchmarks, the absence of a hard feasibility step means the reported Sharpe, return, and drawdown advantages could be artifacts of invalid weight vectors that the soft penalties only discourage rather than forbid.

Authors: We agree that the framework description in the original manuscript does not explicitly detail the normalization procedure for the actor outputs. In our implementation the actor applies a softmax activation to produce weights that are non-negative and sum to one; this hard constraint operates alongside the soft risk and turnover penalties in the reward. We will revise the relevant sections (including the framework description and, space permitting, the abstract) to clearly state this normalization step and confirm that all reported allocations satisfy the simplex constraint. revision: yes

Referee: [Experiments section] The abstract asserts clear outperformance and robustness from ablation and cost-sensitivity tests, but supplies no information on data splits (train/validation/test periods), statistical significance testing of metric differences, baseline implementation details, or overfitting controls. These omissions leave the central performance claim unsupported by verifiable detail and prevent assessment of whether the 11-year results generalize.

Authors: We acknowledge that the experiments section lacks these essential details. In the revised manuscript we will specify the exact temporal train/validation/test splits, report statistical significance tests (e.g., paired t-tests or bootstrap confidence intervals) for the reported metric differences, provide additional implementation details for the baselines, and describe the overfitting controls employed (regularization, early stopping, and any cross-validation procedures). revision: yes

Referee: [Ablation studies] The claim that the Actor-Critic mechanism and cross-modal fusion module provide complementary enhancement is presented without quantitative isolation of the gated fusion parameters or controls for the additional degrees of freedom they introduce. This weakens the ability to attribute gains specifically to the invented cross-modal component rather than to increased model capacity.

Authors: We accept that the ablation analysis would be strengthened by explicit controls for model capacity and isolation of the fusion mechanism. We will revise the ablation section to report parameter counts for each variant, provide quantitative values for the learned gated fusion parameters, and include an additional controlled comparison that matches capacity while varying only the cross-modal component. revision: yes
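
One of the significance tests named in the second response, a bootstrap confidence interval, could be set up along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: it resamples days jointly for both strategies, uses an unannualized Sharpe ratio, and treats days as independent, which ignores autocorrelation (a block bootstrap may be more appropriate for real return series). The function names are hypothetical.

```python
import random
import statistics

def sharpe(rets):
    # Unannualized Sharpe ratio with a zero risk-free rate (an assumption).
    return statistics.mean(rets) / statistics.stdev(rets)

def paired_bootstrap_ci(rets_a, rets_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for sharpe(a) - sharpe(b), resampling the same days for both."""
    rng = random.Random(seed)
    n = len(rets_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a = [rets_a[i] for i in idx]
        b = [rets_b[i] for i in idx]
        diffs.append(sharpe(a) - sharpe(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

If the resulting interval excludes zero, the Sharpe difference between the two strategies is unlikely to be a resampling artifact at the chosen level; an interval that straddles zero would leave the outperformance claim unsupported.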

Circularity Check

0 steps flagged

No circularity: SBCA is an empirical proposal validated by experiments

The paper introduces SBCA as a practical Actor-Critic architecture that fuses BERT text features with price time-series via gated cross-modal fusion and embeds risk/turnover penalties directly into the reward. All central claims rest on reported outperformance versus equal-weight, buy-and-hold and market benchmarks on an 11-year multi-asset dataset, plus ablation and cost-sensitivity checks. No derivation chain exists that reduces a claimed result to a fitted parameter or self-citation by construction; the framework is presented as an end-to-end trainable system whose merit is external to its own definitions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

Abstract-only review yields limited technical detail; the framework rests on standard reinforcement-learning assumptions and introduces one novel component whose parameters are learned from data.

free parameters (1)

gated fusion parameters
The adaptive cross-modal fusion mechanism necessarily contains learnable weights that are fitted during training on the price and text data.

axioms (1)

domain assumption: Actor-Critic reinforcement learning is suitable for sequential multi-asset portfolio decisions under uncertainty
This is the foundational modeling choice that enables the entire framework.

invented entities (1)

cross-modal gated fusion mechanism (no independent evidence)
purpose: Adaptively integrate price time-series features and text semantic features
Introduced as the core technical innovation to address insufficient multi-modal fusion in existing DRL methods.

pith-pipeline@v0.9.0 · 5476 in / 1459 out tokens · 42329 ms · 2026-05-10T15:26:56.493734+00:00 · methodology
