RankGLU: Residual Gated Score Formation for Cross-Sectional Stock Prediction
Pith reviewed 2026-06-27 15:08 UTC · model grok-4.3
The pith
RankGLU raises the mean information coefficient (IC) on CSI300 by preserving a linear scoring path while adding a bounded multiplicative branch for controlled nonlinear interactions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RankGLU is a residual bottleneck gated linear unit that keeps an ungated linear scoring path and adds a bounded multiplicative branch. Under a unified cross-sectional normalization protocol and an IC-augmented objective, it raises mean IC on CSI300 from 0.0697 (the ranking-aware backbone) to 0.0727 and remains stable across seeds. Ablation shows the clearest degradation when the GLU head is removed.
What carries the argument
RankGLU, a residual bottleneck gated linear unit that preserves a direct linear scoring path and adds a bounded multiplicative branch to enable controlled nonlinear feature interactions without destabilizing the ordering.
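The paper's exact equations are not reproduced on this page, so the following is only a minimal sketch of what a head of this kind could look like. The class name `RankGLUHeadSketch`, the tanh/sigmoid choice for the bounded branch, the mixing weight `alpha`, and the random initialization are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RankGLUHeadSketch:
    """Hypothetical residual gated head: score = linear(x) + alpha * gated(x)."""

    def __init__(self, d_in, d_bottleneck, alpha=0.1, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_lin = [rng.gauss(0.0, 0.1) for _ in range(d_in)]  # direct linear path
        self.w_val = init(d_bottleneck, d_in)   # bottleneck "value" projection
        self.w_gate = init(d_bottleneck, d_in)  # bottleneck "gate" projection
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(d_bottleneck)]
        self.alpha = alpha  # assumed weight of the multiplicative branch

    def linear_score(self, x):
        """Ungated linear route that carries the base ordering."""
        return sum(w * xi for w, xi in zip(self.w_lin, x))

    def gated_score(self, x):
        """Bounded branch: tanh(value) * sigmoid(gate) lies in (-1, 1) per unit."""
        value = [math.tanh(v) for v in matvec(self.w_val, x)]
        gate = [sigmoid(g) for g in matvec(self.w_gate, x)]
        return sum(w * v * g for w, v, g in zip(self.w_out, value, gate))

    def score(self, x):
        return self.linear_score(x) + self.alpha * self.gated_score(x)


head = RankGLUHeadSketch(d_in=4, d_bottleneck=2, alpha=0.1, seed=1)
features = [0.5, -1.0, 2.0, 0.0]
# The gated correction can never exceed alpha * sum(|w_out|) in magnitude.
max_shift = head.alpha * sum(abs(w) for w in head.w_out)
print(head.score(features), head.linear_score(features), max_shift)
```

In this sketch the nonlinear branch can shift a score by at most `alpha * sum(|w_out|)`, which is one way to read "bounded": the linear path fixes the base ordering and the gated branch can only perturb it within a known range.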
If this is right
- On CSI300 the mean IC rises from 0.0654 for the original backbone and 0.0697 for the ranking-aware backbone to 0.0727 for RankGLU, with the gain holding across all five seeds.
- The best single seed of RankGLU also exceeds the corresponding best seeds of the baselines.
- Removing the GLU prediction head produces the largest performance drop among the tested component ablations.
- Relation-path calibrations can reach high single-seed peaks but show less stable multi-seed behavior than the residual gated head.
- The same RankGLU head is reported to improve results on the larger CSI800 universe under the identical protocol.
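As a quick check on the size of the CSI300 gain, the reported means can be turned into absolute and relative differences. This is plain arithmetic on the figures quoted above, not an additional result from the paper.

```latex
\begin{align*}
0.0727 - 0.0697 &= 0.0030, & \frac{0.0030}{0.0697} &\approx 4.3\% \quad \text{(vs. ranking-aware backbone)} \\
0.0727 - 0.0654 &= 0.0073, & \frac{0.0073}{0.0654} &\approx 11.2\% \quad \text{(vs. original backbone)}
\end{align*}
```

The 0.0030 gain over the ranking-aware backbone is about the same size as the reported seed-level standard deviations (0.0030 and 0.0037), which is why the per-seed consistency claim matters for the argument.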
Where Pith is reading between the lines
- The emphasis on keeping an explicit linear route suggests that any ranking model whose loss directly penalizes ordering errors may benefit from an analogous residual linear bypass in its final layer.
- Because the improvement is measured after cross-sectional normalization, the result implies that future work could test whether the same head still helps when the normalization step is removed or replaced by a different post-processing rule.
- The bounded multiplicative branch could be viewed as a lightweight way to inject limited feature interaction without the full parameter cost of deeper nonlinear layers, which might be tested on other cross-sectional ranking problems outside finance.
Load-bearing premise
The unified protocol with cross-sectional score normalization and IC-augmented objective is assumed to isolate the contribution of the prediction head in a fair and stable way.
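The page does not give the precise form of either protocol step, so the sketch below only illustrates one common reading: per-date z-scoring of scores, a per-date Pearson IC, and a loss that subtracts a weighted IC term from a squared-error term. The function names and the `ic_weight` parameter are hypothetical, and the loss form is an assumption rather than the paper's objective.

```python
import math


def zscore(values):
    """Cross-sectional z-score over the assets of a single trading date."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n  # a constant cross-section carries no ranking signal
    return [(v - mean) / std for v in values]


def pearson_ic(scores, returns):
    """Pearson correlation between scores and realized returns on one date."""
    zs, zr = zscore(scores), zscore(returns)
    return sum(a * b for a, b in zip(zs, zr)) / len(zs)


def mean_ic(dates):
    """Average the per-date IC over (scores, returns) pairs."""
    ics = [pearson_ic(scores, returns) for scores, returns in dates]
    return sum(ics) / len(ics)


def ic_augmented_loss(scores, returns, ic_weight=0.5):
    """Assumed form: squared error minus a weighted IC reward (lower is better)."""
    mse = sum((s - r) ** 2 for s, r in zip(scores, returns)) / len(scores)
    return mse - ic_weight * pearson_ic(scores, returns)


# Toy cross-sections with made-up numbers, one tuple per trading date.
toy_dates = [
    ([1.0, 2.0, 3.0, 4.0], [0.01, 0.02, 0.03, 0.04]),  # perfectly aligned
    ([4.0, 3.0, 2.0, 1.0], [0.01, 0.02, 0.03, 0.04]),  # perfectly reversed
]
print(mean_ic(toy_dates))
```

Because Pearson IC is invariant to shifting and rescaling the scores within a date, the normalization step mainly affects the magnitude term of such a loss. That is one reason the exact protocol needs to be stated before the head comparison can be checked.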
What would settle it
A replication on CSI300 using the same backbone, seeds, and protocol but with a different prediction head that achieves a higher mean IC, with the advantage holding on each of the five seeds, would falsify the claim that RankGLU is the most reliable improvement.
Original abstract
Cross-sectional stock prediction is closer to a ranking problem than to ordinary return-magnitude regression, since portfolio decisions depend on the relative ordering of assets within each trading date. Existing temporal, graph-based, and market-conditioned attention models have improved stock representation learning, yet the final prediction head is often treated as a minor implementation detail. This paper argues that, under information-coefficient-oriented evaluation, score formation is a critical bottleneck: an over-flexible head can fit unstable return magnitude, whereas an overly linear head may underuse cross-feature interactions. We therefore develop RankGLU, a residual bottleneck gated linear unit for cross-sectional stock ranking. RankGLU keeps a direct linear scoring path and adds a bounded multiplicative branch, thereby preserving a stable ordering route while allowing controlled nonlinear interactions. The method is evaluated on CSI300 and CSI800 under a unified protocol with cross-sectional score normalization and an IC-augmented objective. Multi-seed experiments show that, on CSI300, RankGLU achieves the strongest mean IC among the internally controlled variants, improving from 0.0654±0.0052 for the original backbone and 0.0697±0.0030 for the ranking-aware backbone to 0.0727±0.0037, a gain that is consistent across all five seeds. Its best-seed result also exceeds the corresponding baselines. Ablation results further indicate that removing the GLU prediction head causes the clearest degradation among the tested component changes. Additional relation-path calibrations can produce high single-seed peaks, but their multi-seed behavior is less stable. The evidence suggests that ranking-aware stock models benefit most reliably from bounded residual score formation rather than from indiscriminate architectural expansion.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RankGLU, a residual bottleneck gated linear unit as the prediction head for cross-sectional stock ranking models. It claims this design improves information coefficient (IC) by preserving a stable linear scoring path while adding bounded multiplicative interactions. On CSI300, it reports mean IC of 0.0727±0.0037 versus 0.0654±0.0052 (original backbone) and 0.0697±0.0030 (ranking-aware backbone), with the gain consistent across all five seeds; similar patterns hold on CSI800. Ablations attribute the improvement primarily to the GLU head, and the evaluation uses a unified protocol with cross-sectional normalization and an IC-augmented objective.
Significance. If the empirical gains hold under full verification, the work usefully isolates the contribution of the final score-formation module in ranking-oriented stock models and shows that modest, bounded nonlinearity in the head can outperform both purely linear and more flexible alternatives. The multi-seed reporting and component ablations are strengths that support the central claim of reliable improvement without backbone expansion.
major comments (2)
- [Methods] Methods section: the manuscript provides no detailed description of the backbone architectures, exact feature sets, time periods, or train/validation/test splits for CSI300 and CSI800, nor the precise formulation of the IC-augmented objective and cross-sectional normalization steps. This absence prevents verification that the reported IC gains (0.0727 vs. 0.0697/0.0654) are attributable to RankGLU rather than protocol details.
- [Results] Results and ablation sections: while mean IC and standard deviations across five seeds are stated, the paper does not report per-seed values, statistical significance tests, or the full ablation table that would confirm the claim that 'removing the GLU prediction head causes the clearest degradation' and that relation-path calibrations are less stable.
minor comments (2)
- [Abstract] The abstract and text use 'IC-augmented objective' without an equation or pseudocode; adding this would clarify how the ranking loss interacts with the head.
- Notation for the gated linear unit (e.g., the exact form of the bounded multiplicative branch) should be defined explicitly with an equation to allow direct comparison with standard GLU variants.
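For orientation, the standard gated linear unit and one possible residual, bounded variant can be written as follows. The first line is the usual GLU form. The second line is only an illustrative assumption about what a "direct linear path plus bounded multiplicative branch" might look like; the symbols $w$, $u$, $W_v$, $W_g$, and $\alpha$ are hypothetical and do not come from the paper.

```latex
\begin{align*}
\mathrm{GLU}(x) &= (W x + b) \odot \sigma(V x + c) \\
s(x) &= w^{\top} x + \alpha \, u^{\top} \bigl[ \tanh(W_v x) \odot \sigma(W_g x) \bigr],
\qquad \bigl| s(x) - w^{\top} x \bigr| \le \alpha \, \lVert u \rVert_1
\end{align*}
```

Under this assumed form the bound follows because each entry of $\tanh(\cdot) \odot \sigma(\cdot)$ lies in $(-1, 1)$. An explicit equation of this kind in the manuscript would let readers see which part of the score is bounded and by how much.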
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript accordingly to improve reproducibility and evidentiary strength.
- Referee: [Methods] Methods section: the manuscript provides no detailed description of the backbone architectures, exact feature sets, time periods, or train/validation/test splits for CSI300 and CSI800, nor the precise formulation of the IC-augmented objective and cross-sectional normalization steps. This absence prevents verification that the reported IC gains (0.0727 vs. 0.0697/0.0654) are attributable to RankGLU rather than protocol details.
  Authors: We agree that the current Methods section lacks the level of detail required for full reproducibility and independent verification of the source of the reported IC improvements. In the revised manuscript we will add a dedicated experimental protocol subsection that specifies the backbone architectures, exact feature sets, time periods, train/validation/test splits for CSI300 and CSI800, the precise mathematical form of the IC-augmented objective, and the cross-sectional normalization procedure. These additions will allow readers to confirm that the observed gains are attributable to the RankGLU head.
  Revision: yes
- Referee: [Results] Results and ablation sections: while mean IC and standard deviations across five seeds are stated, the paper does not report per-seed values, statistical significance tests, or the full ablation table that would confirm the claim that 'removing the GLU prediction head causes the clearest degradation' and that relation-path calibrations are less stable.
  Authors: We concur that per-seed values, formal statistical tests, and the complete ablation table would provide stronger support for the claims. We will revise the Results and ablation sections to include a table of per-seed IC values for all variants, report the results of appropriate statistical significance tests (e.g., paired t-tests), and present the full ablation table with all component variations to substantiate the statements on degradation from GLU removal and the relative stability of relation-path calibrations.
  Revision: yes
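The paired t-test mentioned in the response could be computed from per-seed values along the following lines. This is a sketch using only the Python standard library. The function name `paired_t_statistic` is hypothetical, and the inputs are toy numbers chosen for easy arithmetic, since the paper's per-seed ICs are not reported here.

```python
import math
import statistics


def paired_t_statistic(metric_a, metric_b):
    """Return (t, degrees_of_freedom) for paired samples such as matched seeds."""
    if len(metric_a) != len(metric_b) or len(metric_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(metric_a, metric_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_value = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_value, len(diffs) - 1


# Toy placeholders, NOT values from the paper: one entry per matched seed.
variant = [3.0, 4.0, 5.0, 6.0, 7.0]
baseline = [1.0, 3.0, 3.0, 5.0, 5.0]
t_value, dof = paired_t_statistic(variant, baseline)
print(round(t_value, 3), dof)
```

With five seeds there are four degrees of freedom, so a two-sided test at the 5% level needs |t| above roughly 2.776. Pairing by seed is only meaningful if each seed fixes the same initialization and data order for both variants.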
Circularity Check
No significant circularity
The paper presents an empirical architecture (RankGLU) and reports mean IC improvements on held-out CSI300/CSI800 data under a fixed evaluation protocol. The central claim is a performance delta (0.0727 vs. 0.0654/0.0697) measured across five seeds; this is a direct experimental outcome rather than an algebraic identity or a fitted parameter renamed as a prediction. No equations are shown that define the target metric in terms of the model parameters being optimized, no self-citation chain is invoked to justify uniqueness, and the IC-augmented objective is an explicit training choice whose effect is measured externally. The derivation chain is therefore self-contained experimental comparison.