pith. machine review for the scientific record.

arxiv: 2605.02300 · v1 · submitted 2026-05-04 · 💻 cs.LG

Recognition: 3 theorem links


A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:37 UTC · model grok-4.3

classification 💻 cs.LG
keywords meta reinforcement learning · goals-based wealth management · portfolio optimization · dynamic programming · zero-shot meta-learning · financial goals · investment strategy

The pith

Pre-trained meta reinforcement learning produces near-optimal goals-based wealth management strategies in milliseconds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a meta reinforcement learning model pre-trained on thousands of goals-based wealth management problems. This allows the model to quickly generate dynamic investment and goal fulfillment strategies for entirely new problems without additional training. It achieves expected utilities that are on average 97.8 percent of those from full dynamic programming optimization. The approach remains effective even when market conditions differ from those used in training and can address problems too large for traditional dynamic programming methods.

Core claim

The MetaRL approach, by pre-training on thousands of GBWM problems, enables inference-mode solutions for new GBWM problems in a few hundredths of a second that deliver expected utilities averaging 97.8% of the optimal expected utilities from Dynamic Programming, and these results hold robustly across capital market regime changes.

What carries the argument

Meta reinforcement learning model pre-trained for zero-shot application to goals-based wealth management problems involving annual portfolio choices and goal fulfillments to maximize expected utility.
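The problem class carrying the argument can be sketched as a small finite-horizon simulator: each year, pick a portfolio and decide whether to fund that year's goal. The portfolio menu, goal schedule, and baseline policy below are illustrative assumptions, not the paper's calibration.

```python
import random

# Illustrative GBWM instance: each year the investor picks a portfolio
# from an efficient-frontier menu and decides whether to fund that
# year's goal; reward is the utility of goals actually fulfilled.
# All numbers here are assumed for illustration.
PORTFOLIOS = [(0.03, 0.05), (0.06, 0.12), (0.09, 0.20)]  # (mean, vol) pairs
GOALS = {3: (40.0, 10.0), 7: (60.0, 25.0)}               # year -> (cost, utility)

def simulate(policy, w0=100.0, horizon=10, seed=0):
    """Run one trajectory; policy maps (year, wealth) -> (portfolio_idx, take_goal)."""
    rng = random.Random(seed)
    wealth, total_utility = w0, 0.0
    for t in range(horizon):
        p_idx, take = policy(t, wealth)
        if take and t in GOALS and wealth >= GOALS[t][0]:
            cost, utility = GOALS[t]
            wealth -= cost
            total_utility += utility
        mu, sigma = PORTFOLIOS[p_idx]
        wealth *= max(1e-9, 1.0 + mu + sigma * rng.gauss(0.0, 1.0))
    return total_utility

# Naive baseline: middle portfolio, take every affordable goal. A MetaRL
# policy would replace this function with a pre-trained network's output.
baseline = lambda t, w: (1, True)
print(simulate(baseline))
```

Pre-training in the paper's sense would sample thousands of such instances, varying the horizon, goals, and frontier, and train one policy network across all of them.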

If this is right

  • New investor problems can be solved without separate training or optimization steps.
  • Problems with state spaces larger than what dynamic programming can handle become solvable.
  • The model works even if training used only one market regime but testing uses different regimes.
  • Expected utilities close to optimal are obtained in real time for dynamic portfolio and goal decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could extend to other sequential decision problems in finance where pre-training on variants allows fast adaptation to new constraints.
  • Integration with real-time market data feeds might allow continuous updating of strategies without full re-optimization.
  • Similar meta-learning could reduce computation in other goal-oriented optimization domains like retirement planning with multiple objectives.

Load-bearing premise

That pre-training on thousands of GBWM problems produces a model that generalizes to new investor problems with different parameters, goals, and market conditions without significant performance loss.

What would settle it

Running the MetaRL model on a held-out set of GBWM problems with parameters outside the training distribution and finding that average utility falls substantially below 97.8% of the dynamic programming benchmark would falsify the generalization claim.
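That test can be sketched as a simple harness over per-problem utility ratios. The `mean_efficiency` helper and the held-out numbers below are hypothetical; only the 97.8% benchmark comes from the paper.

```python
# Sketch of the falsification test: compare MetaRL inference utilities to
# DP optima on held-out problems and check the average efficiency ratio.
# The sample values below are made up for illustration.

def mean_efficiency(rl_utilities, dp_utilities):
    """Average of per-problem RL/DP expected-utility ratios."""
    assert len(rl_utilities) == len(dp_utilities) > 0
    ratios = [rl / dp for rl, dp in zip(rl_utilities, dp_utilities)]
    return sum(ratios) / len(ratios)

# Hypothetical out-of-distribution results for three held-out problems:
rl = [46.1, 88.0, 30.2]
dp = [50.0, 90.0, 31.0]
print(f"mean efficiency: {mean_efficiency(rl, dp):.3f}")
```

An average materially below 0.978 on problems outside the training distribution is what the review means by falsifying the generalization claim.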

Figures

Figures reproduced from arXiv: 2605.02300 by Daniel Ostrov, Deep Srivastav, Harshad Khadilkar, Hungjen Wang, Sanjiv R. Das, Sukrit Mittal.

Figure 1. Logical flowchart of the MetaRL approach.
Figure 2. Left panel: plot of each of the 66 test suite GBWM problems comparing their runtimes (in seconds, with the c7i.24xlarge machine) using DP's backwards pass versus using RL inference (and 10,000 Monte Carlo simulations). Right panel: comparison of average runtimes (in seconds) over the 66 test suite GBWM scenarios for DP and RL inference on various hardware. Compute-optimized machine instances on AWS were us…
Figure 3. Heatmaps comparing the investment portfolio decisions and goal-taking decisions made by DP versus RL inference for test suite case 20. In the investment portfolio plots, the darker the color, the more aggressive the investment portfolio. In the goal-taking plots, the dark colored bars are regions in which the available goal is taken. The initial wealth, 100 (thousand) dollars, is denoted by the orange bar …
Figure 4. The baseline efficient frontier used for training throughout the paper and, other than in Subsection 4.5, for all of its examples. In Subsection 4.5, five other efficient frontiers, labeled by the time ranges of the returns used to generate them, are used for out-of-distribution testing examples. For this robustness test, we considered five different efficient frontiers labeled in …
Figure 5. Heatmaps for investment portfolio decisions and goal-taking decisions with case CP4, which has both concurrent goals (cars and trips) and partial goals (for both cars and trips). The specifics of case CP4 are in the text. The colors in the second row (Goal-taking Decisions) indicate the ratio of the expected attained utility from that year's goals over the utility attained if all of that year's full goals …
Figure 6. Compiled to test whether 1000 epochs is sufficient; it shows that this is more than enough, with a large fraction of the algorithm's learning completed within the first 100 epochs. The figure uses the "RL-Efficiency" determined from all five models. The RL-Efficiency is defined in Subsection 4.4, except that the denominator here is the DP value function (also discussed in that subsection). The bold…
Figure 7. The same information shown in …
Original abstract

Applying concepts related to zero-shot meta-learning and pre-training of foundation models, we develop a meta reinforcement learning approach (denoted MetaRL) that is pre-trained on thousands of goals-based wealth management (GBWM) problems. Each GBWM problem involves a multiple year scenario over which the investor looks to optimally choose an investment portfolio each year and choose to fulfill all, some, or none of the different financial goals that arise each year. These choices seek to maximize the expected total investor utility obtained from the fulfilled financial goals. By eliminating separate training and optimization for each new investor problem, the MetaRL model in inference mode produces near-optimal dynamic investment portfolio and goal-fulfilling strategies for a new GBWM problem within a few hundredths of a second. This delivers expected utilities that are, on average, 97.8% of the optimal expected utilities (determined via Dynamic Programming). These results are remarkably robust to capital market regime changes, even when training uses only one capital market regime. Further, the MetaRL approach can enable solving problems with larger state spaces where Dynamic Programming becomes computationally infeasible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a meta-reinforcement learning (MetaRL) framework pre-trained on thousands of goals-based wealth management (GBWM) problems. In inference, the model generates dynamic investment portfolio and goal-fulfillment strategies for unseen GBWM instances in milliseconds, achieving average expected utilities of 97.8% relative to dynamic programming (DP) optima. It claims robustness to capital market regime changes and applicability to large state spaces where DP is intractable.

Significance. If the empirical results hold under rigorous validation, this work could enable practical, real-time optimization for complex personalized financial planning problems, extending meta-RL techniques to a high-stakes sequential decision domain. The reported inference speed and cross-regime robustness would represent a meaningful advance over per-instance DP or standard RL training, with potential to inspire similar meta-learning applications in other uncertain planning settings.

major comments (2)
  1. [Abstract] The central performance claim that MetaRL delivers expected utilities averaging 97.8% of DP optima is presented without any description of the experimental protocol, including the number and parameterization of test GBWM problems, the state-space dimensions used for the DP comparisons, the number of evaluation runs, or measures of variability. This detail is load-bearing because the near-optimality assertion rests entirely on this figure.
  2. [Abstract] The claim that MetaRL solves larger state-space GBWM problems (where DP is computationally infeasible) with near-optimal strategies lacks any supporting optimality anchor, upper bound, or proxy metric for those instances. The 97.8% figure applies only to small-state cases amenable to DP; the extension to the regime where the method is positioned as most useful therefore relies on untested extrapolation.
minor comments (1)
  1. [Abstract] The phrase 'zero-shot meta-learning' is invoked but not operationally distinguished from standard meta-RL pre-training and fine-tuning in the GBWM setting; a brief clarification would improve precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's thorough review and insightful comments on our manuscript. We address each major comment below and indicate where revisions will be made to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract] The central performance claim that MetaRL delivers expected utilities averaging 97.8% of DP optima is presented without any description of the experimental protocol, including the number and parameterization of test GBWM problems, the state-space dimensions used for the DP comparisons, the number of evaluation runs, or measures of variability. This detail is load-bearing because the near-optimality assertion rests entirely on this figure.

    Authors: We agree that the abstract would benefit from a concise description of the experimental protocol to support the 97.8% claim. While the full protocol (including test problem parameterization, state-space dimensions for DP comparisons, number of runs, and variability) is detailed in the Experiments section, we will revise the abstract to briefly summarize these elements so the performance figure is self-contained. revision: yes

  2. Referee: [Abstract] The claim that MetaRL solves larger state-space GBWM problems (where DP is computationally infeasible) with near-optimal strategies lacks any supporting optimality anchor, upper bound, or proxy metric for those instances. The 97.8% figure applies only to small-state cases amenable to DP; the extension to the regime where the method is positioned as most useful therefore relies on untested extrapolation.

    Authors: We thank the referee for this observation. The abstract does not claim near-optimality for larger state-space problems; it states only that the approach 'can enable solving problems with larger state spaces where Dynamic Programming becomes computationally infeasible.' The 97.8% figure is tied exclusively to DP-comparable instances. We will revise the abstract to explicitly distinguish these regimes and clarify that no direct optimality benchmark is provided for large instances. We will also add discussion of feasibility demonstrations and proxy checks in the main text to avoid any implication of untested extrapolation. revision: partial

Circularity Check

0 steps flagged

No circularity; performance measured against independent DP benchmark on tractable instances

Full rationale

The paper's central result is an empirical claim: a meta-RL model pre-trained on thousands of GBWM instances produces policies whose expected utility reaches 97.8% of the value obtained by exact Dynamic Programming on new test problems. DP is an external, non-learned algorithm whose optimality is defined by the Bellman equation and does not depend on the MetaRL parameters or outputs. No equation in the abstract or described method reduces the reported utility ratio to a fitted quantity, a self-citation, or a redefinition of optimality. The extension to larger state spaces is presented only as computational feasibility, not as a measured optimality percentage. This is a standard train-then-evaluate protocol with no load-bearing self-referential step.
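The DP benchmark invoked here can be sketched as a standard Bellman backward pass over a wealth grid. The grid, three-point return approximation, and goal schedule below are illustrative placeholders, not the paper's method.

```python
import math

# Minimal backward-induction sketch for a GBWM-style problem on a wealth
# grid. All parameters are assumed for illustration.
WGRID = [20.0 * i for i in range(1, 11)]    # coarse wealth grid
PORTFOLIOS = [(0.03, 0.05), (0.09, 0.20)]   # (mean, vol) pairs
GOALS = {3: (40.0, 10.0)}                   # year -> (cost, utility)
HORIZON = 5

def nearest(w):
    """Index of the grid point closest to wealth w."""
    return min(range(len(WGRID)), key=lambda i: abs(WGRID[i] - w))

def backward_pass():
    """Bellman recursion: V[t][i] = max over actions of reward + E[V[t+1]]."""
    V = [[0.0] * len(WGRID) for _ in range(HORIZON + 1)]
    for t in range(HORIZON - 1, -1, -1):
        for i, w in enumerate(WGRID):
            best = -math.inf
            for take in (False, True):
                if take and (t not in GOALS or w < GOALS[t][0]):
                    continue  # goal absent or unaffordable this year
                reward, w_after = (GOALS[t][1], w - GOALS[t][0]) if take else (0.0, w)
                for mu, sigma in PORTFOLIOS:
                    # crude three-point approximation of the return distribution
                    cont = sum(p * V[t + 1][nearest(w_after * (1.0 + mu + k * sigma))]
                               for p, k in ((0.25, -1.0), (0.5, 0.0), (0.25, 1.0)))
                    best = max(best, reward + cont)
            V[t][i] = best
    return V

V = backward_pass()
print(V[0][nearest(100.0)])  # optimal expected utility from initial wealth 100
```

The cost of this pass grows with the grid size, horizon, and number of goals, which is why the paper positions MetaRL inference for state spaces where DP becomes infeasible.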

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters, axioms, or invented entities; the approach implicitly relies on standard RL assumptions such as Markov decision process formulation and utility maximization.

pith-pipeline@v0.9.0 · 5508 in / 1091 out tokens · 75174 ms · 2026-05-08T18:37:42.306719+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

300 extracted references · 189 canonical work pages · 18 internal anchors
