AlphaMemo: Structured Search-Process Memory for Self-Evolving Alpha Mining Agents

Fengxiang He; Hang Yu; Jeff Z. Pan; Tongliang Liu; Zhiyong Wang; Zifan Zheng

arxiv: 2606.20625 · v1 · pith:VAW4TPXRnew · submitted 2026-05-26 · 💻 cs.AI · cs.CL · cs.LG

Pith reviewed 2026-06-29 16:52 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.LG

keywords alpha mining · LLM agents · search process memory · AST differences · financial factors · self-evolving agents · quantitative finance · factor discovery

The pith

AlphaMemo improves LLM alpha mining by storing reusable edit motifs from search processes via AST differences and gated residual memory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes AlphaMemo, a structured memory system for LLM agents that mine alpha factors in finance. Rather than retaining only successful final factors, the system records patterns of edits that succeed or fail under given parent contexts. These patterns are captured as differences in abstract syntax trees, then managed through confidence-gated residual updates and asymmetric vetoes to curb redundancy and overfitting amid noisy market feedback. Experiments demonstrate gains in out-of-sample returns on CSI 500 and S&P 500 plus faster discovery under fixed search budgets. A reader would care because the approach targets the core bottlenecks of combinatorial explosion and non-stationary signals that limit automated quantitative strategy generation.

Core claim

AlphaMemo records reusable evidence about which edit motifs work or fail under specific parent-factor contexts. It extracts motifs from Abstract Syntax Tree (AST) differences, applies confidence-gated residual memory on top of a search-ledger prior, and uses asymmetric veto control to suppress high-confidence failure patterns. Experiments on CSI 500 and S&P 500 show improved out-of-sample performance and fixed-budget discovery efficiency, with ablations validating the roles of residual learning, confidence gating, AST-diff motifs, and veto memory.
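As a rough illustration of what an AST-diff motif could look like, the sketch below parses a parent and a child factor expression with Python's `ast` module and reports the smallest pair of subtrees that contains the change, with field names and constants abstracted away. The factor syntax, the `rank` and `ts_mean` operators, and the `shape`, `smallest_diff`, and `extract_motif` helpers are assumptions made for illustration; this summary does not give the paper's actual motif definition or canonicalization.

```python
import ast


def shape(node: ast.AST) -> str:
    """Abstract a subtree: keep operator and function names, hide fields and constants."""
    if isinstance(node, ast.Call):
        fn = node.func.id if isinstance(node.func, ast.Name) else "?"
        return f"{fn}({', '.join(shape(a) for a in node.args)})"
    if isinstance(node, ast.BinOp):
        return f"({shape(node.left)} {type(node.op).__name__} {shape(node.right)})"
    if isinstance(node, ast.UnaryOp):
        return f"{type(node.op).__name__}({shape(node.operand)})"
    if isinstance(node, ast.Name):
        return "FIELD"
    if isinstance(node, ast.Constant):
        return "CONST"
    return type(node).__name__


def smallest_diff(a: ast.AST, b: ast.AST):
    """Return the smallest (old, new) subtree pair containing every difference, or None."""
    if ast.dump(a) == ast.dump(b):
        return None
    if type(a) is type(b):
        kids_a = list(ast.iter_child_nodes(a))
        kids_b = list(ast.iter_child_nodes(b))
        if len(kids_a) == len(kids_b):
            diffs = [d for d in (smallest_diff(x, y) for x, y in zip(kids_a, kids_b)) if d]
            if len(diffs) == 1:
                # Exactly one child changed, so the edit is localized further down.
                return diffs[0]
    return (a, b)


def extract_motif(parent: str, child: str):
    """Hypothetical motif: abstracted 'before -> after' shape of the edited subtree."""
    p = ast.parse(parent, mode="eval").body
    c = ast.parse(child, mode="eval").body
    d = smallest_diff(p, c)
    return None if d is None else f"{shape(d[0])} -> {shape(d[1])}"


print(extract_motif("rank(close - open)", "rank(ts_mean(close - open, 5))"))
```

Abstracting leaves to `FIELD` and `CONST` is one possible way to make a motif reusable across parents; a real system might instead keep field names or bucket constants.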

What carries the argument

Structured Search-Process Memory that extracts motifs from AST differences and applies confidence-gated residual memory with asymmetric veto control on top of a search-ledger prior.
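One way to picture a confidence-gated residual is a table keyed by (parent context, motif) that tracks how far observed outcomes deviate from what the search-ledger prior expected, and that trusts the deviation only in proportion to how often it has been seen. The sketch below is a minimal version under stated assumptions: the count-based confidence `n / (n + k)`, the shrinkage constant `k`, and the class and method names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class MotifStats:
    n: int = 0               # observations of this (context, motif) pair
    mean_resid: float = 0.0  # running mean of (observed reward - prior expectation)


@dataclass
class ResidualMemory:
    k: float = 5.0  # assumed shrinkage constant: larger k means slower trust
    table: dict = field(default_factory=dict)

    def update(self, context: str, motif: str, reward: float, prior_expect: float) -> None:
        """Fold one evaluation outcome into the running residual for this pair."""
        s = self.table.setdefault((context, motif), MotifStats())
        s.n += 1
        s.mean_resid += ((reward - prior_expect) - s.mean_resid) / s.n

    def confidence(self, context: str, motif: str) -> float:
        """Assumed confidence in [0, 1): grows with the number of observations."""
        s = self.table.get((context, motif))
        return 0.0 if s is None else s.n / (s.n + self.k)

    def gated_residual(self, context: str, motif: str) -> float:
        """Residual scaled by confidence; unseen pairs contribute nothing."""
        s = self.table.get((context, motif))
        return 0.0 if s is None else self.confidence(context, motif) * s.mean_resid


mem = ResidualMemory(k=5.0)
for _ in range(5):
    mem.update("momentum", "wrap_ts_mean", reward=0.3, prior_expect=0.1)
print(mem.confidence("momentum", "wrap_ts_mean"), mem.gated_residual("momentum", "wrap_ts_mean"))
```

Because unseen pairs return zero, the prior alone drives the search until evidence accumulates, which is the intuition behind treating memory as a residual rather than a replacement.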

If this is right

Higher out-of-sample performance on CSI 500 and S&P 500 stock indices.
Greater factor discovery efficiency when search is constrained to a fixed budget.
Each component (residual learning, confidence gating, AST-diff motifs, veto memory) contributes measurably to the gains, as shown by ablations.
Reduced redundancy and lower overfitting risk compared with agents that reuse past successes without process-level memory.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same motif-extraction approach could be tested in other LLM-driven combinatorial search domains such as program synthesis or neural architecture search.
Confidence gating may offer a general mechanism for handling non-stationary feedback in reinforcement learning or evolutionary optimization outside finance.
Extending the memory to include cross-factor interactions or multi-asset contexts would be a direct next measurement of scalability.

Load-bearing premise

That motifs extracted from AST differences and the associated confidence-gated memory will generalize across unseen market regimes without introducing fresh selection biases in a non-stationary search process.

What would settle it

Applying the full AlphaMemo agent and a memory-ablated baseline to a later market period outside the original train-validation-test splits and finding no performance or efficiency advantage for the memory-equipped version.
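The test described above depends on a strictly chronological partition with an extra period that lies after every original split. A minimal sketch of such a partition follows; the boundary dates and the `chronological_split` helper are hypothetical, since the paper's actual split dates are not stated here.

```python
from datetime import date


def chronological_split(dates, train_end, valid_end, test_end):
    """Partition dates into train / valid / test plus a later hold-out period."""
    dates = sorted(dates)
    train = [d for d in dates if d <= train_end]
    valid = [d for d in dates if train_end < d <= valid_end]
    test = [d for d in dates if valid_end < d <= test_end]
    later = [d for d in dates if d > test_end]  # never seen during search or tuning
    return train, valid, test, later


# Illustrative calendar only: one date per year.
days = [date(2020 + i, 1, 1) for i in range(6)]
train, valid, test, later = chronological_split(
    days, date(2021, 12, 31), date(2022, 12, 31), date(2023, 12, 31)
)
print(len(train), len(valid), len(test), len(later))
```

Both the full agent and the memory-ablated baseline would then be searched on the earlier periods and scored only on `later`, so that neither the factors nor the memory contents have seen that window.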

Figures

Figures reproduced from arXiv: 2606.20625 by Fengxiang He, Hang Yu, Jeff Z. Pan, Tongliang Liu, Zhiyong Wang, Zifan Zheng.

**Figure 2.** Algorithmic flow of AlphaMemo. At each iteration, AlphaMemo scores parent-edit actions using a search-ledger prior and confidence-gated SSPM residuals, applies asymmetric vetoes to unreliable actions, generates child factors with the LLM, and updates the ledger and memory with evaluation feedback. … $(z(p), m)$. The action score is: $A_t(p, m) = \log\big(S_{\text{ledger}}(p) + \epsilon\big) + \lambda_t\, c_t(z(p), m)\, \Delta_t(z(p), m)$ (5). Here $\Delta_t(z, m)$ …
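The action score in Equation (5) of the caption can be sketched directly: a log ledger prior plus a weighted product of confidence and residual. The veto rule below is an assumption added for illustration (block an action only when confidence is high and the residual is clearly negative); the thresholds `veto_conf` and `veto_resid`, the `Candidate` type, and the toy numbers are hypothetical and do not come from the paper.

```python
import math
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    s_ledger: float    # S_ledger(p): ledger prior score of the parent
    confidence: float  # c_t(z(p), m)
    residual: float    # Delta_t(z(p), m)


def action_score(s_ledger, confidence, residual, lam=1.0, eps=1e-6):
    """Equation (5): log(S_ledger + eps) + lambda * c * Delta."""
    return math.log(s_ledger + eps) + lam * confidence * residual


def is_vetoed(c, veto_conf=0.8, veto_resid=-0.05):
    """Assumed asymmetric rule: only high-confidence negative evidence blocks an action."""
    return c.confidence >= veto_conf and c.residual <= veto_resid


def select_action(candidates, lam=1.0):
    """Pick the best-scoring candidate among those not vetoed, or None if all are blocked."""
    allowed = [c for c in candidates if not is_vetoed(c)]
    if not allowed:
        return None
    return max(allowed, key=lambda c: action_score(c.s_ledger, c.confidence, c.residual, lam))


cands = [
    Candidate("A", 0.5, 0.9, -0.2),  # confident failure pattern: vetoed
    Candidate("B", 0.3, 0.5, 0.1),
    Candidate("C", 0.4, 0.1, -0.2),  # negative but low confidence: still allowed
]
print(select_action(cands).name)
```

In this toy data, candidate A has the highest raw score, so the veto changes the outcome. The rule is asymmetric in the sense that confident positive evidence only adds to the score and never forces a choice.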

**Figure 3.** Memory calibration across AlphaMemo operating points and internal diagnostics. Search-ledger only …

read the original abstract

LLM agents are promising for alpha mining via combining financial priors, symbolic reasoning, executable factor generation, and feedback-driven refinement. Yet, they face a combinatorial search space, noisy non-stationary feedback, redundant discoveries, and overfitting risks from naively reusing past successes. To address these challenges, we propose AlphaMemo, a self-evolving alpha mining agent with Structured Search-Process Memory. Rather than memorizing only final factors or full trajectories, AlphaMemo records reusable evidence about which edit motifs work or fail under specific parent-factor contexts. It extracts motifs from Abstract Syntax Tree (AST) differences, applies confidence-gated residual memory on top of a search-ledger prior, and uses asymmetric veto control to suppress high-confidence failure patterns. Experiments on CSI 500 and S&P 500 show improved out-of-sample performance and fixed-budget discovery efficiency, with ablations validating the roles of residual learning, confidence gating, AST-diff motifs, and veto memory. Code is at https://github.com/jarrettyu/AlphaMemo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

AlphaMemo gives a practical memory layer for LLM alpha miners via AST motifs and vetoes, but the non-stationary finance setting leaves generalization untested.

AlphaMemo introduces a memory system for LLM agents that mine alpha factors. It tracks reusable edit motifs extracted from AST differences, layers confidence-gated residuals on a search ledger, and uses asymmetric vetoes to block repeated failures.

This is new in its focus on parent-factor contexts and motif-level reuse rather than full trajectories or final factors. The paper does well by releasing code on GitHub, running ablations that isolate each component, and reporting gains in out-of-sample performance plus fixed-budget discovery efficiency on CSI 500 and S&P 500.

The soft spot is generalization. Financial feedback and the search process itself are non-stationary, so motifs that helped in one window can turn stale or introduce bias later. The stress-test concern holds: the experiments do not appear to include strong forward-chronological or cross-regime hold-outs that would show whether the memory structure itself overfits the observed search trajectory. Claims rest on empirical results without visible error bars or full statistical detail in the abstract.

This is for people working on agentic systems in quantitative finance. A reader looking for concrete memory designs in search agents would get value from the implementation choices. It deserves a serious referee because it ships working code, targets a real bottleneck, and supplies ablations, even though robustness questions would need attention in review.

Referee Report

3 major / 2 minor

Summary. The paper proposes AlphaMemo, an LLM-based self-evolving agent for alpha factor mining that augments search with structured process memory. Instead of storing final factors or full trajectories, it extracts reusable AST-diff motifs from parent-factor contexts, maintains a search-ledger prior augmented by confidence-gated residual memory, and applies asymmetric veto to suppress high-confidence failures. Experiments on CSI 500 and S&P 500 report gains in out-of-sample performance and fixed-budget discovery efficiency; ablations attribute improvements to residual learning, confidence gating, AST motifs, and veto memory. Code is released on GitHub.

Significance. If the empirical claims hold under rigorous validation, the work offers a concrete mechanism for memory and bias control in non-stationary combinatorial search, which is relevant to LLM agents in quantitative finance and symbolic program synthesis. The open-source release and component-wise ablations are strengths that support reproducibility and incremental follow-up.

major comments (3)

[Experiments] Experiments section (and abstract): the headline claim of improved out-of-sample performance and discovery efficiency rests on results whose statistical significance, error bars, number of independent runs, and exact train/test chronological splits are not described. Without these, it is impossible to assess whether the reported gains exceed what would be expected from non-stationary market feedback alone.
[Method, Experiments] Method and Experiments: the central generalization assumption—that AST-diff motifs and confidence-gated residual memory extracted from earlier search trajectories remain beneficial in unseen regimes—is not tested via forward-chronological hold-out or regime-shift experiments. Given the non-stationary nature of both market returns and the evolving search distribution, the ablations only confirm utility inside the observed distribution and do not rule out regime-specific overfitting.
[Method] Method: the precise definition of “AST-diff motifs,” how they are canonicalized, stored in the search-ledger prior, and combined with the residual memory update rule is not formalized (no equations or pseudocode). This makes it difficult to verify that the veto mechanism is asymmetric in the claimed way or that it does not inadvertently suppress useful novelty.

minor comments (2)

[Abstract, Introduction] Abstract and introduction: dataset details (exact time periods, rebalancing frequency, transaction-cost assumptions) and performance metrics (Sharpe, IC, turnover, etc.) should be stated explicitly rather than left to the GitHub repository.
[Method] Notation: the terms “search-ledger prior,” “confidence-gated residual memory,” and “asymmetric veto control” are introduced without a compact mathematical or algorithmic definition; a small table or figure summarizing the memory update rules would improve clarity.
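For readers unfamiliar with the metrics named in the minor comments, the sketch below computes a rank information coefficient (Spearman correlation between factor values and forward returns across stocks) and an annualized Sharpe ratio using only the standard library. The 252-period annualization, the zero risk-free rate, and the toy inputs are assumptions; the paper's exact metric definitions are not given in this summary.

```python
import math
from statistics import mean, stdev


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Plain Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_ic(factor, fwd_ret):
    """Rank IC for one cross-section: Spearman correlation of factor vs. forward return."""
    return pearson(ranks(factor), ranks(fwd_ret))


def annualized_sharpe(returns, periods=252):
    """Mean over sample standard deviation, scaled by sqrt(periods); risk-free rate taken as 0."""
    return mean(returns) / stdev(returns) * math.sqrt(periods)


print(rank_ic([1, 2, 3, 4], [10, 20, 30, 40]))
```

In practice the rank IC would be computed per rebalancing date and then averaged, and turnover and transaction costs would be reported alongside it.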

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for improving clarity and rigor. We address each major comment below and outline the revisions we will make.

Referee: [Experiments] Experiments section (and abstract): the headline claim of improved out-of-sample performance and discovery efficiency rests on results whose statistical significance, error bars, number of independent runs, and exact train/test chronological splits are not described. Without these, it is impossible to assess whether the reported gains exceed what would be expected from non-stationary market feedback alone.

Authors: We agree that these experimental details are essential for evaluating robustness. In the revised manuscript we will expand the Experiments section to report the number of independent runs performed, include error bars (standard deviation across runs), specify the exact chronological train/test splits used for both CSI 500 and S&P 500, and add statistical significance tests comparing against baselines.
revision: yes

Referee: [Method, Experiments] Method and Experiments: the central generalization assumption—that AST-diff motifs and confidence-gated residual memory extracted from earlier search trajectories remain beneficial in unseen regimes—is not tested via forward-chronological hold-out or regime-shift experiments. Given the non-stationary nature of both market returns and the evolving search distribution, the ablations only confirm utility inside the observed distribution and do not rule out regime-specific overfitting.

Authors: The current experiments employ forward-chronological splits across two distinct markets, which provides a basic test of temporal generalization. However, we did not conduct dedicated regime-shift experiments (e.g., across bull/bear or high/low volatility periods). We will revise the text to explicitly describe the chronological splits and acknowledge the limitation regarding regime-specific overfitting; additional regime-shift experiments are noted as future work due to computational cost.
revision: partial

Referee: [Method] Method: the precise definition of “AST-diff motifs,” how they are canonicalized, stored in the search-ledger prior, and combined with the residual memory update rule is not formalized (no equations or pseudocode). This makes it difficult to verify that the veto mechanism is asymmetric in the claimed way or that it does not inadvertently suppress useful novelty.

Authors: We agree that formalization is needed for verifiability. The revised manuscript will include a new subsection with mathematical definitions for AST-diff motif extraction and canonicalization, the search-ledger prior, the confidence-gated residual update rule, and the asymmetric veto logic, accompanied by pseudocode for the memory integration process.
revision: yes
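The run-level reporting discussed in the first exchange above (independent runs, error bars, significance against a baseline) can be sketched with the standard library alone. The example below reports mean and standard deviation across runs and applies an exact paired sign-flip permutation test, one reasonable choice for a handful of seeds. The test choice, the helper names, and every number are illustrative assumptions, not results or methods from the paper.

```python
from itertools import product
from statistics import mean, stdev


def summarize(scores):
    """Mean and sample standard deviation across independent runs."""
    return mean(scores), stdev(scores)


def paired_sign_flip_pvalue(a, b):
    """Exact two-sided sign-flip test on paired differences (same seeds for a and b)."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Count sign assignments at least as extreme as the observed sum.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


# Made-up per-seed scores for a memory-equipped agent and a baseline.
memo = [0.12, 0.13, 0.11, 0.14, 0.12]
base = [0.10, 0.10, 0.10, 0.10, 0.10]
print(summarize(memo), paired_sign_flip_pvalue(memo, base))
```

With five paired runs the smallest attainable two-sided p-value is 2/32, which shows why the number of independent runs matters before any significance claim is made.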

Circularity Check

0 steps flagged

No circularity: empirical method validated by experiments, no derivations or self-referential predictions.

The paper describes an agent architecture (AST-diff motifs, confidence-gated residual memory, asymmetric veto) and reports empirical results on CSI 500 and S&P 500 datasets plus ablations. No equations, first-principles derivations, or predictions are present that could reduce to fitted inputs or self-citations by construction. All performance claims are framed as observed outcomes from fixed-budget search experiments rather than analytic results that presuppose their own validity. Self-citations, if any, are not load-bearing for any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the memory components are described at a high level without mathematical formulation or explicit assumptions.

pith-pipeline@v0.9.1-grok · 5724 in / 1150 out tokens · 32676 ms · 2026-06-29T16:52:17.351655+00:00 · methodology

Reference graph

Works this paper leans on

61 extracted references · 30 canonical work pages · 13 internal anchors

[1]

Proceedings of the 41st International Conference on Machine Learning , pages=

Language agent tree search unifies reasoning, acting, and planning in language models , author=. Proceedings of the 41st International Conference on Machine Learning , pages=
[2]

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory , author=. arXiv preprint arXiv:2511.20857 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Wang et al.Alignment Tipping Process.arXiv:2510.04860, 2025

Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails , author=. arXiv preprint arXiv:2510.04860 , year=

work page arXiv
[4]

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

A comprehensive survey of self-evolving ai agents: A new paradigm bridging foundation models and lifelong agentic systems , author=. arXiv preprint arXiv:2508.07407 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[5]

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence , author=. arXiv preprint arXiv:2507.21046 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[6]

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

Evolver: Self-evolving llm agents through an experience-driven lifecycle , author=. arXiv preprint arXiv:2510.16079 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[7]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Agentgym: Evaluating and training large language model-based agents across diverse environments , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[8]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Alphaevolve: A coding agent for scientific and algorithmic discovery , author=. arXiv preprint arXiv:2506.13131 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Your Agent May Misevolve: Emergent Risks in Self-evolving

Shuai Shao and Qihan Ren and Dongrui Liu and Chen Qian and Boyi Wei and Dadi Guo and Yang JingYi and Xinhao Song and Linfeng Zhang and Weinan Zhang and Jing Shao , booktitle=. Your Agent May Misevolve: Emergent Risks in Self-evolving
[10]

Advances in neural information processing systems , volume=

Reflexion: Language agents with verbal reinforcement learning , author=. Advances in neural information processing systems , volume=
[11]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Contextual experience replay for self-improvement of language agents , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[12]

arXiv preprint arXiv:2509.24704 , year=

Memgen: Weaving generative latent memory for self-evolving agents , author=. arXiv preprint arXiv:2509.24704 , year=

work page arXiv
[13]

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Memory-r1: Enhancing large language model agents to manage and utilize memories via reinforcement learning , author=. arXiv preprint arXiv:2508.19828 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Agentic memory: Learning unified long-term and short-term memory management for large language model agents , author=. arXiv preprint arXiv:2601.01885 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Lifelongagentbench: Evaluating llm agents as lifelong learners.arXiv preprint arXiv:2505.11942, 2025

Lifelongagentbench: Evaluating llm agents as lifelong learners , author=. arXiv preprint arXiv:2505.11942 , year=

work page arXiv
[16]

arXiv preprint arXiv:2510.04206 , year=

Agentrl: Scaling agentic reinforcement learning with a multi-turn, multi-task framework , author=. arXiv preprint arXiv:2510.04206 , year=

work page arXiv
[17]

ICML 2025 Workshop on Computer Use Agents , year=

Reinforcing multi-turn reasoning in llm agents via turn-level credit assignment , author=. ICML 2025 Workshop on Computer Use Agents , year=

2025
[18]

arXiv preprint arXiv:2410.10739 , year=

Balancing continuous pre-training and instruction fine-tuning: Optimizing instruction-following in llms , author=. arXiv preprint arXiv:2410.10739 , year=

work page arXiv
[19]

2024 , booktitle =

Zhou, Andy and Yan, Kai and Shlapentokh-Rothman, Michal and Wang, Haohan and Wang, Yu-Xiong , title =. 2024 , booktitle =

2024
[20]

International Conference on Learning Representations , volume=

Swe-search: Enhancing software agents with monte carlo tree search and iterative refinement , author=. International Conference on Learning Representations , volume=
[21]

International Conference on Learning Representations , volume=

Strategist: Self-improvement of LLM decision making via bi-level tree search , author=. International Conference on Learning Representations , volume=
[22]

arXiv preprint arXiv:2411.11053 , year=

Sra-mcts: Self-driven reasoning augmentation with monte carlo tree search for code generation , author=. arXiv preprint arXiv:2411.11053 , year=

work page arXiv
[23]

arXiv preprint arXiv:2501.18922 , year=

Kbqa-o1: Agentic knowledge base question answering with monte carlo tree search , author=. arXiv preprint arXiv:2501.18922 , year=

work page arXiv
[24]

arXiv preprint arXiv:2501.08603 , year=

Monte carlo tree search for comprehensive exploration in llm-based automatic heuristic design , author=. arXiv preprint arXiv:2501.08603 , year=

work page arXiv
[25]

arXiv preprint arXiv:2502.06813 , year=

Policy guided tree search for enhanced llm reasoning , author=. arXiv preprint arXiv:2502.06813 , year=

work page arXiv
[26]

2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE) , pages=

An empirical study of financial factor mining based on gene expression programming , author=. 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE) , pages=. 2021 , organization=

2021
[27]

Proceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining , pages=

Generating synergistic formulaic alpha collections via reinforcement learning , author=. Proceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining , pages=
[28]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , pages=

Alpha-gpt: Human-ai interactive alpha mining for quantitative investment , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , pages=

2025
[29]

Automate Strategy Finding with LLM in Quant Investment

Kou, Zhizhuo and Yu, Holam and Luo, Junyu and Peng, Jingshu and Li, Xujia and Liu, Chengzhong and Dai, Juntao and Chen, Lei and Han, Sirui and Guo, Yike. Automate Strategy Finding with LLM in Quant Investment. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025

2025
[30]

Findings of the Association for Computational Linguistics: ACL 2024 , pages=

Can large language models mine interpretable financial factors more effectively? a neural-symbolic factor mining agent model , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

2024
[31]

Proceedings of the AAAI conference on artificial intelligence , volume=

Alphaforge: A framework to mine and dynamically combine formulaic alpha factors , author=. Proceedings of the AAAI conference on artificial intelligence , volume=
[32]

Forty-second International Conference on Machine Learning , year=

Alphaqcm: Alpha discovery in finance with distributional reinforcement learning , author=. Forty-second International Conference on Machine Learning , year=
[33]

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V

AlphaAgent: LLM-driven alpha mining with regularized exploration to counteract alpha decay , author=. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2 , pages=
[34]

AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration

AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration , author=. arXiv preprint arXiv:2509.25055 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[35]

Advances in Neural Information Processing Systems , volume=

R&D-Agent-Quant: a multi-agent framework for data-centric factors and model joint optimization , author=. Advances in Neural Information Processing Systems , volume=
[36]

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution , author=. arXiv preprint arXiv:2511.18850 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[37]

International Conference on Learning Representations , year=

AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning , author=. International Conference on Learning Representations , year=
[38]

QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining

Quantaalpha: An evolutionary framework for llm-driven alpha mining , author=. arXiv preprint arXiv:2602.07085 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[39]

arXiv preprint arXiv:2602.14670 , year=

FactorMiner: A Self-Evolving Agent with Skills and Experience Memory for Financial Alpha Discovery , author=. arXiv preprint arXiv:2602.14670 , year=

work page arXiv
[40]

arXiv preprint arXiv:2602.11917 , year=

AlphaPROBE: Alpha Mining via Principled Retrieval and On-graph biased evolution , author=. arXiv preprint arXiv:2602.11917 , year=

work page arXiv
[41]

Wilmott , volume=

101 formulaic alphas , author=. Wilmott , volume=. 2016 , publisher=

2016
[42]

The Review of Financial Studies , volume=

Empirical asset pricing via machine learning , author=. The Review of Financial Studies , volume=. 2020 , publisher=

2020
[43]

Journal of financial economics , volume=

Common risk factors in the returns on stocks and bonds , author=. Journal of financial economics , volume=. 1993 , publisher=

1993
[44]

The Review of financial studies , volume=

… and the cross-section of expected returns , author=. The Review of financial studies , volume=. 2016 , publisher=

2016
[45]

arXiv preprint arXiv:2009.11189 , year=

Qlib: An ai-oriented quantitative investment platform , author=. arXiv preprint arXiv:2009.11189 , year=

work page arXiv 2009
[46]

LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =

Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan , booktitle =. LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =
[47]

Long Short-Term Memory , year =

Hochreiter, Sepp and Schmidhuber, J\". Long Short-Term Memory , year =. Neural Comput. , month = nov, pages =
[48]

arXiv preprint arXiv:2002.08245 , year=

Autoalpha: an efficient hierarchical evolutionary algorithm for mining alpha factors in quantitative investment , author=. arXiv preprint arXiv:2002.08245 , year=

work page arXiv 2002
[49]

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence , pages=

AlphaGAT: A Two-Stage Learning Approach for Adaptive Portfolio Selection , author=. Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence , pages=
[50]

IEEE Transactions on Signal Processing , year=

QuantFactor REINFORCE: mining steady formulaic alpha factors with variance-bounded REINFORCE , author=. IEEE Transactions on Signal Processing , year=
[51]

arXiv preprint arXiv:2512.23515 , year=

Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning , author=. arXiv preprint arXiv:2512.23515 , year=

work page arXiv
[52]

arXiv preprint arXiv:2507.17211 , year=

EFS: Evolutionary Factor Searching for Sparse Portfolio Optimization Using Large Language Models , author=. arXiv preprint arXiv:2507.17211 , year=

work page arXiv
[53]

arXiv preprint arXiv:2505.11122 , year=

Navigating the alpha jungle: an LLM-Powered MCTS framework for formulaic factor mining , author=. arXiv preprint arXiv:2505.11122 , year=

work page arXiv
[54]

AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining

Alphaeval: A comprehensive and efficient evaluation framework for formula alpha mining , author=. arXiv preprint arXiv:2508.13174 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[55]

International Conference on Learning Representations , year=

ReAct: Synergizing Reasoning and Acting in Language Models , author=. International Conference on Learning Representations , year=
[56]

, title =

Park, Joon Sung and O'Brien, Joseph and Cai, Carrie Jun and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , title =. 2023 , booktitle =

2023
[57]

MemGPT: Towards LLMs as Operating Systems

MemGPT: Towards LLMs as Operating Systems , author=. arXiv preprint arXiv:2310.08560 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[58]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Voyager: An open-ended embodied agent with large language models , author=. arXiv preprint arXiv:2305.16291 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[59]

Nature , volume=

Mathematical discoveries from program search with large language models , author=. Nature , volume=. 2024 , publisher=

2024
[60]

International Conference on Learning Representations , volume=

Connecting large language models with evolutionary algorithms yields powerful prompt optimizers , author=. International Conference on Learning Representations , volume=
[61]

arXiv preprint arXiv:2501.09891 , year=

Evolving deeper llm thinking , author=. arXiv preprint arXiv:2501.09891 , year=

work page arXiv

[1] [1]

Proceedings of the 41st International Conference on Machine Learning , pages=

Language agent tree search unifies reasoning, acting, and planning in language models , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

[2] [2]

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory , author=. arXiv preprint arXiv:2511.20857 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Wang et al.Alignment Tipping Process.arXiv:2510.04860, 2025

Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails , author=. arXiv preprint arXiv:2510.04860 , year=

work page arXiv

[4] [4]

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

A comprehensive survey of self-evolving ai agents: A new paradigm bridging foundation models and lifelong agentic systems , author=. arXiv preprint arXiv:2508.07407 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence , author=. arXiv preprint arXiv:2507.21046 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

Evolver: Self-evolving llm agents through an experience-driven lifecycle , author=. arXiv preprint arXiv:2510.16079 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Agentgym: Evaluating and training large language model-based agents across diverse environments , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[8] [8]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Alphaevolve: A coding agent for scientific and algorithmic discovery , author=. arXiv preprint arXiv:2506.13131 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Your Agent May Misevolve: Emergent Risks in Self-evolving

Shuai Shao and Qihan Ren and Dongrui Liu and Chen Qian and Boyi Wei and Dadi Guo and Yang JingYi and Xinhao Song and Linfeng Zhang and Weinan Zhang and Jing Shao , booktitle=. Your Agent May Misevolve: Emergent Risks in Self-evolving

[10] Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems.

[11] Contextual experience replay for self-improvement of language agents. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

[12] MemGen: Weaving generative latent memory for self-evolving agents. arXiv preprint arXiv:2509.24704.

[13] Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning. arXiv preprint arXiv:2508.19828.

[14] Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents. arXiv preprint arXiv:2601.01885.

[15] LifelongAgentBench: Evaluating LLM agents as lifelong learners. arXiv preprint arXiv:2505.11942, 2025.

[16] AgentRL: Scaling agentic reinforcement learning with a multi-turn, multi-task framework. arXiv preprint arXiv:2510.04206.

[17] Reinforcing multi-turn reasoning in LLM agents via turn-level credit assignment. ICML 2025 Workshop on Computer Use Agents, 2025.

[18] Balancing continuous pre-training and instruction fine-tuning: Optimizing instruction-following in LLMs. arXiv preprint arXiv:2410.10739.

[19] Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. 2024.

[20] SWE-Search: Enhancing software agents with Monte Carlo tree search and iterative refinement. International Conference on Learning Representations.

[21] Strategist: Self-improvement of LLM decision making via bi-level tree search. International Conference on Learning Representations.

[22] SRA-MCTS: Self-driven reasoning augmentation with Monte Carlo tree search for code generation. arXiv preprint arXiv:2411.11053.

[23] KBQA-o1: Agentic knowledge base question answering with Monte Carlo tree search. arXiv preprint arXiv:2501.18922.

[24] Monte Carlo tree search for comprehensive exploration in LLM-based automatic heuristic design. arXiv preprint arXiv:2501.08603.

[25] Policy guided tree search for enhanced LLM reasoning. arXiv preprint arXiv:2502.06813.

[26] An empirical study of financial factor mining based on gene expression programming. 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), 2021.

[27] Generating synergistic formulaic alpha collections via reinforcement learning. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

[28] Alpha-GPT: Human-AI interactive alpha mining for quantitative investment. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2025.

[29] Zhizhuo Kou, Holam Yu, Junyu Luo, Jingshu Peng, Xujia Li, Chengzhong Liu, Juntao Dai, Lei Chen, Sirui Han, and Yike Guo. Automate Strategy Finding with LLM in Quant Investment. Findings of the Association for Computational Linguistics: EMNLP 2025, 2025.

[30] Can large language models mine interpretable financial factors more effectively? A neural-symbolic factor mining agent model. Findings of the Association for Computational Linguistics: ACL 2024, 2024.

[31] AlphaForge: A framework to mine and dynamically combine formulaic alpha factors. Proceedings of the AAAI Conference on Artificial Intelligence.

[32] AlphaQCM: Alpha discovery in finance with distributional reinforcement learning. Forty-second International Conference on Machine Learning.

[33] AlphaAgent: LLM-driven alpha mining with regularized exploration to counteract alpha decay. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2.

[34] AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration. arXiv preprint arXiv:2509.25055.

[35] R&D-Agent-Quant: A multi-agent framework for data-centric factors and model joint optimization. Advances in Neural Information Processing Systems.

[36] Cognitive Alpha Mining via LLM-Driven Code-Based Evolution. arXiv preprint arXiv:2511.18850.

[37] AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning. International Conference on Learning Representations.

[38] QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining. arXiv preprint arXiv:2602.07085.

[39] FactorMiner: A Self-Evolving Agent with Skills and Experience Memory for Financial Alpha Discovery. arXiv preprint arXiv:2602.14670.

[40] AlphaPROBE: Alpha Mining via Principled Retrieval and On-graph biased evolution. arXiv preprint arXiv:2602.11917.

[41] 101 formulaic alphas. Wilmott, 2016.

[42] Empirical asset pricing via machine learning. The Review of Financial Studies, 2020.

[43] Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 1993.

[44] … and the cross-section of expected returns. The Review of Financial Studies, 2016.

[45] Qlib: An AI-oriented quantitative investment platform. arXiv preprint arXiv:2009.11189.

[46] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A Highly Efficient Gradient Boosting Decision Tree.

[47] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Comput.

[48] AutoAlpha: An efficient hierarchical evolutionary algorithm for mining alpha factors in quantitative investment. arXiv preprint arXiv:2002.08245.

[49] AlphaGAT: A Two-Stage Learning Approach for Adaptive Portfolio Selection. Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence.

[50] QuantFactor REINFORCE: Mining steady formulaic alpha factors with variance-bounded REINFORCE. IEEE Transactions on Signal Processing.

[51] Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning. arXiv preprint arXiv:2512.23515.

[52] EFS: Evolutionary Factor Searching for Sparse Portfolio Optimization Using Large Language Models. arXiv preprint arXiv:2507.17211.

[53] Navigating the alpha jungle: An LLM-powered MCTS framework for formulaic factor mining. arXiv preprint arXiv:2505.11122.

[54] AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining. arXiv preprint arXiv:2508.13174.

[55] ReAct: Synergizing Reasoning and Acting in Language Models. International Conference on Learning Representations.

[56] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023.

[57] MemGPT: Towards LLMs as Operating Systems. arXiv preprint arXiv:2310.08560.

[58] Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv preprint arXiv:2305.16291.

[59] Mathematical discoveries from program search with large language models. Nature, 2024.

[60] Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. International Conference on Learning Representations.

[61] Evolving deeper LLM thinking. arXiv preprint arXiv:2501.09891.