Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks

Fabian Hoppe; Melven Röhrig-Zöllner; Philipp Knechtges

arxiv: 2606.01975 · v1 · pith:O6M74BXGnew · submitted 2026-06-01 · 💻 cs.AI · cs.SE

Pith reviewed 2026-06-28 14:33 UTC · model grok-4.3

classification 💻 cs.AI cs.SE

keywords tensor networks · contraction order optimization · LLM · algorithm development · evolutionary coding agents · verifier-guided · OpenEvolve

The pith

Verifier-guided LLM agents show promise for developing better tensor contraction algorithms while human validation stays essential

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines the application of large language models to algorithm development and improvement through a case study focused on contraction order optimization for tensor networks. It deploys verifier-guided evolutionary coding agents with OpenEvolve and tests the effects of different LLMs, evaluation metrics, and test instances on the generated solutions. The work establishes that these agents can produce algorithmic improvements for the contraction task. At the same time it shows that human scientists must continue to perform evaluation, validation, and interpretation of the outputs. The case study therefore illustrates both the capabilities and the current limits of LLM assistance in creating scientific algorithms.

Core claim

Verifier-guided evolutionary coding agents that use LLMs can develop and improve algorithms for contraction order optimization in tensor networks, yet the process still requires human evaluation, validation, and interpretation to ensure the results are reliable and meaningful.

What carries the argument

Verifier-guided evolutionary coding agents that iteratively propose, test, and refine code for tensor-network contraction ordering
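The propose, test, refine cycle can be sketched in a few lines. This is a toy illustration under stated assumptions, not OpenEvolve's implementation: `evolve`, `toy_propose`, `toy_verify`, and the permutation task are hypothetical names chosen here, and a random swap stands in for the LLM call.

```python
import random


def evolve(initial, propose, verify, iterations=200, population_size=8, seed=0):
    """Keep a small population; mutate one member per step, keep verified children.

    propose(parent, rng) stands in for the LLM call (hypothetical signature).
    verify(candidate) returns a score (higher is better) or None if invalid.
    The initial candidate is assumed to pass the verifier.
    """
    rng = random.Random(seed)
    population = [(verify(initial), initial)]
    for _ in range(iterations):
        _, parent = rng.choice(population)
        child = propose(parent, rng)
        score = verify(child)
        if score is None:  # the verifier rejected this candidate
            continue
        population.append((score, child))
        population.sort(key=lambda item: item[0], reverse=True)
        del population[population_size:]  # keep only the best members
    return population[0]


def toy_verify(order):
    """Toy verifier: reject non-permutations, reward closeness to sorted order."""
    if sorted(order) != list(range(len(order))):
        return None
    return -sum(abs(value - index) for index, value in enumerate(order))


def toy_propose(parent, rng):
    """Toy stand-in for the LLM: swap two random positions."""
    child = list(parent)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


start = [3, 2, 1, 0, 4]
best_score, best = evolve(start, toy_propose, toy_verify)
print(best_score, best)
```

The verifier is the only source of truth in this loop, which is why the choice of score matters as much as the proposal mechanism.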

If this is right

Design choices for the evaluation metric and test instances directly influence the quality of algorithms generated by the agents
The same verifier-guided approach can be used to attempt algorithmic improvements on other tensor-network related tasks
Human oversight remains necessary to interpret agent outputs and confirm they solve the intended scientific problem
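To see why the metric and the test instances matter, consider a minimal sketch of an "average reduction of log10 FLOPs" score. The exact definition used in the paper is not given on this page, so the formula below (mean of the per-instance difference in log10 FLOPs between a baseline and a candidate) is an assumption, and the function name and numbers are invented for illustration.

```python
import math


def average_log10_flops_reduction(baseline_flops, candidate_flops):
    """Mean over instances of log10(baseline) - log10(candidate).

    Positive values mean the candidate needs fewer FLOPs on average.
    Assumed definition; the paper's exact formula may differ.
    """
    if not baseline_flops or len(baseline_flops) != len(candidate_flops):
        raise ValueError("need two non-empty lists of equal length")
    reductions = [
        math.log10(base) - math.log10(cand)
        for base, cand in zip(baseline_flops, candidate_flops)
    ]
    return sum(reductions) / len(reductions)


# Invented FLOP counts for three instances; only the first one improves.
baseline = [1e6, 1e9, 1e12]
candidate = [1e5, 1e9, 1e12]

all_instances = average_log10_flops_reduction(baseline, candidate)
first_only = average_log10_flops_reduction(baseline[:1], candidate[:1])
print(all_instances, first_only)
```

The same candidate scores three times higher when only the first instance is evaluated, which illustrates how instance selection alone can change what the evolutionary loop is rewarded for.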

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be applied to contraction optimization in other scientific computing domains that rely on similar ordering problems
Results may change if different LLMs or verification procedures are substituted in the evolutionary loop
Longer-term use might require new benchmarks that better capture real-world tensor network performance beyond the study instances

Load-bearing premise

That conclusions drawn from this single tensor-network case study with its chosen metrics, test instances, and LLM will generalize to algorithmic development tasks in other domains

What would settle it

An experiment in which the contraction-order algorithms produced by the LLM agents are shown to be inferior to established human-written methods when measured on a broader collection of tensor networks outside the original test set
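Such a comparison needs a cost model and a baseline ordering. The sketch below is one hedged way to set that up on a toy network: it assumes each index is shared by at most two tensors and approximates the cost of a pairwise contraction by the product of the dimensions of all indices involved. The function names are hypothetical, and the greedy rule shown is a simple illustration rather than the heuristic of any particular library.

```python
import itertools
import math


def pair_cost(a, b, dims):
    """Approximate FLOPs of contracting tensors a and b (sets of index labels)."""
    return math.prod(dims[index] for index in a | b)


def contraction_cost(tensors, choose_pair, dims):
    """Total cost of contracting the network with the given pair-selection rule."""
    tensors = list(tensors)
    total = 0
    while len(tensors) > 1:
        i, j = choose_pair(tensors, dims)
        a, b = tensors[i], tensors[j]
        total += pair_cost(a, b, dims)
        remaining = [t for k, t in enumerate(tensors) if k not in (i, j)]
        remaining.append(a ^ b)  # shared indices are summed away
        tensors = remaining
    return total


def naive_pair(tensors, dims):
    """Baseline: always contract the first two tensors in the list."""
    return 0, 1


def greedy_pair(tensors, dims):
    """Greedy: contract the pair that is cheapest right now."""
    pairs = itertools.combinations(range(len(tensors)), 2)
    return min(pairs, key=lambda p: pair_cost(tensors[p[0]], tensors[p[1]], dims))


# Toy matrix chain A(i,j) B(j,k) C(k,l) with invented dimensions.
dims = {"i": 100, "j": 2, "k": 100, "l": 2}
chain = [frozenset("ij"), frozenset("jk"), frozenset("kl")]

naive_cost = contraction_cost(chain, naive_pair, dims)
greedy_cost = contraction_cost(chain, greedy_pair, dims)
print(naive_cost, greedy_cost)
```

On this chain the greedy rule contracts B with C first and is far cheaper than the left-to-right order, but greedy choices are not optimal in general, which is why a broad, held-out set of networks is needed to judge any evolved algorithm.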

Figures

Figures reproduced from arXiv: 2606.01975 by Fabian Hoppe, Melven Röhrig-Zöllner, Philipp Knechtges.

**Figure 1.** Examples for tensor networks.

Consequently, contraction orders for TNs are often determined using heuristics or metaheuristics, or by dedicated algorithms for restricted families of networks. We restrict ourselves to a very concise overview of some common approaches:

• Greedy and random-greedy heuristics are fast and simple, and can be surprisingly effective on many instances [SG18, GK21].
• Optimization-…

**Figure 2.** Evolution of the average reduction of log10 FLOPs for various LLMs on the “full small” TNs set.

| Name | by | size (active) | release | deployment |
| --- | --- | --- | --- | --- |
| GPT-OSS-120B | OpenAI (US) | 117B (5.1B) | 08/2025 | ChatAI/Blablador |
| GPT-OSS-20B | OpenAI (US) | 21B (3.6B) | 08/2025 | self-hosted (vllm) |
| Qwen3-235B-A22B-Instruct | Alibaba (CN) | 235B (22B) | 04/2025 | ChatAI/Blablador |
| Qwen3-30B-A3B-Instruct | Alibaba (CN) | 30.5B (3.3B) | 04/2025 | ChatAI |
| Qwen3-30B-A3B-Thinking | Alibaba (CN) | 30.5B (3.3B) | … | … |

**Figure 3.** Evolution of the average reduction of log10 FLOPs on the “reduced small” TNs set over 20 runs with different random seeds. The LLM is GPT-OSS-20B with different levels of reasoning effort (“low” to “high” from left to right).

**Figure 4.** Distribution of the average reduction of log10 FLOPs on the “reduced small” TNs set over 20 runs with different random seeds after 1000, 500, and 250 iterations, respectively. The LLM is GPT-OSS-20B with different levels of reasoning effort (“low” to “high”). The bar to the left of the violin plot indicates mean (circle), standard deviation, and median (star).

**Figure 5.** Evolution of various metrics (“metrics measured”, columns) over the iterations for five experiments with different metric optimized (“metric optimized”, rows) with GPT-OSS-20B on the “reduced small” TNs set. Light blue dots indicate individual members of the population, whereas black crosses indicate members of the population that have just become a new best solution (w.r.t. the metric optimized). The num…

**Figure 6.** Distribution of log10 FLOPs for the final best solution of four experiments with GPT-OSS-20B (evolution on reduced “small”, “middle”, “large” and “all” TNs sets; columns), evaluated on the full “small” to “large” TNs sets (rows). The gray and blue histograms show the distribution of FLOPs of the initial and the final best code, respectively. The dotted green line indicates the distribution observed for th…

**Figure 7.** Number of lines of code, percentage of comments, and relative code complexity for the codes leading at times in Experiment 1; cf. …

**Figure 8.** Number of lines of code, percentage of comments, and relative code complexity for the codes from Experiment 2; cf. …

**Figure 9.** Evolution of various metrics for the best run (GPT-OSS-20B on the “full small” TNs set with reasoning level “high”). Light blue dots indicate individual members of the population, whereas black crosses indicate members of the population that have just become a new best solution. The gray dotted lines indicate the level of the initial code, whereas the green dotted line indicates the threshold for an impro…

**Figure 10.** Distribution of log10 FLOPs for the final code of the best run (blue) vs different baselines (“cotengra cheap”, “cotengra+cmaes”, and “cotengra+optuna” in green, red, and purple, respectively) on different TNs sets (columns). The top row uses the “full” TNs sets, whereas the bottom row uses only the last 100 of the TNs from the respective sets. The numbers in the upper left corners give the average impro…

**Figure 11.** Distribution of runtimes for computing the contraction order with the final code of the best run (blue) vs different baselines (“cotengra cheap”, “cotengra+cmaes”, and “cotengra+optuna” in green, red, and purple, respectively) on different TNs sets (columns). The top row uses the “full” TNs sets, whereas the bottom row uses only the last 100 of the TNs from the respective sets. The numbers in the upper l…
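Several of the captions above summarize repeated runs with different random seeds by mean, standard deviation, and median. A minimal sketch of that summary follows; the function name and the scores are invented for illustration and are not results from the paper.

```python
import statistics


def summarize_runs(final_scores):
    """Summarize one final score per seed (needs at least two runs)."""
    return {
        "mean": statistics.fmean(final_scores),
        "std": statistics.stdev(final_scores),  # sample standard deviation
        "median": statistics.median(final_scores),
    }


# Invented final scores of five runs; one run is an outlier.
scores = [0.10, 0.30, 0.20, 0.90, 0.25]
summary = summarize_runs(scores)
print(summary)
```

A single lucky run pulls the mean well above the median here, which is one reason to report the whole distribution rather than only the best run.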

Original abstract

We consider LLM-based algorithm development through a case study on contraction-order optimisation for tensor networks with OpenEvolve. We pay particular attention to the choice of the LLM as well as design choices such as evaluation metric and test instances. Our results highlight both the promise of verifier-guided evolutionary coding agents for algorithm development/improvement and the continuing importance of evaluation, validation, and interpretation (and corresponding challenges) by the human scientist.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Narrow case study on LLM-assisted tensor contraction ordering shows no sign of generalizing beyond this one task.

The one thing to know is that this is a case study applying an existing evolutionary coding agent framework to the problem of optimizing contraction orders in tensor networks. It doesn't claim a new method or broad results.

What the paper does well is to carefully consider practical design choices such as which LLM to use and how to define the evaluation metric and test instances. It also makes a point of noting that human scientists are still needed for validation and interpretation, which aligns with the reality of these tools.

The soft spots are more significant. The work is limited to one narrow task. There is no indication that the findings would transfer to other algorithmic problems, as the stress test points out. Without details on how much improvement was achieved compared to standard methods or any error analysis, it's hard to gauge the actual contribution. The abstract mentions results but provides no numbers, which makes it difficult to evaluate the soundness.

The paper engages honestly with the literature on LLM use for coding and doesn't overclaim in the abstract. However, the central argument about promise for algorithm development relies on this single example, which is a weak foundation.

This paper would be of interest to researchers working on tensor networks or those experimenting with LLMs for domain-specific optimizations. A general audience in AI or algorithms would not find much transferable value. It does not seem to merit sending out for peer review because the evidence is too limited to support the conclusions drawn.

Referee Report

2 major / 0 minor

Summary. The paper presents a case study on LLM-based algorithm development using verifier-guided evolutionary coding agents (OpenEvolve) for contraction-order optimization in tensor networks. It examines the impact of LLM choice, evaluation metrics, and test instances, concluding that such agents show promise for algorithmic improvement while underscoring the essential role of human evaluation, validation, and interpretation.

Significance. If the case study demonstrates measurable improvements over baselines with the described setup, the work would illustrate a concrete application of LLMs to a combinatorial optimization task in tensor networks and reinforce the value of hybrid human-AI workflows. However, the single narrow domain limits broader significance for algorithmic development in general without evidence of transfer.

major comments (2)

[Abstract] The central claim that verifier-guided evolutionary coding agents show promise for algorithm development/improvement rests on results from one tensor-network case study using one LLM, one set of test instances, and one evaluation metric. No evidence is supplied that the observed improvements or necessity of human interpretation would hold for other problems.
[Abstract] No quantitative results, error bars, baseline comparisons, or description of how improvements were measured are supplied, so it is impossible to judge whether the central claim is supported by data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments. We address the major comments point by point below, agreeing where revisions to the abstract are warranted to better reflect the case-study nature of the work and to include quantitative details.

Referee: [Abstract] The central claim that verifier-guided evolutionary coding agents show promise for algorithm development/improvement rests on results from one tensor-network case study using one LLM, one set of test instances, and one evaluation metric. No evidence is supplied that the observed improvements or necessity of human interpretation would hold for other problems.

Authors: The manuscript is explicitly framed as a case study on contraction-order optimization in tensor networks, as stated in the abstract and introduction. The central claim concerns the observed promise and the essential role of human validation within this specific setting; we do not claim or provide evidence that the results transfer to other algorithmic problems. We will revise the abstract to more clearly emphasize the case-study scope and the absence of broader transfer evidence.

Revision: yes

Referee: [Abstract] No quantitative results, error bars, baseline comparisons, or description of how improvements were measured are supplied, so it is impossible to judge whether the central claim is supported by data.

Authors: The full manuscript contains quantitative results, baseline comparisons (including standard contraction-order heuristics), error bars from multiple runs, and explicit descriptions of the evaluation metric and test instances. However, the abstract does not summarize these elements. We will revise the abstract to incorporate key quantitative highlights and measurement details.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical case study with no derivation chain

The paper is a case study reporting experimental results from applying an LLM-based evolutionary coding agent (OpenEvolve) to one specific task: contraction-order optimization for tensor networks. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim that such agents 'show promise' rests on direct empirical observations rather than any self-referential reduction to inputs by construction. Generalization concerns are validity issues, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or postulated entities; ledger is therefore empty.

pith-pipeline@v0.9.1-grok · 5605 in / 937 out tokens · 22831 ms · 2026-06-28T14:33:15.377721+00:00 · methodology

Reference graph

Works this paper leans on

74 extracted references · 34 canonical work pages · 5 internal anchors

[1]

Agrawal, O

doi:10.18653/v1/2025.eval4nlp-1.12 [AAGa24] E. Agrawal, O. Alam, C. Goenka, et al. Code Compass: A Study on the Challenges of Navigating Unfamiliar Codebases.CoRRabs/2405.06271,

work page doi:10.18653/v1/2025.eval4nlp-1.12 2025
[2]

Assumpção, D

doi:10.48550/ARXIV.2405.06271 [AFCa25] H. Assumpção, D. Ferreira, L. Campos, et al. CodeEvolve: an open source evolutionary coding agent for algorithmic discovery and optimization

work page doi:10.48550/arxiv.2405.06271
[3]

CodeEvolve: an open source evolutionary coding agent for algorithmic discovery and optimization

doi:10.48550/arXiv.2510.14150 [AKSa24] V. Aglietti, I. Ktena, J. Schrouff, et al. FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.14150
[4]

https://arxiv.org/abs/2406.04824 [ATSa26] L. A. Agrawal, S. Tan, D. Soylu, et al. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

arXiv
[5]

https://arxiv.org/abs/2507.19457 [BB17] J. Biamonte, V. Bergholm. Tensor Networks in a Nutshell

Footnote 8: In fact, a tiny and hidden approach to this has already happened when we chose 2 f in the definition of combined score based on a few experiments as indicated at the beginning of Sect. 3

Pith/arXiv arXiv
[6]

Bäuerle, A

https://arxiv.org/abs/1708.00006 [BCNa26] A. Bäuerle, A. Connors, A. Novikov, et al. Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery

Pith/arXiv arXiv
[7]

Brown, J

https://arxiv.org/abs/2605.05921 [BHJa25] D. Brown, J. He, H. Jenne, et al. Even with AI, Bijection Discovery is Still Hard: The Opportunities and Challenges of OpenEvolve for Novel Bijection Construction

Pith/arXiv arXiv
[8]

Ballard, T

doi:10.48550/arXiv.2511.20987 [BK25] G. Ballard, T. G. Kolda.Tensor Decompositions for Data Science. Cambridge University Press, June

work page doi:10.48550/arxiv.2511.20987
[9]

Beel, M.-Y

doi:10.1017/9781009471664 [BKB25] J. Beel, M.-Y. Kan, M. Baumgart. Evaluating Sakana’s AI Scientist: Bold Claims, Mixed Results, and a Promising Future?SIGIR Forum59(1):1–20, Oct

work page doi:10.1017/9781009471664
[10]

doi:10.1145/3769733.3769747 [BNL26] J. Bhan, N. Nobili, P. Langer. New Bounds for Zarankiewicz Numbers via Reinforced LLM Evolutionary Search

work page doi:10.1145/3769733.3769747
[11]

Caravaca, Ángel Cuevas, R

https://arxiv.org/abs/2605.01120 [CCC25] F. Caravaca, Ángel Cuevas, R. Cuevas. From Prompts to Power: Measuring the Energy Footprint of LLM Inference

Pith/arXiv arXiv
[12]

Cheng, S

https://arxiv.org/abs/2511.05597 [CLPa25] A. Cheng, S. Liu, M. Pan, et al. Let the Barbarians In: How AI Can Accelerate Systems Performance Research

arXiv
[13]

Cheng, L

doi:10.48550/arXiv.2512.14806 [CZH26] A. Cheng, L. Zhang, G. He. Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision

work page doi:10.48550/arxiv.2512.14806
[14]

https://arxiv.org/abs/2508.20729 [Dec99] R. Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence 113(1–2):41–85,

arXiv
[15]

doi:10.1016/S0004-3702(99)00059-4 [DFGa18] E. F. Dumitrescu, A. L. Fisher, T. D. Goodrich, et al. Benchmarking treewidth as a practical component of tensor network simulations.PLOS ONE13(12),

work page doi:10.1016/s0004-3702(99)00059-4
[16]

Fernando, D

doi:10.1371/journal.pone.0207827 [FBMa24] C. Fernando, D. Banarse, H. Michalewski, et al. Promptbreeder: self-referential self-improvement via prompt evolution. InProceedings of the 41st International Conference on Machine Learning. ICML’24. JMLR.org,

work page doi:10.1371/journal.pone.0207827
[17]

https://dl.acm.org/doi/10.5555/3692070.3692611 [Fei22] D. G. Feitelson. Considerations and Pitfalls for Reducing Threats to the Validity of Controlled Experiments on Code Comprehension.Empirical Software Engineering27(6):123, Jun

work page doi:10.5555/3692070.3692611
[18]

Felderer, M

doi:10.1007/s10664-022-10160-3 [FGGa25] M. Felderer, M. Goedicke, L. Grunske, et al. Investigating Research Software Engineering: Toward RSE Research.Commun. ACM68(2):20–23, Jan

work page doi:10.1007/s10664-022-10160-3
[19]

Fisher, V

doi:10.1145/3685265 [FKSa26] D. Fisher, V. Khrulkov, M. Saygin, et al. LLM-Guided Evolutionary Search for Algebraic T-Count Optimization

work page doi:10.1145/3685265
[20]

Feldt, P

https://arxiv.org/abs/2603.29894 [FLFa26] R. Feldt, P. Lenberg, J. Frattini, et al. The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE

arXiv
[21]

https://arxiv.org/abs/2604.15468 [FLXa26] R. Fu, Y. Liu, Q. Xu, et al. MappingEvolve: LLM-Driven Code Evolution for Technology Mapping

Pith/arXiv arXiv
[22]

Goodfellow, Y

https://arxiv.org/abs/2604.26591 [GBC16] I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
[GD04] V. Gogate, R. Dechter. A complete anytime algorithm for treewidth. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. UAI ’04, p....

Pith/arXiv arXiv 2016
[23]

Georgiev, J

https://dl.acm.org/doi/10.5555/1036843.1036868 [GGTa25] B. Georgiev, J. Gómez-Serrano, T. Tao, et al. Mathematical exploration and discovery at scale

work page doi:10.5555/1036843.1036868
[24]

https://arxiv.org/abs/2511.02864 [GK21] J. Gray, S. Kourtis. Hyper-optimized tensor network contraction.Quantum5:410,

Pith/arXiv arXiv
[25]

doi:10.22331/q-2021-03-15-410 [GRSa25] P. W. Goncalves, P. Rani, M.-A. Storey, et al. Code Review Comprehension: Reviewing Strategies Seen Through Code Comprehension Theories . In2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC). Pp. 589–601. IEEE Computer Society, Los Alamitos, CA, USA, Apr

work page doi:10.22331/q-2021-03-15-410 2021
[26]

Gottweis, W.-H

doi:10.1109/ICPC66645.2025.00068 [GWDa25] J. Gottweis, W.-H. Weng, A. Daryin, et al. Towards an AI co-scientist

work page doi:10.1109/icpc66645.2025.00068 2025
[27]

Imajuku, K

https://arxiv.org/abs/2502.18864 [IHIa] Y. Imajuku, K. Horie, Y. Iwata, et al. ALE-Bench: A Benchmark for Long-Horizon Objective- Driven Algorithm Engineering. NeurIPS

Pith/arXiv arXiv
[28]

Ibrahim, D

https://arxiv.org/abs/2506.09050 [ILHa22] C. Ibrahim, D. Lykov, Z. He, et al. Constructing Optimal Contraction Trees for Tensor Network Quantum Circuit Simulation

arXiv
[29]

Iacovides, W

https://arxiv.org/abs/2209.02895 [IZLa25] G. Iacovides, W. Zhou, C. Li, et al. Domain-Aware Tensor Network Structure Search

arXiv
[30]

Jiang, F

https://arxiv.org/abs/2505.23537 [JWSa26] J. Jiang, F. Wang, J. Shen, et al. A Survey on Large Language Models for Code Generation. ACM Trans. Softw. Eng. Methodol.35(2), Jan

arXiv
[31]

ACM Trans

doi:10.1145/3747588 [KGBa25] V. Khrulkov, A. Galichin, D. Bashkirov, et al. GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

work page doi:10.1145/3747588
[32]

https://arxiv.org/abs/2511.17592 [Kjæ90] U. B. Kjærulff. Triangulation of Graphs – Algorithms Giving Small Total State Space. Technical report R 90-09, Aalborg University,

arXiv
[33]

Kumar, A

https://cse.unl.edu/~choueiry/Documents/Kjaerulff-TR-1990.pdf [KSNa26] U. Kumar, A. Saito, H. Niranjani, et al. Evolving Interpretable Constitutions for Multi-Agent Coordination

1990
[34]

Klowden, T

https://arxiv.org/abs/2602.00755 [KT26] T. Klowden, T. Tao. Mathematical methods and human thought in the age of AI

arXiv
[35]

https://arxiv.org/abs/2603.26524 [LGWa26] Z. Liu, X. Guo, X. Wei, et al. Escher-Loop: Mutual Evolution by Closed-Loop Self-Referential Optimization

arXiv
[36]

https://arxiv.org/abs/2604.23472 [LIC25] R. T. Lange, Y. Imajuku, E. Cetin. ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

Pith/arXiv arXiv
[37]

https://arxiv.org/abs/2509.19349 [LLLa26] C. Lu, C. Lu, R. T. Lange, et al. Towards end-to-end automation of AI research.Nature 651(8107):914–919, Mar

Pith/arXiv arXiv
[38]

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

See alsohttps://arxiv.org/abs/2408.06292. doi:10.1038/s41586-026-10265-5 [LMSa26] K.-A. Lie, O. Møyner, E. Svee, et al. Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1038/s41586-026-10265-5
[39]

https://arxiv.org/abs/2603.00214 [LTYa24] F. Liu, X. Tong, M. Yuan, et al. Evolution of heuristics: towards efficient automatic algorithm design using large language model. In Proceedings of the 41st International Conference on Machine Learning. ICML’24. JMLR.org,

arXiv
[40]

https://dl.acm.org/doi/abs/10.5555/3692070.3693374 [LYFa26] H. Lin, H. Ye, W. Feng, et al. Can Language Models Discover Scaling Laws?

work page doi:10.5555/3692070.3693374
[41]

https://arxiv.org/abs/2507.21184 [LZ] X.-Y. Liu, Z. Zhang. Classical Simulation of Quantum Circuits Using Reinforcement Learning: Parallel Environments and Benchmark. InNeurIPS

arXiv
[42]

https://proceedings.neurips.cc/paper_files/paper/2023/file/ d41b70011dd21ec3de5e019302279551-Paper-Datasets_and_Benchmarks.pdf [LZCa25] G. Liu, Y. Zhu, J. Chen, et al. Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research

2023
[43]

https://arxiv.org/abs/2510.06056 [LZX+24] F. Liu, R. Zhang, Z. Xie, R. Sun, K. Li, X. Lin, Z. Wang, Z. Lu, Q. Zhang. LLM4AD: A Platform for Algorithm Design with Large Language Model

arXiv
[44]

https://arxiv.org/abs/2412.17287 [MMMa] E. A. Meirom, H. Maron, S. Mannor, et al. Optimizing Tensor Network Contraction Using Reinforcement Learning. In Proceedings of the 39th International Conference on Machine Learning (ICML 2022). https://proceedings.mlr.press/v162/meirom22a.html
[MS08] I. L. Markov, Y. Shi. Simulating Quantum Computation by Contract...

arXiv 2022
[45]

Mitchener, A

doi:10.1137/050644756 [MYCa25] L. Mitchener, A. Yiu, B. Chang, et al. Kosmos: An AI Scientist for Autonomous Discovery

work page doi:10.1137/050644756
[46]

https://arxiv.org/abs/2511.02824 [MZKa26] V. A. Mazin, M. A. Zorin, D. S. Korzh, et al. LLM-Guided Prompt Evolution for Password Guessing

Pith/arXiv arXiv
[47]

Nagaitsev, L

https://arxiv.org/abs/2604.12601 [NGWI25] K. Nagaitsev, L. Grbcic, S. Williams, C. Iancu. Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems

Pith/arXiv arXiv
[48]

Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems

https://arxiv.org/abs/2511.16964 [NRT] D. Neumüller, A. Raschke, M. Tichy. Providing Information About Implemented Algorithms Improves Program Comprehension: A Controlled Experiment. In Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering. EASE ’25. doi:10.1145/3756681.3756968
[NVEa25] A. Novikov, N. V˜ ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3756681.3756968
[49]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

https://arxiv.org/abs/2506.13131 [O’G] B. O’Gorman. Parameterization of Tensor Network Contraction. In 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019). doi:10.4230/LIPIcs.TQC.2019.10
[Orú14] R. Orús. A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States. A...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.4230/lipics.tqc.2019.10 2019
[50]

A practical introduction to tensor networks: Matrix product states and projected entangled pair states,

doi:10.1016/j.aop.2014.06.013 [PA+25] O. Press, B. Amos et al. AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

work page doi:10.1016/j.aop.2014.06.013 2014
[51]

Romera-Paredes, M

https://arxiv.org/abs/2507.15887 [RBNa24] B. Romera-Paredes, M. Barekatain, A. Novikov, et al. Mathematical discoveries from program search with large language models.Nature625:468–475,

arXiv
[52]

Pawan Kumar, Emilien Dupont, Francisco J

doi:10.1038/s41586-023-06924-6 [RBSa25] P. Rajput, A. A. Bonkoungou, Y. Song, et al. Dynamic Stability of LLM-Generated Code

work page doi:10.1038/s41586-023-06924-6
[53]

https://arxiv.org/abs/2511.07463 [RTL76] D. J. Rose, R. E. Tarjan, G. S. Lueker. Algorithmic Aspects of Vertex Elimination on Graphs. SIAM Journal on Computing5(2):266–283,

arXiv
[54]

Staudt, M

doi:10.1137/0205021 [SBK+] C. Staudt, M. Blacher, J. Klaus, F. Lippmann, J. Giesen. Improved Cut Strategy for Tensor Network Contraction Orders. In 22nd International Symposium on Experimental Algorithms (SEA 2024). doi:10.4230/LIPIcs.SEA.2024.27
[SG18] D. G. A. Smith, J. Gray. opt_einsum - A Python package ...

work page doi:10.1137/0205021 2024
[55]

Schlag, T

doi:10.21105/joss.00753 [SHGa23] S. Schlag, T. Heuer, L. Gottesbüren, et al. High-Quality Hypergraph Partitioning. ACM J. Exp. Algorithmics 27, Feb

work page doi:10.21105/joss.00753
[56]

doi:10.1145/3529090 [SISa25] M. L. Siddiq, A. Islam-Gomes, N. Sekerak, et al. Large Language Models for Software Engineering: A Reproducibility Crisis

work page doi:10.1145/3529090
[57]

Schindler, A

https://arxiv.org/abs/2512.00651 [SJ20] F. Schindler, A. S. Jermyn. Algorithms for tensor network contraction ordering. Machine Learning: Science and Technology 1(3):035001,

arXiv
[58]

Stoian, R

doi:10.1088/2632-2153/ab94c5 [SMM24] M. Stoian, R. M. Milbradt, C. B. Mendl. On the Optimal Linear Contraction Order of Tree Tensor Networks, and Beyond.SIAM Journal on Scientific Computing46(5):B647–B668,

work page doi:10.1088/2632-2153/ab94c5
[59]

Šurina, A

doi:10.1137/23M161286X [ŠMQa25] A. Šurina, A. Mansouri, L. C. P. M. Quaedvlieg, et al. Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

work page doi:10.1137/23m161286x
[60]

Siegmund, J

https://arxiv.org/abs/2504.05108 [SS15] J. Siegmund, J. Schumann. Confounding parameters on program comprehension: a literature survey.Empirical Software Engineering20(4):1159–1192, Aug

Pith/arXiv arXiv
[61]

Strasser

doi:10.1007/s10664-014-9318-8 [Str17] B. Strasser. Computing Tree Decompositions with FlowCutter: PACE 2017 Submission

work page doi:10.1007/s10664-014-9318-8 2017
[62]

https://arxiv.org/abs/1709.08949 [Tam] H. Tamaki. Positive-instance driven dynamic programming for treewidth. In 25th Annual European Symposium on Algorithms (ESA 2017). doi:10.4230/LIPIcs.ESA.2017.68
[TDPa26] A. Torri, P. Dominikowski, B. Pointal, et al. Near-Optimal Contraction Strategies for the Scalar Product in the Tensor-Train Format. In Nagel et a...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.4230/lipics.esa.2017.68 2017
[63]

doi:10.1007/978-3-031-99872-0 5 [TGZa24] M. Tian, L. Gao, S. D. Zhang, et al. SciCode: A Research Coding Benchmark Curated by Scientists

work page doi:10.1007/978-3-031-99872-0
[64]

Thach, A

https://arxiv.org/abs/2407.13168 [TRHC25] N. Thach, A. Riahifar, N. Huynh, H. Chan. RedAHD: Reduction-Based End-to-End Automatic Heuristic Design with Large Language Models

arXiv
[65]

Verstraete, D

https://arxiv.org/abs/2505.20242 [VPC04] F. Verstraete, D. Porras, J. I. Cirac. Density Matrix Renormalization Group and Periodic Boundary Conditions: A Quantum Information Perspective. Phys. Rev. Lett. 93:227205, Nov

arXiv
[66]

doi:10.1103/PhysRevLett.93.227205 [WQBa26] J. Wen, L. Qiu, J. Benton, et al. Automated Weak-to-Strong Researcher

work page doi:10.1103/physrevlett.93.227205
[67]

https://alignment.anthropic.com/2026/automated-w2s-researcher/ [WSZa25] Y

Anthropic Alignment Science Blog. https://alignment.anthropic.com/2026/automated-w2s-researcher/ [WSZa25] Y. Wang, S.-R. Su, Z. Zeng, et al. ThetaEvolve: Test-time Learning on Open Problems

2026
[68]

https://arxiv.org/abs/2511.23473 [XZLa23] J. Xu, H. Zhang, L. Liang, et al. NP-Hardness of Tensor Network Contraction Ordering

Pith/arXiv arXiv
[69]

https://arxiv.org/abs/2310.06140 [YLDa25] J. Yuan, H. Li, X. Ding, et al. Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference. In The Thirty-ninth Annual Conference on Neural Information Processing Systems

arXiv
[70]

https://arxiv.org/abs/2506.09501 [YLLa25] Y. Yamada, R. T. Lange, C. Lu, et al. The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

arXiv
[71]

https://arxiv.org/abs/2504.08066 [YWCa24] H. Ye, J. Wang, Z. Cao, et al. ReEvo: large language models as hyper-heuristics with reflective evolution. InProceedings of the 38th International Conference on Neural Information Processing Systems. NIPS ’24. Curran Associates Inc., Red Hook, NY, USA,

Pith/arXiv arXiv
[72]

https://dl.acm.org/doi/10.5555/3737916.3739297 [YZLL24] J. Yang, K. Zhou, Y. Li, Z. Liu. Generalized Out-of-Distribution Detection: A Survey. International Journal of Computer Vision 132:5635–5662,

work page doi:10.5555/3737916.3739297
[73]

doi:10.1007/s11263-024-02117-4 [ZLSa24] J. Zeng, C. Li, Z. Sun, et al. tnGPS: Discovering Unknown Tensor Network Structure Search Algorithms via Large Language Models (LLMs). InProceedings of the 41st International Confer- ence on Machine Learning

work page doi:10.1007/s11263-024-02117-4
[74]

https://arxiv.org/abs/2602.08253

System Message for OpenEvolve

You are an expert programmer and expert in tensor networks. Your goal is to evolve and improve the code of the function `find_edge_path` in between the markers "EVOLVE-BLOCK-START" and "EVOLVE-BLOCK-END". CONTEXT: The function `find_edge_path` ge...
