Fine-Tuning Large Language Models for Quantum Reasoning

Casey R. Myers; James Quach; Katherine Ip; Peiyong Wang; Udaya Parampalli

arxiv: 2606.21974 · v1 · pith:T5QXJU47new · submitted 2026-06-20 · 🪐 quant-ph · cs.AI

Fine-Tuning Large Language Models for Quantum Reasoning

Katherine Ip , Casey R. Myers , Udaya Parampalli , James Quach , Peiyong Wang This is my paper

Pith reviewed 2026-06-26 11:59 UTC · model grok-4.3

classification 🪐 quant-ph cs.AI

keywords quantum circuit simulationlarge language modelssupervised fine-tuningquantum reasoningstate-vector simulationpolicy optimisationmeasurement probability

0 comments

The pith

Fine-tuning on explicit state-vector traces lets LLMs predict quantum circuit outcomes with near-perfect accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether supervised fine-tuning pipelines can move large language models from pattern matching to genuine quantum reasoning by training them to predict the full measurement probability distribution after sequences of quantum gates. The authors compare two approaches: one that supplies the model with complete gate-by-gate simulation traces and another that adds a second stage of policy optimisation with verifiable rewards. If the pipelines succeed, the models should extrapolate to unseen gate counts and, in the second case, to larger numbers of qubits that the base model cannot handle. The central finding is that the first pipeline reaches near-perfect accuracy on both in-distribution and extrapolated cases while the second trades some precision for better scaling to bigger systems.

Core claim

Training large language models on explicit gate-by-gate state-vector simulation traces produces accurate prediction of measurement probability distributions for quantum circuits. Supervised fine-tuning alone reaches near-perfect accuracy inside the training distribution and when extrapolating in gate count; adding a subsequent stage of group relative policy optimisation with verifiable rewards reduces in-distribution precision but improves performance on larger qubit systems that the supervised stage alone cannot solve. Both pipelines substantially exceed the performance of the untuned base model and an external large baseline.

What carries the argument

Two fine-tuning pipelines that supply the model with explicit step-by-step state-vector simulation traces: supervised fine-tuning on those traces, and the same supervised stage followed by group relative policy optimisation using verifiable rewards.

If this is right

LLMs can serve as accurate simulators for quantum circuits whose size exceeds what the base model can handle.
Explicit trace supervision enables extrapolation in the number of gates without retraining.
The two-stage pipeline extends capability to qubit counts unreachable by supervised fine-tuning alone.
Both methods outperform the base model and the external baseline on the quantum simulation task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same trace-based supervision could be applied to other domains that require step-by-step physical simulation.
If the model truly internalises the rules, it might be prompted to propose new circuit designs rather than only evaluate given ones.
A direct next measurement would be whether the fine-tuned models retain accuracy when the target distribution includes hardware noise models not seen in training.

Load-bearing premise

That success on simulation traces reflects genuine quantum reasoning rather than statistical matching of patterns present in the training distribution.

What would settle it

A test set of circuits whose gate sequences or qubit counts lie well outside the training distribution yet require only the same linear-algebra rules; if accuracy collapses to random guessing on those circuits, the claim that the model has learned quantum reasoning fails.

Figures

Figures reproduced from arXiv: 2606.21974 by Casey R. Myers, James Quach, Katherine Ip, Peiyong Wang, Udaya Parampalli.

**Figure 2.** Figure 2: Full prompt-completion example for SFT training on a 1-qubit Non-parameterised [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Train and validation loss of SFT training for the Non-parameterised (left) and Pa [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗

**Figure 4.** Figure 4: Step-by-step quantum state fidelity of the SFT model during inference, evaluated on [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: Training dynamics of the GRPO stage for the Non-parameterised and Parameterised [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗

**Figure 6.** Figure 6: Three-stage TVD progression and token-limit violation counts. (top) Mean TVD [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗

**Figure 7.** Figure 7: Mean TVD as a function of circuit complexity for the Non-Parameterised Set (left [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗

**Figure 8.** Figure 8: Example SFT+GRPO output illustrating the token-efficient shortcut behavior absent [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

**Figure 9.** Figure 9: System-size extrapolation on a 6-qubit circuit. Both models initialise and maintain [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

read the original abstract

Large language models (LLMs) exhibit abilities beyond natural language modelling and text generation. Recent advances in their reasoning capabilities have spurred interest in applying LLMs to complex scientific tasks requiring deep domain expertise and sophisticated reasoning. Quantum computing, as a highly specialised field with significant knowledge barriers and hardware constraints, could greatly benefit from such advancements. However, a key open question that first must be answered is: How can we develop fine-tuning pipelines that instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching? We study this question through quantum circuit simulation as a training objective, where the model must predict the measurement probability distribution resulting from a sequence of quantum gate operations. We propose and compare two fine-tuning pipelines: (1) Supervised Fine-Tuning (SFT) on explicit gate-by-gate state-vector simulation traces, and (2) a two-stage SFT+Group Relative Policy Optimisation (GRPO) approach that sequentially applies SFT followed by GRPO with verifiable rewards. Our findings show that SFT achieves near-perfect in-distribution and gate-count extrapolation accuracy, significantly outperforming both the base model and the GPT-OSS-120B baseline. SFT+GRPO trades some in-distribution precision for better generalisation to larger qubit systems that SFT alone cannot handle. Both pipelines significantly outperform the baselines, demonstrating that targeted fine-tuning on explicit reasoning traces is an effective strategy for advancing quantum reasoning in LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Fine-tuning on simulation traces improves LLM accuracy on quantum circuit tasks but does not test whether the model acquired genuine reasoning versus pattern matching.

read the letter

The paper takes existing supervised fine-tuning and GRPO methods and applies them to the concrete task of predicting measurement probabilities from gate sequences in quantum circuits. It reports that SFT reaches near-perfect accuracy both in-distribution and when extrapolating gate count, beating the base model and a 120B baseline, while the two-stage SFT+GRPO version gives up some in-distribution precision for better handling of larger qubit counts.

The comparison of the two pipelines and the explicit focus on qubit extrapolation are the clearest new pieces. The work is straightforward about using state-vector simulation traces as the training signal and shows measurable gains over the baselines it tests.

The main limitation is that every evaluation stays inside the same family of simulatable circuits. Nothing in the reported results checks whether the model has picked up principles that would apply outside explicit simulation traces, such as properties that cannot be classically simulated or circuits with qualitatively different structure. The abstract itself flags this as the open question, yet the experiments do not add tests that would separate reasoning from distribution matching. Without the actual numbers, dataset sizes, or controls it is also difficult to judge how stable the reported gains are.

The paper is aimed at researchers working at the AI-quantum interface who want to see whether standard fine-tuning recipes transfer to circuit-level tasks. It is worth sending to peer review so the empirical details and the scope of the generalization claims can be examined directly.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two fine-tuning pipelines for LLMs on quantum circuit simulation: (1) supervised fine-tuning (SFT) on explicit gate-by-gate state-vector traces to predict measurement probability distributions, and (2) a two-stage SFT followed by Group Relative Policy Optimisation (GRPO) with verifiable rewards. It claims that SFT achieves near-perfect in-distribution and gate-count extrapolation accuracy, significantly outperforming the base model and GPT-OSS-120B baseline, while SFT+GRPO trades some in-distribution precision for improved generalisation to larger qubit systems.

Significance. If the empirical results are robust, the work would demonstrate an effective strategy for adapting LLMs to quantum tasks via simulation traces, with the verifiable-reward component of GRPO providing a reproducible training signal. This could open pathways for LLM-assisted quantum algorithm design, though the paper's framing of 'genuine quantum reasoning' versus pattern matching on simulation data is central to its contribution.

major comments (2)

[Abstract] Abstract: the central claim that the pipelines 'instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching' is not supported by the described evaluations, which are confined to accuracy on measurement-probability prediction for circuits drawn from (or extrapolated within) the same state-vector simulation distribution; no experiments probe conceptual understanding, non-simulatable properties, or circuits with qualitatively different structure.
[Results] Results section: the abstract asserts 'near-perfect' accuracy and 'significantly outperforming' baselines without supplying concrete metrics, dataset sizes, error bars, or controls for post-hoc selection; these details are required to evaluate whether the reported performance reflects acquisition of quantum principles or memorised correlations in the training traces.

minor comments (2)

[Methods] Clarify the precise circuit-generation parameters, training-set sizes, and held-out test distributions in the methods to allow reproduction of the in-distribution versus extrapolation splits.
Ensure all result tables and figures report error bars or confidence intervals and explicitly define the GPT-OSS-120B baseline configuration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments on our manuscript. We address each major comment below and outline the revisions we will make.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the pipelines 'instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching' is not supported by the described evaluations, which are confined to accuracy on measurement-probability prediction for circuits drawn from (or extrapolated within) the same state-vector simulation distribution; no experiments probe conceptual understanding, non-simulatable properties, or circuits with qualitatively different structure.

Authors: We agree that the evaluations presented are limited to predicting measurement probabilities on circuits from the state-vector simulation distribution, including some extrapolation in gate count. The phrasing 'genuine quantum reasoning' in the abstract is an interpretive claim based on the model's success in composing quantum operations step-by-step, which goes beyond simple memorization due to the variable-length and compositional nature of the traces. However, we acknowledge that this does not constitute direct evidence of conceptual understanding or performance on non-simulatable tasks. We will revise the abstract to moderate this language, for example by stating that the pipelines enable LLMs to learn quantum circuit simulation effectively, and add a section discussing the distinction between simulation-based learning and broader quantum reasoning. revision: yes
Referee: [Results] Results section: the abstract asserts 'near-perfect' accuracy and 'significantly outperforming' baselines without supplying concrete metrics, dataset sizes, error bars, or controls for post-hoc selection; these details are required to evaluate whether the reported performance reflects acquisition of quantum principles or memorised correlations in the training traces.

Authors: The full results section provides the requested details, including specific accuracy metrics (near 100% on in-distribution tests), dataset sizes (e.g., training sets of 10,000+ traces), error bars from multiple random seeds, and baseline comparisons. The abstract, however, is high-level and does not include these numbers. We will revise the abstract to include key quantitative results, such as exact accuracy figures and performance deltas versus baselines, to make the claims more concrete and allow better assessment of whether the performance indicates learned principles. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results on held-out simulation data with no derivations reducing to inputs by construction.

full rationale

The paper reports empirical accuracy of SFT and SFT+GRPO pipelines on quantum circuit state-vector prediction tasks, using held-out test sets for in-distribution and extrapolation evaluation. No equations, uniqueness theorems, or first-principles derivations are presented that reduce reported performance metrics to fitted parameters or self-citations defined by the same training distribution. The abstract explicitly frames genuine reasoning vs. pattern matching as an open question rather than claiming resolution via any self-referential construction. All load-bearing claims rest on standard supervised learning and RL evaluation protocols applied to external simulation traces.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that simulation-trace prediction equates to reasoning.

pith-pipeline@v0.9.1-grok · 5795 in / 1002 out tokens · 20235 ms · 2026-06-26T11:59:02.937904+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

58 extracted references · 5 linked inside Pith

[1]

Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective STars

Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh, Nathan Lile, and Noah Goodman. Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective STars. In Second Conference on Language Modeling , 2025

2025
[2]

Qwen2.5 technical report, 2025

Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

2025
[3]

The llama 3 herd of models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv e-prints , pages arXiv–2407, 2024

2024
[4]

Thinking, fast and slow

Daniel Kahneman. Thinking, fast and slow. Farrar, Straus and Giroux , 2011

2011
[5]

Gold-medalist performance in solving olympiad geometry with alphageometry2

Yuri Chervonyi, Trieu H Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang H Nguyen, Marcelo Menegali, Junehyuk Jung, Junsu Kim, Vikas Verma, Quoc V Le, et al. Gold-medalist performance in solving olympiad geometry with alphageometry2. Journal of Machine Learning Research, 26(241):1–39, 2025

2025
[6]

Cwm: An open-weights llm for research on code generation with world models

Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, Jannik Kossen, Felix Kreuk, Emily McMilin, Michel Meyer, Yuxiang Wei, David Zhang, et al. Cwm: An open-weights llm for research on code generation with world models. arXiv preprint arXiv:2510.02387, 2025

arXiv 2025
[7]

A systematic survey on large language models for algorithm design

Fei Liu, Yiming Yao, Ping Guo, Zhiyuan Yang, Xi Lin, Zhe Zhao, Xialiang Tong, Kun Mao, Zhichao Lu, Zhenkun Wang, et al. A systematic survey on large language models for algorithm design. ACM Computing Surveys , 58(8):1–32, 2026

2026
[8]

Sparks of artificial general intelligence: Early experiments with gpt-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 , 2023

Pith/arXiv arXiv 2023
[9]

Benchmarking large language models for molecule prediction tasks

Zhiqiang Zhong, Kuangyu Zhou, and Davide Mottin. Benchmarking large language models for molecule prediction tasks. arXiv preprint arXiv:2403.05075 , 2024

arXiv 2024
[10]

Bayesian optimization of catalysis with in-context learning

Mayk Caldas Ramos, Shane S Michtavy, Andrew D White, and Marc D Porosoff. Bayesian optimization of catalysis with in-context learning. ACS Central Science , 12(5):599, 2026

2026
[11]

Towards end-to-end automation of ai research

Chris Lu, Cong Lu, Robert Tjarko Lange, Yutaro Yamada, Shengran Hu, Jakob Foer- ster, David Ha, and Jeff Clune. Towards end-to-end automation of ai research. Nature, 651(8107):914–919, 2026

2026
[12]

A survey on large language models for code generation

Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026. 24

2026
[13]

On repairing quantum programs us- ing chatgpt

Xiaoyu Guo, Jianjun Zhao, and Pengzhan Zhao. On repairing quantum programs us- ing chatgpt. In Proceedings of the 5th ACM/IEEE International Workshop on Quantum Software Engineering, pages 9–16, 2024

2024
[14]

Q-bridge: Code translation for quantum machine learning via llms

Runjia Zeng, Priyabrata Senapati, Ruixiang Tang, Dongfang Liu, and Qiang Guan. Q-bridge: Code translation for quantum machine learning via llms. arXiv preprint arXiv:2603.27836, 2026

arXiv 2026
[15]

Qiskit code assistant: Training llms for generating quantum computing code

Nicolas Dupuis, Luca Buratti, Sanjay Vishwakarma, Aitana Viudes Forrat, David Kremer, Ismael Faro, Ruchir Puri, and Juan Cruz-Benito. Qiskit code assistant: Training llms for generating quantum computing code. In 2024 IEEE LLM Aided Design Workshop (LAD) , pages 1–4. IEEE, 2024

2024
[16]

Quantum verifiable rewards for post-training qiskit code assistant

Nicolas Dupuis, Adarsh Tiwari, Youssef Mroueh, David Kremer, Ismael Faro, and Juan Cruz-Benito. Quantum verifiable rewards for post-training qiskit code assistant. arXiv preprint arXiv:2508.20907 , 2025

arXiv 2025
[17]

Enhancing llm-based quantum code generation with multi-agent optimization and quantum error correction

Charlie Campbell, Hao Mark Chen, Wayne Luk, and Hongxiang Fan. Enhancing llm-based quantum code generation with multi-agent optimization and quantum error correction. In 2025 62nd ACM/IEEE Design Automation Conference (DAC) , pages 1–7. IEEE, 2025

2025
[18]

Qcoder benchmark: Bridging language generation and quantum hardware through simulator-based feedback

Taku Mikuriya, Tatsuya Ishigaki, Masayuki Kawarada, Shunya Minami, Tadashi Kad- owaki, Yohichi Suzuki, Soshun Naito, Shunya Takada, Takumi Kato, Tamotsu Basseda, et al. Qcoder benchmark: Bridging language generation and quantum hardware through simulator-based feedback. In Proceedings of the 18th International Natural Language Gen- eration Conference, pag...

2025
[19]

Pennylang: Pioneering llm-based quantum code generation with a novel pennylane-centric dataset

Abdul Basit, Nouhaila Innan, Muhammad Haider Asif, Minghao Shao, Muhammad Kashif, Alberto Marchisio, and Muhammad Shafique. Pennylang: Pioneering llm-based quantum code generation with a novel pennylane-centric dataset. arXiv preprint arXiv:2503.02497 , 2025

Pith/arXiv arXiv 2025
[20]

Pennycoder: Eﬀicient domain-specific llms for pennylane-based quantum code generation

Abdul Basit, Minghao Shao, Muhammad Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, and Muhammad Shafique. Pennycoder: Eﬀicient domain-specific llms for pennylane-based quantum code generation. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE) , volume 2, pages 229–234. IEEE, 2025

2025
[21]

Qagent: An llm-based multi-agent system for autonomous openqasm programming

Zhenxiao Fu, Fan Chen, and Lei Jiang. Qagent: An llm-based multi-agent system for autonomous openqasm programming. arXiv preprint arXiv:2508.20134 , 2025

arXiv 2025
[22]

Unleashing the potential of llms for quantum computing: A study in quantum architecture design

Zhiding Liang, Jinglei Cheng, Rui Yang, Hang Ren, Zhixin Song, Di Wu, Xuehai Qian, Tongyang Li, and Yiyu Shi. Unleashing the potential of llms for quantum computing: A study in quantum architecture design. arXiv preprint arXiv:2307.08191 , 2023

arXiv 2023
[23]

Agent-q: fine-tuning large language models for quantum circuit generation and optimization

Linus Jern, Valter Uotila, Cong Yu, and Bo Zhao. Agent-q: fine-tuning large language models for quantum circuit generation and optimization. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE) , volume 1, pages 1621–1632. IEEE, 2025

2025
[24]

Quasar: Quantum assembly code generation using tool-augmented llms via agentic rl

Cong Yu, Valter Uotila, Shilong Deng, Qingyuan Wu, Tuo Shi, Songlin Jiang, Lei You, and Bo Zhao. Quasar: Quantum assembly code generation using tool-augmented llms via agentic rl. arXiv preprint arXiv:2510.00967 , 2025

arXiv 2025
[25]

Automated near-term quantum algorithm discovery for molecular ground states

Fabian Finger, Frederic Rapp, Pranav Kalidindi, Kerry He, Kante Yin, Alexander Koziell- Pipe, David Zsolt Manrique, Gabriel Greene-Diniz, Stephen Clark, Hamza Fawzi, et al. Automated near-term quantum algorithm discovery for molecular ground states. arXiv preprint arXiv:2603.26359 , 2026. 25

arXiv 2026
[26]

Scalable quantum state preparation via large- language-model-driven discovery

Qing-Hong Cao, Zong-Yue Hou, Ying-Ying Li, Xiaohui Liu, Zhuo-Yang Song, Liang-Qi Zhang, Shutao Zhang, and Ke Zhao. Scalable quantum state preparation via large- language-model-driven discovery. arXiv preprint arXiv:2505.06347 , 2025

arXiv 2025
[27]

Optimizing ansatz design in quantum generative adver- sarial networks using large language models

Kento Ueda and Atsushi Matsuo. Optimizing ansatz design in quantum generative adver- sarial networks using large language models. arXiv preprint arXiv:2503.12884 , 2025

arXiv 2025
[28]

Automating quantum feature map design via large language models

Kenya Sakka, Kosuke Mitarai, and Keisuke Fujii. Automating quantum feature map design via large language models. arXiv preprint arXiv:2504.07396 , 2025

arXiv 2025
[29]

Grovergpt: A large language model with 8 billion parameters for quantum searching

Haoran Wang, Pingzhi Li, Min Chen, Jinglei Cheng, Junyu Liu, and Tianlong Chen. Grovergpt: A large language model with 8 billion parameters for quantum searching. arXiv preprint arXiv:2501.00135 , 2024

arXiv 2024
[30]

Symbolic analysis of grover search algorithm via chain-of-thought reasoning and quantum- native tokenization

Min Chen, Jinglei Cheng, Pingzhi Li, Haoran Wang, Tianlong Chen, and Junyu Liu. Symbolic analysis of grover search algorithm via chain-of-thought reasoning and quantum- native tokenization. npj Quantum Information , 2026

2026
[31]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems , 30, 2017

2017
[32]

Improving lan- guage understanding by generative pre-training

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving lan- guage understanding by generative pre-training. 2018

2018
[33]

AlphaZero-like tree-search can guide large language model decoding and training

Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus Mcaleer, Ying Wen, Weinan Zhang, and Jun Wang. AlphaZero-like tree-search can guide large language model decoding and training. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st Interna- tional Conf...

2024
[34]

Deepseek-r1 incentivizes reasoning in llms through reinforcement learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning. Nature, 645(8081):633–638, 2025

2025
[35]

Deepseekmath: Pushing the limits of mathematical reasoning in open language models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, YK Li, Y Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv e-prints , pages arXiv–2402, 2024

2024
[36]

Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James Validad Miranda, Alisa Liu, Nouha Dziri, Xinxi Lyu, Yul- ing Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, and Hannan...

2025
[37]

Lora: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

2022
[38]

How accurately do large language models understand code? arXiv preprint arXiv:2504.04372 , 2025

Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R Butt, Mohammad Taha Khan, and Muhammad Ali Gulzar. How accurately do large language models understand code? arXiv preprint arXiv:2504.04372 , 2025

arXiv 2025
[39]

A practical introduction to tensor networks: Matrix product states and projected entangled pair states

Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of physics , 349:117–158, 2014. 26

2014
[40]

Solving the quantum many-body problem with artificial neural networks

Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017

2017
[41]

Application of large language models to quantum state simulation

Shuangxiang Zhou, Ronghang Chen, Zheng An, Chao Zhang, and Shi-Yao Hou. Application of large language models to quantum state simulation. Science China Physics, Mechanics & Astronomy, 68(4):240313, 2025

2025
[42]

A fast quantum mechanical algorithm for database search

Lov K Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing , pages 212–219, 1996

1996
[43]

Overcoming catastrophic forgetting in neural networks

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences , 114(13):3521–3526, 2017

2017
[44]

Stabilizing transformer training by preventing attention entropy collapse

Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, and Joshua M Susskind. Stabilizing transformer training by preventing attention entropy collapse. In International Conference on Machine Learning , pages 40770–40803. PMLR, 2023

2023
[45]

Aimo-2 winning solution: Building state- of-the-art mathematical reasoning models with openmathreasoning dataset

Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, and Igor Gitman. Aimo-2 winning solution: Building state- of-the-art mathematical reasoning models with openmathreasoning dataset. arXiv preprint arXiv:2504.16891, 2025

arXiv 2025
[46]

Deepseek-coder: When the large language model meets programming–the rise of code intelligence

Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, YK Li, et al. Deepseek-coder: When the large language model meets programming–the rise of code intelligence. arXiv preprint arXiv:2401.14196 , 2024

Pith/arXiv arXiv 2024
[47]

Opencodereasoning: Advancing data distillation for competitive coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar, Aleksander Ficek, Sid- dhartha Jain, Jocelyn Huang, Vahid Noroozi, and Boris Ginsburg. Opencodereasoning: Advancing data distillation for competitive coding. In Second Conference on Language Modeling, 2025

2025
[48]

Hybridflow: A flexible and eﬀicient rlhf framework

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and eﬀicient rlhf framework. In Proceedings of the Twentieth European Conference on Computer Systems, pages 1279–1297, 2025

2025
[49]

Qwen3 technical report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025

Pith/arXiv arXiv 2025
[50]

gpt-oss-120b & gpt-oss-20b model card

Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925 , 2025

Pith/arXiv arXiv 2025
[51]

Exploring length generalization in large language models

Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ra- masesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, and Behnam Neyshabur. Exploring length generalization in large language models. Advances in Neural Information Processing Systems, 35:38546–38556, 2022

2022
[52]

Length generalization in arithmetic transformers

Samy Jelassi, Stéphane d’Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, and François Charton. Length generalization in arithmetic transformers. arXiv preprint arXiv:2306.15400, 2023. 27

arXiv 2023
[53]

How are you?

Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua Susskind, Samy Bengio, and Preetum Nakkiran. What algorithms can transformers learn? a study in length generalization. In International Conference on Learning Representations , volume 2024, pages 15898–15926, 2024. 28 A SFT Training Configuration Configuration V alue Base model Qwen...

2024
[54]

RZ(−3π/4) gate on qubit 2

circuit.rz(-3*pi/4, 2) . RZ(−3π/4) gate on qubit 2. <quantum_state> [0.38 + 0.92i, 0, 0, 0, 0, 0, 0, 0] </quantum_state>
[55]

RY(−π/4) gate on qubit 1

circuit.ry(-pi/4, 1) . RY(−π/4) gate on qubit 1. <quantum_state> [0.35 + 0.85i, 0, −0.15 − 0.35i, 0, 0, 0, 0, 0] </quantum_state>
[56]

RY(3π/4) gate on qubit 0

circuit.ry(3*pi/4, 0) . RY(3π/4) gate on qubit 0. <quantum_state> [0.14 + 0 .33i, 0, −0.06 − 0.14i, 0, 0.33 + 0 .79i, 0, −0.14 − 0.33i, 0] </quantum_state> C GRPO Training Configuration Configuration V alue LoRA rank 16 LoRA alpha 32 LoRA Target modules All linear layers Training batch size 64 Rollouts per sample 5 Rollouts Temperature 1.0 Rollouts Top-p ...
[57]

State becomes: 1/√2 (|000 + |010)

h(1) → apply H to qubit 1. State becomes: 1/√2 (|000 + |010)
[58]

Now the state is 1/2 (|000 + |010 + |100 + |110) [...] 30

h(0) → apply H to qubit 0. Now the state is 1/2 (|000 + |010 + |100 + |110) [...] 30

[1] [1]

Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective STars

Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh, Nathan Lile, and Noah Goodman. Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective STars. In Second Conference on Language Modeling , 2025

2025

[2] [2]

Qwen2.5 technical report, 2025

Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

2025

[3] [3]

The llama 3 herd of models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv e-prints , pages arXiv–2407, 2024

2024

[4] [4]

Thinking, fast and slow

Daniel Kahneman. Thinking, fast and slow. Farrar, Straus and Giroux , 2011

2011

[5] [5]

Gold-medalist performance in solving olympiad geometry with alphageometry2

Yuri Chervonyi, Trieu H Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang H Nguyen, Marcelo Menegali, Junehyuk Jung, Junsu Kim, Vikas Verma, Quoc V Le, et al. Gold-medalist performance in solving olympiad geometry with alphageometry2. Journal of Machine Learning Research, 26(241):1–39, 2025

2025

[6] [6]

Cwm: An open-weights llm for research on code generation with world models

Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, Jannik Kossen, Felix Kreuk, Emily McMilin, Michel Meyer, Yuxiang Wei, David Zhang, et al. Cwm: An open-weights llm for research on code generation with world models. arXiv preprint arXiv:2510.02387, 2025

arXiv 2025

[7] [7]

A systematic survey on large language models for algorithm design

Fei Liu, Yiming Yao, Ping Guo, Zhiyuan Yang, Xi Lin, Zhe Zhao, Xialiang Tong, Kun Mao, Zhichao Lu, Zhenkun Wang, et al. A systematic survey on large language models for algorithm design. ACM Computing Surveys , 58(8):1–32, 2026

2026

[8] [8]

Sparks of artificial general intelligence: Early experiments with gpt-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 , 2023

Pith/arXiv arXiv 2023

[9] [9]

Benchmarking large language models for molecule prediction tasks

Zhiqiang Zhong, Kuangyu Zhou, and Davide Mottin. Benchmarking large language models for molecule prediction tasks. arXiv preprint arXiv:2403.05075 , 2024

arXiv 2024

[10] [10]

Bayesian optimization of catalysis with in-context learning

Mayk Caldas Ramos, Shane S Michtavy, Andrew D White, and Marc D Porosoff. Bayesian optimization of catalysis with in-context learning. ACS Central Science , 12(5):599, 2026

2026

[11] [11]

Towards end-to-end automation of ai research

Chris Lu, Cong Lu, Robert Tjarko Lange, Yutaro Yamada, Shengran Hu, Jakob Foer- ster, David Ha, and Jeff Clune. Towards end-to-end automation of ai research. Nature, 651(8107):914–919, 2026

2026

[12] [12]

A survey on large language models for code generation

Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026. 24

2026

[13] [13]

On repairing quantum programs us- ing chatgpt

Xiaoyu Guo, Jianjun Zhao, and Pengzhan Zhao. On repairing quantum programs us- ing chatgpt. In Proceedings of the 5th ACM/IEEE International Workshop on Quantum Software Engineering, pages 9–16, 2024

2024

[14] [14]

Q-bridge: Code translation for quantum machine learning via llms

Runjia Zeng, Priyabrata Senapati, Ruixiang Tang, Dongfang Liu, and Qiang Guan. Q-bridge: Code translation for quantum machine learning via llms. arXiv preprint arXiv:2603.27836, 2026

arXiv 2026

[15] [15]

Qiskit code assistant: Training llms for generating quantum computing code

Nicolas Dupuis, Luca Buratti, Sanjay Vishwakarma, Aitana Viudes Forrat, David Kremer, Ismael Faro, Ruchir Puri, and Juan Cruz-Benito. Qiskit code assistant: Training llms for generating quantum computing code. In 2024 IEEE LLM Aided Design Workshop (LAD) , pages 1–4. IEEE, 2024

2024

[16] [16]

Quantum verifiable rewards for post-training qiskit code assistant

Nicolas Dupuis, Adarsh Tiwari, Youssef Mroueh, David Kremer, Ismael Faro, and Juan Cruz-Benito. Quantum verifiable rewards for post-training qiskit code assistant. arXiv preprint arXiv:2508.20907 , 2025

arXiv 2025

[17] [17]

Enhancing llm-based quantum code generation with multi-agent optimization and quantum error correction

Charlie Campbell, Hao Mark Chen, Wayne Luk, and Hongxiang Fan. Enhancing llm-based quantum code generation with multi-agent optimization and quantum error correction. In 2025 62nd ACM/IEEE Design Automation Conference (DAC) , pages 1–7. IEEE, 2025

2025

[18] [18]

Qcoder benchmark: Bridging language generation and quantum hardware through simulator-based feedback

Taku Mikuriya, Tatsuya Ishigaki, Masayuki Kawarada, Shunya Minami, Tadashi Kad- owaki, Yohichi Suzuki, Soshun Naito, Shunya Takada, Takumi Kato, Tamotsu Basseda, et al. Qcoder benchmark: Bridging language generation and quantum hardware through simulator-based feedback. In Proceedings of the 18th International Natural Language Gen- eration Conference, pag...

2025

[19] [19]

Pennylang: Pioneering llm-based quantum code generation with a novel pennylane-centric dataset

Abdul Basit, Nouhaila Innan, Muhammad Haider Asif, Minghao Shao, Muhammad Kashif, Alberto Marchisio, and Muhammad Shafique. Pennylang: Pioneering llm-based quantum code generation with a novel pennylane-centric dataset. arXiv preprint arXiv:2503.02497 , 2025

Pith/arXiv arXiv 2025

[20] [20]

Pennycoder: Eﬀicient domain-specific llms for pennylane-based quantum code generation

Abdul Basit, Minghao Shao, Muhammad Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, and Muhammad Shafique. Pennycoder: Eﬀicient domain-specific llms for pennylane-based quantum code generation. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE) , volume 2, pages 229–234. IEEE, 2025

2025

[21] [21]

Qagent: An llm-based multi-agent system for autonomous openqasm programming

Zhenxiao Fu, Fan Chen, and Lei Jiang. Qagent: An llm-based multi-agent system for autonomous openqasm programming. arXiv preprint arXiv:2508.20134 , 2025

arXiv 2025

[22] [22]

Unleashing the potential of llms for quantum computing: A study in quantum architecture design

Zhiding Liang, Jinglei Cheng, Rui Yang, Hang Ren, Zhixin Song, Di Wu, Xuehai Qian, Tongyang Li, and Yiyu Shi. Unleashing the potential of llms for quantum computing: A study in quantum architecture design. arXiv preprint arXiv:2307.08191 , 2023

arXiv 2023

[23] [23]

Agent-q: fine-tuning large language models for quantum circuit generation and optimization

Linus Jern, Valter Uotila, Cong Yu, and Bo Zhao. Agent-q: fine-tuning large language models for quantum circuit generation and optimization. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE) , volume 1, pages 1621–1632. IEEE, 2025

2025

[24] [24]

Quasar: Quantum assembly code generation using tool-augmented llms via agentic rl

Cong Yu, Valter Uotila, Shilong Deng, Qingyuan Wu, Tuo Shi, Songlin Jiang, Lei You, and Bo Zhao. Quasar: Quantum assembly code generation using tool-augmented llms via agentic rl. arXiv preprint arXiv:2510.00967 , 2025

arXiv 2025

[25] [25]

Automated near-term quantum algorithm discovery for molecular ground states

Fabian Finger, Frederic Rapp, Pranav Kalidindi, Kerry He, Kante Yin, Alexander Koziell- Pipe, David Zsolt Manrique, Gabriel Greene-Diniz, Stephen Clark, Hamza Fawzi, et al. Automated near-term quantum algorithm discovery for molecular ground states. arXiv preprint arXiv:2603.26359 , 2026. 25

arXiv 2026

[26] [26]

Scalable quantum state preparation via large- language-model-driven discovery

Qing-Hong Cao, Zong-Yue Hou, Ying-Ying Li, Xiaohui Liu, Zhuo-Yang Song, Liang-Qi Zhang, Shutao Zhang, and Ke Zhao. Scalable quantum state preparation via large- language-model-driven discovery. arXiv preprint arXiv:2505.06347 , 2025

arXiv 2025

[27] [27]

Optimizing ansatz design in quantum generative adver- sarial networks using large language models

Kento Ueda and Atsushi Matsuo. Optimizing ansatz design in quantum generative adver- sarial networks using large language models. arXiv preprint arXiv:2503.12884 , 2025

arXiv 2025

[28] [28]

Automating quantum feature map design via large language models

Kenya Sakka, Kosuke Mitarai, and Keisuke Fujii. Automating quantum feature map design via large language models. arXiv preprint arXiv:2504.07396 , 2025

arXiv 2025

[29] [29]

Grovergpt: A large language model with 8 billion parameters for quantum searching

Haoran Wang, Pingzhi Li, Min Chen, Jinglei Cheng, Junyu Liu, and Tianlong Chen. Grovergpt: A large language model with 8 billion parameters for quantum searching. arXiv preprint arXiv:2501.00135 , 2024

arXiv 2024

[30] [30]

Symbolic analysis of grover search algorithm via chain-of-thought reasoning and quantum- native tokenization

Min Chen, Jinglei Cheng, Pingzhi Li, Haoran Wang, Tianlong Chen, and Junyu Liu. Symbolic analysis of grover search algorithm via chain-of-thought reasoning and quantum- native tokenization. npj Quantum Information , 2026

2026

[31] [31]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems , 30, 2017

2017

[32] [32]

Improving lan- guage understanding by generative pre-training

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving lan- guage understanding by generative pre-training. 2018

2018

[33] [33]

AlphaZero-like tree-search can guide large language model decoding and training

Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus Mcaleer, Ying Wen, Weinan Zhang, and Jun Wang. AlphaZero-like tree-search can guide large language model decoding and training. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st Interna- tional Conf...

2024

[34] [34]

Deepseek-r1 incentivizes reasoning in llms through reinforcement learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning. Nature, 645(8081):633–638, 2025

2025

[35] [35]

Deepseekmath: Pushing the limits of mathematical reasoning in open language models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, YK Li, Y Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv e-prints , pages arXiv–2402, 2024

2024

[36] [36]

Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James Validad Miranda, Alisa Liu, Nouha Dziri, Xinxi Lyu, Yul- ing Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, and Hannan...

2025

[37] [37]

Lora: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

2022

[38] [38]

How accurately do large language models understand code? arXiv preprint arXiv:2504.04372 , 2025

Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R Butt, Mohammad Taha Khan, and Muhammad Ali Gulzar. How accurately do large language models understand code? arXiv preprint arXiv:2504.04372 , 2025

arXiv 2025

[39] [39]

A practical introduction to tensor networks: Matrix product states and projected entangled pair states

Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of physics , 349:117–158, 2014. 26

2014

[40] [40]

Solving the quantum many-body problem with artificial neural networks

Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017

2017

[41] [41]

Application of large language models to quantum state simulation

Shuangxiang Zhou, Ronghang Chen, Zheng An, Chao Zhang, and Shi-Yao Hou. Application of large language models to quantum state simulation. Science China Physics, Mechanics & Astronomy, 68(4):240313, 2025

2025

[42] [42]

A fast quantum mechanical algorithm for database search

Lov K Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing , pages 212–219, 1996

1996

[43] [43]

Overcoming catastrophic forgetting in neural networks

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences , 114(13):3521–3526, 2017

2017

[44] [44]

Stabilizing transformer training by preventing attention entropy collapse

Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, and Joshua M Susskind. Stabilizing transformer training by preventing attention entropy collapse. In International Conference on Machine Learning , pages 40770–40803. PMLR, 2023

2023

[45] [45]

Aimo-2 winning solution: Building state- of-the-art mathematical reasoning models with openmathreasoning dataset

Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, and Igor Gitman. Aimo-2 winning solution: Building state- of-the-art mathematical reasoning models with openmathreasoning dataset. arXiv preprint arXiv:2504.16891, 2025

arXiv 2025

[46] [46]

Deepseek-coder: When the large language model meets programming–the rise of code intelligence

Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, YK Li, et al. Deepseek-coder: When the large language model meets programming–the rise of code intelligence. arXiv preprint arXiv:2401.14196 , 2024

Pith/arXiv arXiv 2024

[47] [47]

Opencodereasoning: Advancing data distillation for competitive coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar, Aleksander Ficek, Sid- dhartha Jain, Jocelyn Huang, Vahid Noroozi, and Boris Ginsburg. Opencodereasoning: Advancing data distillation for competitive coding. In Second Conference on Language Modeling, 2025

2025

[48] [48]

Hybridflow: A flexible and eﬀicient rlhf framework

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and eﬀicient rlhf framework. In Proceedings of the Twentieth European Conference on Computer Systems, pages 1279–1297, 2025

2025

[49] [49]

Qwen3 technical report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025

Pith/arXiv arXiv 2025

[50] [50]

gpt-oss-120b & gpt-oss-20b model card

Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925 , 2025

Pith/arXiv arXiv 2025

[51] [51]

Exploring length generalization in large language models

Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ra- masesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, and Behnam Neyshabur. Exploring length generalization in large language models. Advances in Neural Information Processing Systems, 35:38546–38556, 2022

2022

[52] [52]

Length generalization in arithmetic transformers

Samy Jelassi, Stéphane d’Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, and François Charton. Length generalization in arithmetic transformers. arXiv preprint arXiv:2306.15400, 2023. 27

arXiv 2023

[53] [53]

How are you?

Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua Susskind, Samy Bengio, and Preetum Nakkiran. What algorithms can transformers learn? a study in length generalization. In International Conference on Learning Representations , volume 2024, pages 15898–15926, 2024. 28 A SFT Training Configuration Configuration V alue Base model Qwen...

2024

[54] [54]

RZ(−3π/4) gate on qubit 2

circuit.rz(-3*pi/4, 2) . RZ(−3π/4) gate on qubit 2. <quantum_state> [0.38 + 0.92i, 0, 0, 0, 0, 0, 0, 0] </quantum_state>

[55] [55]

RY(−π/4) gate on qubit 1

circuit.ry(-pi/4, 1) . RY(−π/4) gate on qubit 1. <quantum_state> [0.35 + 0.85i, 0, −0.15 − 0.35i, 0, 0, 0, 0, 0] </quantum_state>

[56] [56]

RY(3π/4) gate on qubit 0

circuit.ry(3*pi/4, 0) . RY(3π/4) gate on qubit 0. <quantum_state> [0.14 + 0 .33i, 0, −0.06 − 0.14i, 0, 0.33 + 0 .79i, 0, −0.14 − 0.33i, 0] </quantum_state> C GRPO Training Configuration Configuration V alue LoRA rank 16 LoRA alpha 32 LoRA Target modules All linear layers Training batch size 64 Rollouts per sample 5 Rollouts Temperature 1.0 Rollouts Top-p ...

[57] [57]

State becomes: 1/√2 (|000 + |010)

h(1) → apply H to qubit 1. State becomes: 1/√2 (|000 + |010)

[58] [58]

Now the state is 1/2 (|000 + |010 + |100 + |110) [...] 30

h(0) → apply H to qubit 0. Now the state is 1/2 (|000 + |010 + |100 + |110) [...] 30