arxiv: 2606.21974 · v1 · pith:T5QXJU47new · submitted 2026-06-20 · 🪐 quant-ph · cs.AI

Fine-Tuning Large Language Models for Quantum Reasoning

Pith reviewed 2026-06-26 11:59 UTC · model grok-4.3

classification 🪐 quant-ph cs.AI
keywords quantum circuit simulation · large language models · supervised fine-tuning · quantum reasoning · state-vector simulation · policy optimisation · measurement probability

The pith

Fine-tuning on explicit state-vector traces lets LLMs predict quantum circuit outcomes with near-perfect accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether supervised fine-tuning pipelines can move large language models from pattern matching to genuine quantum reasoning by training them to predict the full measurement probability distribution after sequences of quantum gates. The authors compare two approaches: one that supplies the model with complete gate-by-gate simulation traces and another that adds a second stage of policy optimisation with verifiable rewards. If the pipelines succeed, the models should extrapolate to unseen gate counts and, in the second case, to larger numbers of qubits that the base model cannot handle. The central finding is that the first pipeline reaches near-perfect accuracy on both in-distribution and extrapolated cases while the second trades some precision for better scaling to bigger systems.

Core claim

Training large language models on explicit gate-by-gate state-vector simulation traces produces accurate prediction of measurement probability distributions for quantum circuits. Supervised fine-tuning alone reaches near-perfect accuracy inside the training distribution and when extrapolating in gate count; adding a subsequent stage of group relative policy optimisation with verifiable rewards reduces in-distribution precision but improves performance on larger qubit systems that the supervised stage alone cannot solve. Both pipelines substantially exceed the performance of the untuned base model and an external large baseline.
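
The gate-by-gate trace described above can be illustrated with a small state-vector simulator. This is a minimal sketch, not the authors' code: it assumes qubit 0 is the most significant bit of the basis-state index, and the helper names (`apply_single_qubit_gate`, `simulate_with_trace`, `measurement_probabilities`) are hypothetical. It records the state after every gate, then squares the amplitude magnitudes to obtain the measurement probability distribution.

```python
import math

# Hadamard gate as a 2x2 matrix (list of rows).
INV_SQRT2 = 1.0 / math.sqrt(2.0)
H = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, -INV_SQRT2]]


def apply_single_qubit_gate(state, gate, qubit, n_qubits):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    # Assumption: qubit 0 is the most significant bit of the basis index.
    stride = 1 << (n_qubits - 1 - qubit)
    new_state = list(state)
    for i in range(len(state)):
        if (i & stride) == 0:  # visit each |..0..>, |..1..> pair once
            a0, a1 = state[i], state[i | stride]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[i | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def simulate_with_trace(gates, n_qubits):
    """Run (name, matrix, qubit) gates from |0...0>, keeping every step."""
    state = [0j] * (2 ** n_qubits)
    state[0] = 1 + 0j
    trace = []
    for name, matrix, qubit in gates:
        state = apply_single_qubit_gate(state, matrix, qubit, n_qubits)
        trace.append((name, qubit, list(state)))
    return state, trace


def measurement_probabilities(state):
    """Born rule: probability of each basis state is |amplitude|^2."""
    return [abs(amplitude) ** 2 for amplitude in state]


# Example: Hadamard on qubit 1, then on qubit 0, of a 3-qubit register.
final_state, trace = simulate_with_trace([("h", H, 1), ("h", H, 0)], 3)
probs = measurement_probabilities(final_state)
for name, qubit, step_state in trace:
    print(name, qubit, [round(abs(a), 3) for a in step_state])
print([round(p, 3) for p in probs])
```

In this example the probability mass ends up split evenly across the basis states |000>, |010>, |100> and |110>. A fine-tuned model would be asked to reproduce both the intermediate states and this final distribution as text.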

What carries the argument

Two fine-tuning pipelines that supply the model with explicit step-by-step state-vector simulation traces: supervised fine-tuning on those traces, and the same supervised stage followed by group relative policy optimisation using verifiable rewards.
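
To make the second stage concrete, the sketch below shows how a verifiable reward and group-relative advantages could be computed for several sampled completions of one circuit. It is an illustration under stated assumptions, not the paper's implementation: the reward is assumed here to be one minus the total variation distance (TVD) between the predicted and simulator-computed distributions, the advantage follows the commonly described GRPO recipe of standardising rewards within a group, and all function names are hypothetical.

```python
import statistics


def total_variation_distance(p, q):
    """TVD between two discrete distributions over the same outcomes."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def verifiable_reward(predicted, reference):
    """Assumed reward: 1.0 for an exact match, 0.0 for disjoint support."""
    return 1.0 - total_variation_distance(predicted, reference)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardise rewards within one group of rollouts for a prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Reference distribution from a simulator for a 1-qubit circuit (H gate).
reference = [0.5, 0.5]
# Three hypothetical model rollouts for the same prompt.
rollouts = [[0.5, 0.5], [1.0, 0.0], [0.75, 0.25]]

rewards = [verifiable_reward(r, reference) for r in rollouts]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Because the reward is checked against a simulator rather than a learned judge, it can be recomputed by anyone. Rollouts that score above the group mean receive positive advantages and are reinforced; those below are discouraged.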

If this is right

  • LLMs can serve as accurate simulators for quantum circuits whose size exceeds what the base model can handle.
  • Explicit trace supervision enables extrapolation in the number of gates without retraining.
  • The two-stage pipeline extends capability to qubit counts unreachable by supervised fine-tuning alone.
  • Both methods outperform the base model and the external baseline on the quantum simulation task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same trace-based supervision could be applied to other domains that require step-by-step physical simulation.
  • If the model truly internalises the rules, it might be prompted to propose new circuit designs rather than only evaluate given ones.
  • A direct next measurement would be whether the fine-tuned models retain accuracy when the target distribution includes hardware noise models not seen in training.

Load-bearing premise

Success on simulation traces must reflect genuine quantum reasoning rather than statistical matching of patterns present in the training distribution.

What would settle it

A test set of circuits whose gate sequences or qubit counts lie well outside the training distribution yet require only the same linear-algebra rules; if accuracy collapses to random guessing on those circuits, the claim that the model has learned quantum reasoning fails.

Figures

Figures reproduced from arXiv: 2606.21974 by Casey R. Myers, James Quach, Katherine Ip, Peiyong Wang, Udaya Parampalli.

Figure 1: State-vector reasoning template for quantum circuit simulation.
Figure 2: Full prompt-completion example for SFT training on a 1-qubit Non-parameterised ...
Figure 3: Train and validation loss of SFT training for the Non-parameterised (left) and Pa...
Figure 4: Step-by-step quantum state fidelity of the SFT model during inference, evaluated on ...
Figure 5: Training dynamics of the GRPO stage for the Non-parameterised and Parameterised ...
Figure 6: Three-stage TVD progression and token-limit violation counts. (top) Mean TVD ...
Figure 7: Mean TVD as a function of circuit complexity for the Non-Parameterised Set (left ...
Figure 8: Example SFT+GRPO output illustrating the token-efficient shortcut behavior absent ...
Figure 9: System-size extrapolation on a 6-qubit circuit. Both models initialise and maintain ...
Original abstract

Large language models (LLMs) exhibit abilities beyond natural language modelling and text generation. Recent advances in their reasoning capabilities have spurred interest in applying LLMs to complex scientific tasks requiring deep domain expertise and sophisticated reasoning. Quantum computing, as a highly specialised field with significant knowledge barriers and hardware constraints, could greatly benefit from such advancements. However, a key open question that first must be answered is: How can we develop fine-tuning pipelines that instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching? We study this question through quantum circuit simulation as a training objective, where the model must predict the measurement probability distribution resulting from a sequence of quantum gate operations. We propose and compare two fine-tuning pipelines: (1) Supervised Fine-Tuning (SFT) on explicit gate-by-gate state-vector simulation traces, and (2) a two-stage SFT+Group Relative Policy Optimisation (GRPO) approach that sequentially applies SFT followed by GRPO with verifiable rewards. Our findings show that SFT achieves near-perfect in-distribution and gate-count extrapolation accuracy, significantly outperforming both the base model and the GPT-OSS-120B baseline. SFT+GRPO trades some in-distribution precision for better generalisation to larger qubit systems that SFT alone cannot handle. Both pipelines significantly outperform the baselines, demonstrating that targeted fine-tuning on explicit reasoning traces is an effective strategy for advancing quantum reasoning in LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two fine-tuning pipelines for LLMs on quantum circuit simulation: (1) supervised fine-tuning (SFT) on explicit gate-by-gate state-vector traces to predict measurement probability distributions, and (2) a two-stage SFT followed by Group Relative Policy Optimisation (GRPO) with verifiable rewards. It claims that SFT achieves near-perfect in-distribution and gate-count extrapolation accuracy, significantly outperforming the base model and GPT-OSS-120B baseline, while SFT+GRPO trades some in-distribution precision for improved generalisation to larger qubit systems.

Significance. If the empirical results are robust, the work would demonstrate an effective strategy for adapting LLMs to quantum tasks via simulation traces, with the verifiable-reward component of GRPO providing a reproducible training signal. This could open pathways for LLM-assisted quantum algorithm design, though the paper's framing of 'genuine quantum reasoning' versus pattern matching on simulation data is central to its contribution.

major comments (2)
  1. [Abstract] Abstract: the central claim that the pipelines 'instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching' is not supported by the described evaluations, which are confined to accuracy on measurement-probability prediction for circuits drawn from (or extrapolated within) the same state-vector simulation distribution; no experiments probe conceptual understanding, non-simulatable properties, or circuits with qualitatively different structure.
  2. [Results] Results section: the abstract asserts 'near-perfect' accuracy and 'significantly outperforming' baselines without supplying concrete metrics, dataset sizes, error bars, or controls for post-hoc selection; these details are required to evaluate whether the reported performance reflects acquisition of quantum principles or memorised correlations in the training traces.
minor comments (2)
  1. [Methods] Clarify the precise circuit-generation parameters, training-set sizes, and held-out test distributions in the methods to allow reproduction of the in-distribution versus extrapolation splits.
  2. Ensure all result tables and figures report error bars or confidence intervals and explicitly define the GPT-OSS-120B baseline configuration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments on our manuscript. We address each major comment below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the pipelines 'instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching' is not supported by the described evaluations, which are confined to accuracy on measurement-probability prediction for circuits drawn from (or extrapolated within) the same state-vector simulation distribution; no experiments probe conceptual understanding, non-simulatable properties, or circuits with qualitatively different structure.

    Authors: We agree that the evaluations presented are limited to predicting measurement probabilities on circuits from the state-vector simulation distribution, including some extrapolation in gate count. The phrasing 'genuine quantum reasoning' in the abstract is an interpretive claim based on the model's success in composing quantum operations step by step, which goes beyond simple memorisation because the traces are variable-length and compositional. However, we acknowledge that this does not constitute direct evidence of conceptual understanding or of performance on non-simulatable tasks. We will revise the abstract to moderate this language, for example by stating that the pipelines enable LLMs to learn quantum circuit simulation effectively, and we will add a section discussing the distinction between simulation-based learning and broader quantum reasoning.

    Revision: yes

  2. Referee: [Results] Results section: the abstract asserts 'near-perfect' accuracy and 'significantly outperforming' baselines without supplying concrete metrics, dataset sizes, error bars, or controls for post-hoc selection; these details are required to evaluate whether the reported performance reflects acquisition of quantum principles or memorised correlations in the training traces.

    Authors: The full results section provides the requested details, including specific accuracy metrics (near 100% on in-distribution tests), dataset sizes (e.g., training sets of 10,000+ traces), error bars from multiple random seeds, and baseline comparisons. The abstract, however, is high-level and does not include these numbers. We will revise the abstract to include key quantitative results, such as exact accuracy figures and performance deltas versus baselines, to make the claims more concrete and allow better assessment of whether the performance indicates learned principles.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results on held-out simulation data with no derivations reducing to inputs by construction.

Full rationale

The paper reports empirical accuracy of SFT and SFT+GRPO pipelines on quantum circuit state-vector prediction tasks, using held-out test sets for in-distribution and extrapolation evaluation. No equations, uniqueness theorems, or first-principles derivations are presented that reduce reported performance metrics to fitted parameters or self-citations defined by the same training distribution. The abstract explicitly frames genuine reasoning vs. pattern matching as an open question rather than claiming resolution via any self-referential construction. All load-bearing claims rest on standard supervised learning and RL evaluation protocols applied to external simulation traces.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that simulation-trace prediction equates to reasoning.
