Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

Bing Zhao; Chuan Xiao; Hu Wei; Linfeng Zhang; Lin Qu; Shaobo Wang; Wei Wang; Zhengbo Jiao

arxiv: 2606.07412 · v1 · pith:Y5RPZWCKnew · submitted 2026-06-05 · 💻 cs.SE · cs.AI

Chuan Xiao, Zhengbo Jiao, Shaobo Wang, Wei Wang, Bing Zhao, Hu Wei, Linfeng Zhang, Lin Qu

Pith reviewed 2026-06-27 21:14 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords self-evolving agents · LLM coding agents · trace distillation · agent skills · SWE-bench · self-improvement · software engineering agents · solver-gradient alignment

The pith

Socratic-SWE distills an agent's own solving traces into skills that guide the generation of training tasks, improving its performance on software engineering benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a self-evolution framework for coding agents that reuses historical solving traces to create better training data. Instead of relying on fixed task-generation procedures, it distills traces into skills that capture recurring failures and effective fixes, then uses those skills to generate new tasks in real code repositories. Each task is validated by execution and scored by how well it matches what the current solver needs to improve. The loop lets the agent adapt its training over multiple rounds. The appeal is a way to scale high-quality tasks that target the agent's actual weaknesses rather than following generic procedures.

Core claim

Socratic-SWE is a closed-loop self-evolution framework that distills the agent's historical solving traces into structured agent skills summarizing recurring failures and effective repair patterns. These skills guide the generation of targeted repair tasks in real repositories. The tasks undergo execution-based validation and are scored with a solver-gradient alignment reward to ensure they are useful for improving the Solver. The updated Solver then produces new traces, so the task curriculum adapts over successive rounds, reaching 50.40% on SWE-bench Verified after three iterations.
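
The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `Task`, `distill_skills`, `generate_tasks`, `self_evolve`, the `validate`, `score`, and `train` callables, the trace dictionary keys, and the `threshold` parameter are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Task:
    repo: str   # repository the task is generated in (hypothetical field)
    skill: str  # trace-derived skill that motivated the task (hypothetical field)


def distill_skills(traces: List[Dict]) -> List[str]:
    """Summarize failed traces into skill labels, most frequent first."""
    counts: Dict[str, int] = {}
    for trace in traces:
        if not trace["solved"]:
            counts[trace["failure"]] = counts.get(trace["failure"], 0) + 1
    return sorted(counts, key=lambda name: (-counts[name], name))


def generate_tasks(skills: List[str], repos: List[str]) -> List[Task]:
    """Propose one candidate repair task per (skill, repository) pair."""
    return [Task(repo=repo, skill=skill) for skill in skills for repo in repos]


def self_evolve(solver, traces, repos, validate, score, train,
                rounds: int = 3, threshold: float = 0.0):
    """Run the closed loop: traces -> skills -> tasks -> filter -> update."""
    for _ in range(rounds):
        skills = distill_skills(traces)
        candidates = generate_tasks(skills, repos)
        # Keep tasks that pass validation and score above the threshold.
        kept = [t for t in candidates
                if validate(t) and score(solver, t) > threshold]
        # The updated solver produces the traces used in the next round.
        solver, traces = train(solver, kept)
    return solver
```

Here `train` is assumed to return both the updated solver and the fresh traces it produced, which is what lets the curriculum change from one round to the next.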

What carries the argument

Structured agent skills distilled from solving traces, combined with a solver-gradient alignment reward for selecting useful training tasks.

If this is right

The agent generates tasks tailored to its current weaknesses rather than using fixed mutation procedures.
Iterative updates lead to consistent improvements over self-evolving baselines on SWE-bench Verified, Lite, and Pro and on Terminal-Bench 2.0.
Solving traces become a reusable substrate for ongoing self-evolution of the agent.
Performance reaches 50.40% on SWE-bench Verified after three iterations under the same compute budget.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could be applied to other domains where agents can generate their own training signals from interaction histories.
The approach highlights the potential for agents to bootstrap their capabilities without external task curation.
Combining trace distillation with other reward signals might further accelerate the self-evolution process.

Load-bearing premise

Distilling traces into skills produces tasks whose alignment scores genuinely predict improvement in the solver rather than just echoing its current behavior.

What would settle it

If tasks selected via the alignment reward produce no measurable solver gains when used for training in the next iteration, the central claim would not hold.
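
One way to run this check is to correlate per-task alignment scores with the solver gains measured after training on those tasks. The sketch below computes a Spearman rank correlation in plain Python. The function names are hypothetical, and the numbers at the bottom are toy values for illustration, not results from the paper.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy, made-up inputs: alignment score per task and later solver gain.
alignment_scores = [0.1, 0.4, 0.9]
solver_gains = [1.0, 2.0, 10.0]
rho = spearman(alignment_scores, solver_gains)
```

A correlation near zero or below on real data would support the objection, while a clearly positive one would support the claim, subject to sample size.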

Original abstract

LLM-driven software engineering agents have become a central testbed for real-world language-model capability, yet their training remains limited by the availability of high-quality SWE tasks. Existing synthetic data methods typically create tasks through fixed mutation or bug-injection procedures, making the resulting distributions largely independent of the agent's own weaknesses and training progress. We introduce Socratic-SWE, a closed-loop self-evolution framework that reuses the agent's historical solving traces as a source of training signal. Rather than treating traces only as evidence for reward computation, Socratic-SWE distills them into structured agent skills that summarize recurring failures and effective repair patterns. These skills then guide the generation of targeted repair tasks in real repositories. Candidate tasks are checked through execution-based validation and scored with a solver-gradient alignment reward, so that the retained tasks are both verifiable and useful for improving the Solver. The updated Solver produces new traces, enabling the task curriculum to adapt over successive rounds. Across SWE-bench Verified, SWE-bench Lite, SWE-bench Pro, and Terminal-Bench 2.0, Socratic-SWE consistently improves over self-evolving baselines under the same compute budget, reaching 50.40% on SWE-bench Verified after three iterations. These results suggest that solving traces can serve as a scalable substrate for self-evolving SWE agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Socratic-SWE turns traces into skills for adaptive task generation and reports gains on SWE-bench variants, but the alignment reward has no demonstrated link to actual future improvement.

The main new element is distilling solving traces into structured skills that capture recurring failures and repairs, then using those skills to generate targeted tasks in real repositories. This makes the task curriculum depend on the agent's own history rather than fixed mutation rules.

The paper cleanly describes the closed loop (traces to skills to validated tasks to updated solver) and reports consistent gains over self-evolving baselines on SWE-bench Verified, Lite, Pro, and Terminal-Bench 2.0 under matched compute, with a peak of 50.40% after three rounds.

The soft spot is the solver-gradient alignment reward. It is defined in terms of the same solver whose improvement is the goal, and the abstract gives no correlation check or ablation showing that high-scoring tasks predict later gains rather than just echoing the solver's current patterns. Without that, the reward could be selecting tasks inside the distribution the solver already handles in a particular way. Execution validation is mentioned, but the reward's added value is not isolated.
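
The abstract does not define the solver-gradient alignment reward. For orientation only, one generic form of gradient-alignment scoring is the cosine similarity between the gradient a candidate task induces and a reference gradient computed on data the solver should improve on. Whether Socratic-SWE uses this form is an assumption, and every name below is hypothetical. Gradients are plain lists of floats to keep the sketch self-contained.

```python
import math
from typing import Dict, List, Sequence


def cosine_alignment(task_grad: Sequence[float],
                     reference_grad: Sequence[float]) -> float:
    """Cosine similarity between a task gradient and a reference gradient."""
    if len(task_grad) != len(reference_grad):
        raise ValueError("gradients must have the same dimension")
    dot = sum(a * b for a, b in zip(task_grad, reference_grad))
    norm_task = math.sqrt(sum(a * a for a in task_grad))
    norm_ref = math.sqrt(sum(b * b for b in reference_grad))
    if norm_task == 0 or norm_ref == 0:
        return 0.0  # a zero gradient carries no directional signal
    return dot / (norm_task * norm_ref)


def select_tasks(task_grads: Dict[str, Sequence[float]],
                 reference_grad: Sequence[float], k: int) -> List[str]:
    """Return the names of the k tasks best aligned with the reference."""
    ranked = sorted(
        task_grads,
        key=lambda name: cosine_alignment(task_grads[name], reference_grad),
        reverse=True,
    )
    return ranked[:k]
```

Under this assumed form, the objection becomes concrete: if the reference gradient comes only from the solver's own current rollouts, a high score may reward tasks that resemble what the solver already does, which is why a held-out reference or an ablation matters.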

No load-bearing contradictions appear in the description, and the approach is straightforward. The citation pattern follows prior synthetic-data work for agents.

This is for people building self-evolving SWE agents. A reader focused on adaptive curricula would get value from the skill-distillation step. It deserves a serious referee so the experiments and reward validation can be examined in full.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Socratic-SWE, a closed-loop self-evolution framework for LLM-driven SWE agents. It reuses historical solving traces to distill structured agent skills that summarize failures and repair patterns, generates candidate repair tasks in real repositories, validates them via execution, and scores them with a solver-gradient alignment reward to retain tasks useful for solver improvement. The updated solver produces new traces, enabling iterative curriculum adaptation. The paper reports consistent gains over self-evolving baselines under fixed compute on SWE-bench Verified (50.40% after three iterations), SWE-bench Lite, SWE-bench Pro, and Terminal-Bench 2.0.

Significance. If the performance claims and the effectiveness of the trace-derived skills plus alignment reward hold, the work provides a scalable mechanism for generating adaptive training data that tracks an agent's evolving weaknesses, addressing a key limitation of fixed synthetic-data methods. The closed-loop design, execution-based validation, and reuse of traces as substrate are notable strengths that could generalize beyond the reported benchmarks.

major comments (2)

[Abstract] Abstract: The solver-gradient alignment reward is described as selecting tasks 'useful for improving the Solver,' but the manuscript supplies no correlation analysis between alignment scores and subsequent iteration gains, nor an ablation comparing it to failure-rate-only selection. This validation is load-bearing for the central claim that the reward produces a curriculum expanding the effective training distribution rather than merely resampling the current solver's idiosyncrasies.
[Results] Results (performance tables): The reported improvements (e.g., 50.40% on SWE-bench Verified) are presented without statistical significance tests, run-to-run variance, or explicit confirmation that baselines received identical compute budgets and iteration counts, which is required to substantiate the 'consistently improves' claim across the four benchmarks.

minor comments (1)

[Abstract] Notation for 'solver-gradient alignment reward' is introduced without an explicit equation or pseudocode definition in the abstract; a compact formalization would aid reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. The two major comments identify important gaps in validation and statistical reporting. We address each point below and commit to revisions that directly strengthen the central claims without altering the reported experimental outcomes.

Referee: [Abstract] Abstract: The solver-gradient alignment reward is described as selecting tasks 'useful for improving the Solver,' but the manuscript supplies no correlation analysis between alignment scores and subsequent iteration gains, nor an ablation comparing it to failure-rate-only selection. This validation is load-bearing for the central claim that the reward produces a curriculum expanding the effective training distribution rather than merely resampling the current solver's idiosyncrasies.

Authors: We agree that the absence of these analyses leaves the contribution of the alignment reward under-supported. In the revised manuscript we will add (1) an ablation that replaces the alignment term with failure-rate-only selection while keeping all other components fixed, and (2) a correlation plot and coefficient between per-task alignment scores and the delta in solver performance on the same task distribution in the subsequent iteration. These additions will be placed in the experimental analysis section and will use the same compute budget as the original runs.

revision: yes

Referee: [Results] Results (performance tables): The reported improvements (e.g., 50.40% on SWE-bench Verified) are presented without statistical significance tests, run-to-run variance, or explicit confirmation that baselines received identical compute budgets and iteration counts, which is required to substantiate the 'consistently improves' claim across the four benchmarks.

Authors: The experimental protocol already enforces identical compute budgets and iteration counts for all methods, including baselines; we will make this explicit in the revised text. We will also add (a) standard deviation across three independent runs for the primary metric on each benchmark and (b) paired statistical significance tests (e.g., Wilcoxon signed-rank) between Socratic-SWE and the strongest baseline at each iteration. These numbers will be reported in updated tables and the accompanying text.

revision: yes
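
The statistics promised in the second response can be sketched with the standard library alone. The code below reports mean and sample standard deviation across runs and computes an exact two-sided Wilcoxon signed-rank test by enumerating sign assignments, which is practical only for small samples. The function names are hypothetical, and the sketch does not reproduce any number from the paper.

```python
import itertools
import statistics
from typing import List, Sequence, Tuple


def mean_std(runs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across independent runs."""
    return statistics.mean(runs), statistics.stdev(runs)


def wilcoxon_signed_rank(a: Sequence[float],
                         b: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W_plus, p_value). Zero differences are dropped. The exact
    enumeration visits 2**n sign patterns, so keep n small (about 20 or less).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda i: abs_d[i])
    rank: List[float] = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied absolute differences
        j = i
        while j + 1 < n and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            rank[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(rank, diffs) if d > 0)
    expected = sum(rank) / 2  # mean of W_plus under the null hypothesis
    observed = abs(w_plus - expected)
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        w = sum(r for r, s in zip(rank, signs) if s)
        if abs(w - expected) >= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

With n non-zero pairs, the smallest attainable two-sided p-value of this exact test is 2/2^n. If only three paired observations are available, that floor is 0.25, so the test cannot reach the 0.05 level; at least six pairs are needed, for example by pairing across benchmarks or iterations or by adding runs.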

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical self-evolution loop that distills traces into skills, generates candidate tasks, applies execution-based validation plus a solver-gradient alignment reward, and measures gains on external benchmarks (SWE-bench Verified, Lite, Pro, Terminal-Bench 2.0) against self-evolving baselines under fixed compute. No equations, parameter definitions, or self-citations appear in the supplied text that reduce the reward, the task selection, or the reported performance numbers to a tautological restatement of the inputs by construction. The central claim therefore rests on observable iteration-over-iteration improvement on held-out suites rather than on any self-referential identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities can be extracted beyond the high-level description of skills and the solver-gradient alignment reward.

pith-pipeline@v0.9.1-grok · 5781 in / 1101 out tokens · 13542 ms · 2026-06-27T21:14:42.684329+00:00 · methodology

[1] [1]

Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, and Sida I. Wang. SWE-RL: Advancing LLM reasoning via reinforcement learning on open software evolution. InAdvancesin Neural Information Processing Systems 38, 2025

2025

[2] [2]

Training software engineering agents and verifiers with SWE-Gym

Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, and Yizhe Zhang. Training software engineering agents and verifiers with SWE-Gym. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 47717–47737. PMLR, 2025

2025

[3] [3]

Jimenez, Alexander Wettig, Kabir Khandpur, Yanzhe Zhang, Binyuan Hui, Ofir Press, Ludwig Schmidt, and Diyi Yang

John Yang, Kilian Lieret, Carlos E. Jimenez, Alexander Wettig, Kabir Khandpur, Yanzhe Zhang, Binyuan Hui, Ofir Press, Ludwig Schmidt, and Diyi Yang. SWE-smith: Scaling data for software engineering agents. In Advancesin Neural Information Processing Systems 38, 2025

2025

[4] [4]

Self-supervised bug detection and repair

Miltiadis Allamanis, Henry Jackson-Flux, and Marc Brockschmidt. Self-supervised bug detection and repair. In Advancesin Neural Information Processing Systems 34, 2021

2021

[5] [5]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024

2024

[6] [6]

Group-in-group policy optimization for llm agent training

Lang Feng, Zhenghai Xue, Tingcong Liu, and Bo An. Group-in-group policy optimization for llm agent training. Advancesin Neural Information Processing Systems, 38:46375–46408, 2026

2026

[7] Xiaoqian Liu, Ke Wang, Yuchuan Wu, Fei Huang, Yongbin Li, Junge Zhang, and Jianbin Jiao. Agentic reinforcement learning with implicit step rewards. arXiv preprint arXiv:2509.19199, 2025.

[8] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In International Conference on Learning Representations, volume 2024, pages 39578–39601, 2024.

[9] Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, and Dong Yu. R-Zero: Self-evolving reasoning LLM from zero data. In International Conference on Learning Representations, 2026.

[10] Shaobo Wang, Zhengbo Jiao, Zifan Zhang, Yilang Peng, Xu Ze, Boyu Yang, Wei Wang, Hu Wei, and Linfeng Zhang. Socratic-Zero: Bootstrapping reasoning via data-free agent co-evolution, 2025.

[11] Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, and Gao Huang. Absolute zero: Reinforced self-play reasoning with zero data. In Advances in Neural Information Processing Systems 38, 2025.

[12] Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, Zeyu Zheng, Cihang Xie, and Huaxiu Yao. SkillRL: Evolving agents via recursive skill-augmented reinforcement learning. In ICLR 2026 Workshop on Lifelong Learning Agents, 2026.

[13] Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, and Yongliang Shen. SKILL0: In-context agentic reinforcement learning for skill internalization, 2026.

[14] OpenAI. OpenAI o1 system card, 2024.

[15] Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An open-source LLM reinforcement learning system at scale. Advances in Neural Information Processing Systems, 38:113222–113244, 2026.

[16] Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, et al. Group sequence policy optimization. arXiv preprint arXiv:2507.18071, 2025.

[17] Chang Gao, Chujie Zheng, Xiong-Hui Chen, Kai Dang, Shixuan Liu, Bowen Yu, An Yang, Shuai Bai, Jingren Zhou, and Junyang Lin. Soft adaptive policy optimization. arXiv preprint arXiv:2511.20347, 2025.

[18] Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Peter Belcak, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, et al. GDPO: Group reward-decoupled normalization policy optimization for multi-reward RL optimization. arXiv preprint arXiv:2601.05242, 2026.

[19] Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, and Zhicheng Dou. Agentic reinforced policy optimization. In International Conference on Learning Representations, 2026.

[20] Shiyi Cao, Dacheng Li, Fangzhou Zhao, Shuo Yuan, Sumanth R. Hegde, Connor Chen, Charlie Ruan, Tyler Griggs, Shu Liu, Eric Tang, Richard Liaw, Philipp Moritz, Matei Zaharia, Joseph E. Gonzalez, and Ion Stoica. SkyRL-Agent: Efficient RL training for multi-turn LLM agent, 2025.

[21] Shaobo Wang, Xuan Ouyang, Tianyi Xu, Yuzheng Hu, Jialin Liu, Guo Chen, Tianyu Zhang, Junhao Zheng, Kexin Yang, Xingzhang Ren, Dayiheng Liu, and Linfeng Zhang. Opus: Towards efficient and principled data selection in large language model pre-training in every iteration, 2026.

[22] Ningyuan Yang, Weihua Du, Weiwei Sun, Sean Welleck, and Yiming Yang. GradAlign: Gradient-aligned data selection for LLM reinforcement learning, 2026.

[23] Zhiting Fan, Ruizhe Chen, Tianxiang Hu, Ru Peng, Zenan Huang, Haokai Xu, Yixin Chen, Jian Wu, Junbo Zhao, and Zuozhu Liu. OptimSyn: Influence-guided rubrics optimization for synthetic data generation. In International Conference on Learning Representations, 2026.

[24] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? In International Conference on Learning Representations, 2024.

[25] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In Advances in Neural Information Processing Systems 37, 2024.

[26] Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. OpenHands: An open platform for AI soft...

2025

[27] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. Demystifying LLM-based software engineering agents. Proceedings of the ACM on Software Engineering, 2(FSE):801–824, 2025.

[28] Huatong Song, Lisheng Huang, Shuang Sun, Jinhao Jiang, Ran Le, Daixuan Cheng, Guoxin Chen, Yiwen Hu, Zongchao Chen, Yiming Jia, Wayne Xin Zhao, Yang Song, Tao Zhang, and Ji-Rong Wen. SWE-Master: Unleashing the potential of software engineering agents via post-training, 2026.

[29] Yuxiang Wei, Zhiqing Sun, Emily McMilin, Jonas Gehring, David Zhang, Gabriel Synnaeve, Daniel Fried, Lingming Zhang, and Sida I. Wang. Toward training superintelligent software agents through self-play SWE-RL, 2025.

[30] Yuxin Zuo, Kaiyan Zhang, Li Sheng, Shang Qu, Ganqu Cui, Xuekai Zhu, Haozhan Li, Yuchen Zhang, Xinwei Long, Ermo Hua, Biqing Qi, Youbang Sun, Zhiyuan Ma, Lifan Yuan, Ning Ding, and Bowen Zhou. TTRL: Test-time reinforcement learning. In Advances in Neural Information Processing Systems 38, 2025.

[31] Bo Liu, Leon Guertler, Simon Yu, Zichen Liu, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, et al. Spiral: Self-play on zero-sum games incentivizes reasoning via multi-agent multi-turn reinforcement learning. arXiv preprint arXiv:2506.24119, 2025.

[32] Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Wei Wang, Bing Zhao, Hu Wei, and Linfeng Zhang. Socratic-Geo: Synthetic data generation and geometric reasoning via multi-agent interaction, 2026.

[33] Dinging Li, Yingxiu Zhao, Xinrui Cheng, Kangheng Lin, Hongbo Peng, Hongxing Li, Zixuan Wang, Yuhong Dai, Haodong Li, Jia Wang, Yukang Shi, Liang Zhao, Jianjian Sun, Zheng Ge, Xiangyu Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, and Yongliang Shen. SpatialEvo: Self-evolving spatial intelligence via deterministic geometric environments, 2026.

[34] Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Xuan Ren, Wei Wang, Bing Zhao, Hu Wei, and Linfeng Zhang. Agentic proposing: Enhancing large language model reasoning via compositional skill synthesis, 2026.

[35] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026.

[36] Qwen Team. Qwen3.6-27B: Flagship-level coding in a 27B dense model, April 2026.

[37] Guoxin Chen, Fanzhe Meng, Jiale Zhao, Minghao Li, Daixuan Cheng, Huatong Song, Jie Chen, Yuzhi Lin, Hui Chen, Xin Zhao, Ruihua Song, Chang Liu, Cheng Chen, Kai Jia, and Ji-Rong Wen. BeyondSWE: Can current code agent survive beyond single-repo bug fixing?, 2026.

[38] Xiang Deng, Jeff Da, Edwin Pan, Yannis Yiming He, Charles Ide, Kanak Garg, Niklas Lauffer, Andrew Park, Nitin Pasari, Chetan Rane, Karmini Sampath, Maya Krishnan, Srivatsa Kundurthy, Sean Hendryx, Zifan Wang, Vijay Bharadwaj, Jeff Holm, Raja Aluri, Chen Bo Calvin Zhang, Noah Jacobson, Bing Liu, and Brad Kenstler. SWE-Bench Pro: Can AI agents solve long-horizon software engineering tasks?, 2025.

[39] Mike A. Merrill et al. Terminal-Bench: Benchmarking agents on hard, realistic tasks in command line interfaces, 2026.

[40] SWE-agent Team. mini-SWE-agent: The minimal AI software engineering agent. https://github.com/SWE-agent/mini-swe-agent, 2025.

[41] Itay Inbar. little-coder: A coding agent optimized for small local language models. https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent, April 2026. White paper.

[42] Bo Liu, Chuanyang Jin, Seungone Kim, Weizhe Yuan, Wenting Zhao, Ilia Kulikov, Xian Li, Sainbayar Sukhbaatar, Jack Lanchantin, and Jason Weston. Spice: Self-play in corpus environments improves reasoning, 2025.

Algorithm 1 Socratic-SWE: Self-Play in Repository Environments

Require: Shared policy πθ; repository corpus R; Agent Skill Registry S; curriculum D0; batch si...

[43] Inject exactly one atomic semantic mistake in one production file.

[44] Do not introduce syntax errors, import errors, or changes that prevent the module from loading.

[45] The bug must be reversible: the original code is the reference fix.

[46] Keep the diff minimal and free of unrelated cleanup.

[47] Do not add comments, logs, TODOs, or variable names that reveal the bug.

</BUG_INJECTION_RULES>

<WORKFLOW>

[48] Inspect the repository and identify a plausible target location.

[49] Identify visible tests or behavior that should expose the injected bug.

[50] State the intended semantic change before editing.

[51] Make one contiguous source-code edit using Bash-accessible file operations.

[52] Run the relevant visible test(s) and confirm that the target behavior fails.

[53] Run collateral checks when feasible to avoid broad breakage.

[54] Inspect git diff and stop once a clean single-bug diff is obtained.

</WORKFLOW>

G.2 Mini-SWE-agent Prompt

System Prompt

You are a helpful assistant that can interact with a computer shell to solve programming tasks.

Base Task Instructions (Shortened)

Given a task description, the agent interacts with a Linux shell in /testbed to make the required source-...

[55] inspect the repository and identify relevant files

[56] reproduce or understand the issue when possible

[57] modify only source files needed for the task

[58] verify the change by running visible checks when available

[59] test edge cases when feasible

[60] leave a clean git diff containing only the intended changes

[61] finish according to the configured mini-swe-agent completion protocol.

The agent should not modify tests, generated files, build artifacts, or unrelated configuration files unless they are directly required by the task.

G.3 Base Mini-SWE-agent Prompt

Solver Task Prompt Template

Fix the issue described below.

<ISSUE> {{ problem_statement }} </ISSUE> <TAS...

[62] Scope conversion: the Solver localizes list_to_scope or scope_to_list but uses generic filtering, sorting, or normalization that violates OAuthLib's helper contracts.

[63] Constructor storage: endpoint and client constructors contain same-typed parameters, causing the Solver to swap fields, transform values, or store parameters under the wrong private attributes.

[64] OAuth1 plumbing: nonce, timestamp, realm, and callback_uri are all string-like but semantically different, so type information alone is insufficient.

[65] OIDC inheritance: OpenID Connect grants inherit from or delegate to OAuth2 grant logic. The Solver may replace inherited behavior instead of extending it, or forward attributes to the wrong object.

Representative traces include failures in scope utility tests, device endpoint tests, OAuth1 client/signature tests, and OpenID Connect grant-type tests.

## 3....
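As a hedged illustration of the scope-conversion failure pattern listed above, the sketch below contrasts an order-preserving, space-delimited conversion with a "generic normalization" repair that sorts and deduplicates. The names `list_to_scope_sketch`, `scope_to_list_sketch`, and `over_normalized_list_to_scope` are hypothetical stand-ins written for this example. They only approximate the contract assumed here for OAuthLib's list_to_scope and scope_to_list helpers (order preserved, `None` passed through) and are not the library's code.

```python
from typing import List, Optional, Sequence, Union

ScopeInput = Union[str, Sequence[str], set, None]


def list_to_scope_sketch(scope: ScopeInput) -> Optional[str]:
    """Assumed contract: join scopes with single spaces, keep order, pass None/str through."""
    if scope is None or isinstance(scope, str):
        return scope
    if isinstance(scope, (list, tuple, set)):
        return " ".join(str(s) for s in scope)
    raise ValueError(f"Invalid scope ({scope!r}); expected str, list, tuple, or set.")


def scope_to_list_sketch(scope: ScopeInput) -> Optional[List[str]]:
    """Assumed contract: split a space-delimited string, keep order, pass None through."""
    if scope is None:
        return None
    if isinstance(scope, (list, tuple, set)):
        return [str(s) for s in scope]
    return scope.strip().split(" ")


def over_normalized_list_to_scope(scope: ScopeInput) -> str:
    """A plausible-looking but contract-violating repair (hypothetical).

    It sorts and deduplicates the scopes and maps None to an empty string,
    which changes both ordering and the None pass-through behavior.
    """
    if scope is None:
        return ""
    if isinstance(scope, str):
        scope = scope.split()
    return " ".join(sorted(set(scope)))


if __name__ == "__main__":
    requested = ["write", "read"]
    print(list_to_scope_sketch(requested))           # keeps caller order
    print(over_normalized_list_to_scope(requested))  # reorders alphabetically
```

Both variants return a space-delimited string for ordinary input, so a type check alone would not tell them apart. Only tests that pin ordering and `None` handling would expose the difference, consistent with the failure pattern described above.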