{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:IIEAK72YB67YYZTRLIIH2KDEWJ","short_pith_number":"pith:IIEAK72Y","schema_version":"1.0","canonical_sha256":"4208057f580fbf8c66715a107d2864b2548c8bef57c8e7cc713325e3ddea90f6","source":{"kind":"arxiv","id":"2502.18449","version":2},"attestation_state":"computed","paper":{"title":"SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Reinforcement learning on open software evolution data enables LLMs to recover developer reasoning and solve 41% of real GitHub issues.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.SE","authors_text":"Daniel Fried, Gabriel Synnaeve, Jade Copet, Lingming Zhang, Olivier Duchenne, Quentin Carbonneaux, Rishabh Singh, Sida I. Wang, Yuxiang Wei","submitted_at":"2025-02-25T18:45:04Z","abstract_excerpt":"The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While DeepSeek-R1 and other follow-up work primarily focus on applying RL to competitive coding and math problems, this paper introduces SWE-RL, the first approach to scale RL-based LLM reasoning for real-world software engineering. Leveraging a lightweight rule-based reward (e.g., the similarity score between ground-truth and LLM-generated solutions), SWE-RL enables LLMs to autonomously recover a developer's reaso"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2502.18449","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.SE","submitted_at":"2025-02-25T18:45:04Z","cross_cats_sorted":["cs.AI","cs.CL"],"title_canon_sha256":"a91330707567ffa0ad37268e5f68258ff99c1316fd5dd035252f3e5e1a1cc008","abstract_canon_sha256":"730dfc0fa178a079f7ed7e2f80a86ff8ac9f804908893c4a1853b3b8bb5d84b9"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:52.781007Z","signature_b64":"Q38ZOVdIVSz2Ajg6Ki1/68gdl5HvoQriAWLSgKBR5op6Uw11PtFwANVSb0XfjUsQkQSzZXMhKO8sGPpaufPCBw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"4208057f580fbf8c66715a107d2864b2548c8bef57c8e7cc713325e3ddea90f6","last_reissued_at":"2026-05-17T23:38:52.780436Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:52.780436Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Reinforcement learning on open software evolution data enables LLMs to recover developer reasoning and solve 41% of real GitHub issues.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.SE","authors_text":"Daniel Fried, Gabriel Synnaeve, Jade Copet, Lingming Zhang, Olivier Duchenne, Quentin Carbonneaux, Rishabh Singh, Sida I. Wang, Yuxiang Wei","submitted_at":"2025-02-25T18:45:04Z","abstract_excerpt":"The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While DeepSeek-R1 and other follow-up work primarily focus on applying RL to competitive coding and math problems, this paper introduces SWE-RL, the first approach to scale RL-based LLM reasoning for real-world software engineering. Leveraging a lightweight rule-based reward (e.g., the similarity score between ground-truth and LLM-generated solutions), SWE-RL enables LLMs to autonomously recover a developer's reaso"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"our resulting reasoning model, Llama3-SWE-RL-70B, achieves a 41.0% solve rate on SWE-bench Verified -- a human-verified collection of real-world GitHub issues. To our knowledge, this is the best performance reported for medium-sized (<100B) LLMs to date, even comparable to leading proprietary LLMs like GPT-4o.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that a lightweight rule-based similarity score between ground-truth and generated solutions serves as an effective reward for learning genuine reasoning processes rather than superficial pattern matching.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SWE-RL uses RL on software evolution data to train LLMs achieving 41% on SWE-bench Verified with generalization to other reasoning tasks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Reinforcement learning on open software evolution data enables LLMs to recover developer reasoning and solve 41% of real GitHub issues.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"fe6533c6d9e271d0e7521ce818a7e23e44e6ebcdf982e1eb69132308f951c1fa"},"source":{"id":"2502.18449","kind":"arxiv","version":2},"verdict":{"id":"983ad494-552f-47b5-9c12-ce1cc3d6d6fe","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T10:23:06.408182Z","strongest_claim":"our resulting reasoning model, Llama3-SWE-RL-70B, achieves a 41.0% solve rate on SWE-bench Verified -- a human-verified collection of real-world GitHub issues. To our knowledge, this is the best performance reported for medium-sized (<100B) LLMs to date, even comparable to leading proprietary LLMs like GPT-4o.","one_line_summary":"SWE-RL uses RL on software evolution data to train LLMs achieving 41% on SWE-bench Verified with generalization to other reasoning tasks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that a lightweight rule-based similarity score between ground-truth and generated solutions serves as an effective reward for learning genuine reasoning processes rather than superficial pattern matching.","pith_extraction_headline":"Reinforcement learning on open software evolution data enables LLMs to recover developer reasoning and solve 41% of real GitHub issues."},"references":{"count":192,"sample":[{"doi":"","year":2024,"title":"Claude 3.5 sonnet model card addendum","work_id":"9821ab87-1805-43e6-8f4f-9a06dc3c9f37","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet","work_id":"1e26d961-6bb1-4e30-b195-245b5a95cfb1","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Codet: Code generation with generated tests","work_id":"5034399f-3a73-4a3e-824b-0fe8fe4d82e7","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Mich","work_id":"f06d44fc-f5c4-4ab7-951d-3eba0cbf5e88","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Meta large language model compiler: Foundation models of compiler optimization, 2024","work_id":"e9ffd682-76e4-4ec3-9520-c34bf4936d2c","ref_index":6,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":192,"snapshot_sha256":"baf37ceea2c212ab306189dc3981b7be94026341c1a43e50528c8a73afef1435","internal_anchors":15},"formal_canon":{"evidence_count":3,"snapshot_sha256":"5087d323b022ddc79eceb574c084178530a3aab19f5ee8cc213f8524576e8e00"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2502.18449","created_at":"2026-05-17T23:38:52.780540+00:00"},{"alias_kind":"arxiv_version","alias_value":"2502.18449v2","created_at":"2026-05-17T23:38:52.780540+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2502.18449","created_at":"2026-05-17T23:38:52.780540+00:00"},{"alias_kind":"pith_short_12","alias_value":"IIEAK72YB67Y","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"IIEAK72YB67YYZTR","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"IIEAK72Y","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":31,"internal_anchor_count":31,"sample":[{"citing_arxiv_id":"2504.02181","citing_title":"A Survey of Scaling in Large Language Model Reasoning","ref_index":218,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22642","citing_title":"Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2508.03018","citing_title":"Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15565","citing_title":"AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17792","citing_title":"HydroAgent: Closing the Gap Between Frontier LLMs and Human Experts in Hydrologic Model Calibration via Simulator-Grounded RL","ref_index":59,"is_internal_anchor":true},{"citing_arxiv_id":"2508.20697","citing_title":"Token Buncher: Shielding LLMs from Harmful Reinforcement Learning Fine-Tuning","ref_index":55,"is_internal_anchor":true},{"citing_arxiv_id":"2510.06499","citing_title":"Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2510.18270","citing_title":"Can Old Tests Do New Tricks for Resolving SWE Issues?","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2601.12538","citing_title":"Agentic Reasoning for Large Language Models","ref_index":236,"is_internal_anchor":true},{"citing_arxiv_id":"2512.18470","citing_title":"SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios","ref_index":55,"is_internal_anchor":true},{"citing_arxiv_id":"2602.07906","citing_title":"AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2511.20857","citing_title":"Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory","ref_index":217,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12913","citing_title":"Revisiting DAgger in the Era of LLM-Agents","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12925","citing_title":"AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09134","citing_title":"BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models","ref_index":115,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11922","citing_title":"StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08703","citing_title":"RewardHarness: Self-Evolving Agentic Post-Training","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09134","citing_title":"BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models","ref_index":103,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05460","citing_title":"Agentic Discovery of Exchange-Correlation Density Functionals","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01208","citing_title":"Faithful Mobile GUI Agents with Guided Advantage Estimator","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00433","citing_title":"Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning","ref_index":54,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13934","citing_title":"Towards Enabling An Artificial Self-Construction Software Life-cycle via Autopoietic Architectures","ref_index":63,"is_internal_anchor":true},{"citing_arxiv_id":"2505.10978","citing_title":"Group-in-Group Policy Optimization for LLM Agent Training","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09107","citing_title":"TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07341","citing_title":"ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories","ref_index":86,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":3,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ","json":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ.json","graph_json":"https://pith.science/api/pith-number/IIEAK72YB67YYZTRLIIH2KDEWJ/graph.json","events_json":"https://pith.science/api/pith-number/IIEAK72YB67YYZTRLIIH2KDEWJ/events.json","paper":"https://pith.science/paper/IIEAK72Y"},"agent_actions":{"view_html":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ","download_json":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ.json","view_paper":"https://pith.science/paper/IIEAK72Y","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2502.18449&json=true","fetch_graph":"https://pith.science/api/pith-number/IIEAK72YB67YYZTRLIIH2KDEWJ/graph.json","fetch_events":"https://pith.science/api/pith-number/IIEAK72YB67YYZTRLIIH2KDEWJ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ/action/storage_attestation","attest_author":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ/action/author_attestation","sign_citation":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ/action/citation_signature","submit_replication":"https://pith.science/pith/IIEAK72YB67YYZTRLIIH2KDEWJ/action/replication_record"}},"created_at":"2026-05-17T23:38:52.780540+00:00","updated_at":"2026-05-17T23:38:52.780540+00:00"}