SWE-RL uses RL on software evolution data to train LLMs, reaching 41% on SWE-bench Verified and generalizing to other reasoning tasks.
4 Pith papers cite this work.
citing papers explorer
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  SWE-RL uses RL on software evolution data to train LLMs, reaching 41% on SWE-bench Verified and generalizing to other reasoning tasks.
- Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
  Omni-MATH supplies 4428 human-verified Olympiad math problems that expose top LLMs achieving only 52.55% to 60.54% accuracy on the most difficult items.
- Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
  Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
- Fine-Tuning Pre-Trained Code Models for AI-Generated Code Detection
  Fine-tuning CodeBERT, GraphCodeBERT, UniXcoder and CodeT5+ with augmentation, cross-validation and ensembling yields macro-F1 of 0.737 on binary human-vs-AI code detection and 0.422 on 11-class model attribution in SemEval-2026 Task 13.