CodeDPO: Aligning Code Models with Self Generated and Verified Source Code. arXiv preprint arXiv:2410.05605, 2024a

Kechi Zhang, Ge Li, Yihong Dong, Jingjing Xu, Jun Zhang, Jing Su, Yongfei Liu, Zhi Jin · 2024 · arXiv 2410.05605

7 Pith papers cite this work.

citation-role summary: background (2)

citation-polarity summary: background (2)

representative citing papers

SkelDPO: A Skeleton-Guided Direct Preference Optimization Framework for Efficient Code Generation

cs.SE · 2026-06-05 · unverdicted · novelty 7.0

SkelDPO improves code generation efficiency by 2-7% over prior DPO methods via joint preference losses on full code and efficiency-critical skeletons.

Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

DGAO uses reinforcement learning to optimize LLMs for both accuracy and order stability by balancing intra-group accuracy advantages and inter-group stability advantages.

Chiseling Out Efficiency: Structured Skeleton Supervision for Efficient Code Generation

cs.SE · 2026-06-05 · unverdicted · novelty 6.0

EffiSkel improves LLM-generated code efficiency by supervising on extracted structural efficiency skeletons via multi-task learning of code generation and skeleton prediction.

Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs

cs.CR · 2026-06-02 · unverdicted · novelty 6.0

TSP reframes secure code generation as a tree-structured self-play process that supplies dense on-policy signals at vulnerability-prone nodes, yielding higher security pass rates and cross-language generalization than SFT or unstructured self-play.

Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning

cs.CV · 2026-02-07 · unverdicted · novelty 6.0

Fine-R1 uses chain-of-thought supervised fine-tuning on a structured FGVR reasoning dataset plus triplet augmented policy optimization to outperform general MLLMs and CLIP models on seen and unseen fine-grained categories with 4-shot training.

Visual-RFT: Visual Reinforcement Fine-Tuning

cs.CV · 2025-03-03 · conditional · novelty 6.0

Visual-RFT applies reinforcement learning with verifiable perception rewards to improve large vision-language models on fine-grained classification, few-shot detection, and grounding tasks.

An Iterative Test-and-Repair Framework for Competitive Code Generation

cs.SE · 2026-04-07

citing papers explorer

SkelDPO: A Skeleton-Guided Direct Preference Optimization Framework for Efficient Code Generation cs.SE · 2026-06-05 · unverdicted · none · ref 54
Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization cs.LG · 2026-05-12 · unverdicted · none · ref 39
Chiseling Out Efficiency: Structured Skeleton Supervision for Efficient Code Generation cs.SE · 2026-06-05 · unverdicted · none · ref 57
Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs cs.CR · 2026-06-02 · unverdicted · none · ref 35
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning cs.CV · 2026-02-07 · unverdicted · none · ref 33
An Iterative Test-and-Repair Framework for Competitive Code Generation cs.SE · 2026-04-07 · unreviewed · ref 60
