CodeDPO: Aligning code models with self generated and verified source code. arXiv preprint arXiv:2410.05605, 2024a
7 Pith papers cite this work.
citation-role summary
citation-polarity summary
roles
background 2
polarities
background 2
representative citing papers
DGAO uses reinforcement learning to optimize LLMs for both accuracy and order stability by balancing intra-group accuracy advantages and inter-group stability advantages.
EffiSkel improves LLM-generated code efficiency by supervising on extracted structural efficiency skeletons via multi-task learning of code generation and skeleton prediction.
TSP reframes secure code generation as a tree-structured self-play process that supplies dense on-policy signals at vulnerability-prone nodes, yielding higher security pass rates and cross-language generalization than SFT or unstructured self-play.
Fine-R1 uses chain-of-thought supervised fine-tuning on a structured FGVR reasoning dataset plus triplet augmented policy optimization to outperform general MLLMs and CLIP models on seen and unseen fine-grained categories with 4-shot training.
Visual-RFT applies reinforcement learning with verifiable perception rewards to improve large vision-language models on fine-grained classification, few-shot detection, and grounding tasks.
citing papers explorer
- SkelDPO: A Skeleton-Guided Direct Preference Optimization Framework for Efficient Code Generation
SkelDPO improves code generation efficiency by 2-7% over prior DPO methods via joint preference losses on full code and efficiency-critical skeletons.
- Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization
DGAO uses reinforcement learning to optimize LLMs for both accuracy and order stability by balancing intra-group accuracy advantages and inter-group stability advantages.
- Chiseling Out Efficiency: Structured Skeleton Supervision for Efficient Code Generation
EffiSkel improves LLM-generated code efficiency by supervising on extracted structural efficiency skeletons via multi-task learning of code generation and skeleton prediction.
- Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs
TSP reframes secure code generation as a tree-structured self-play process that supplies dense on-policy signals at vulnerability-prone nodes, yielding higher security pass rates and cross-language generalization than SFT or unstructured self-play.
- Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
Fine-R1 uses chain-of-thought supervised fine-tuning on a structured FGVR reasoning dataset plus triplet augmented policy optimization to outperform general MLLMs and CLIP models on seen and unseen fine-grained categories with 4-shot training.
- An Iterative Test-and-Repair Framework for Competitive Code Generation