A hybrid RL and self-supervised learning method accelerates generalized Benders decomposition by 57.5% on a MINLP case study while recovering optimal solutions.
Machine Learning , volume=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
REINFORCE-style variants outperform PPO, DPO, and RAFT in RLHF for LLMs by removing unnecessary PPO components and adapting the simpler method to LLM alignment characteristics.
citing papers explorer
-
A Hybrid Reinforcement and Self-Supervised Learning Aided Benders Decomposition Algorithm
A hybrid RL and self-supervised learning method accelerates generalized Benders decomposition by 57.5% on a MINLP case study while recovering optimal solutions.
-
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
REINFORCE-style variants outperform PPO, DPO, and RAFT in RLHF for LLMs by removing unnecessary PPO components and adapting the simpler method to LLM alignment characteristics.