Machine Learning

Simple statistical gradient-following algorithms for connectionist reinforcement learning · 1992

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

A Hybrid Reinforcement and Self-Supervised Learning Aided Benders Decomposition Algorithm

eess.SY · 2026-04-23 · unverdicted · novelty 6.0

A hybrid RL and self-supervised learning method accelerates generalized Benders decomposition by 57.5% on a MINLP case study while recovering optimal solutions.

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

cs.LG · 2024-02-22 · conditional · novelty 6.0

REINFORCE-style variants outperform PPO, DPO, and RAFT in RLHF for LLMs by removing unnecessary PPO components and adapting the simpler method to LLM alignment characteristics.

citing papers explorer

Showing 2 of 2 citing papers.

A Hybrid Reinforcement and Self-Supervised Learning Aided Benders Decomposition Algorithm

eess.SY · 2026-04-23 · unverdicted · none · ref 88

A hybrid RL and self-supervised learning method accelerates generalized Benders decomposition by 57.5% on a MINLP case study while recovering optimal solutions.

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

cs.LG · 2024-02-22 · conditional · none · ref 65

REINFORCE-style variants outperform PPO, DPO, and RAFT in RLHF for LLMs by removing unnecessary PPO components and adapting the simpler method to LLM alignment characteristics.
