Diffusion fine-tuning via reparameterized policy gradient of the soft Q-function. arXiv preprint arXiv:2512.04559, 2025

Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function · 2025 · arXiv 2512.04559

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference

cs.LG · 2026-05-26 · unverdicted · novelty 7.0

FAV aligns few-step generative models by amortizing SVGD updates from reward-tilted sampling into generator parameters via fixed-point regression, requiring only sample access, and shows outperformance on robotics tasks plus scaling on image generators.

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

LC-ERD frames LLM self-alignment as latent structure mining via a Variational Logic Potential and Multi-Agent Value Decomposition to provide granular, logic-consistent supervision.

citing papers explorer

Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference

cs.LG · 2026-05-26 · unverdicted · none · ref 41

FAV aligns few-step generative models by amortizing SVGD updates from reward-tilted sampling into generator parameters via fixed-point regression, requiring only sample access, and shows outperformance on robotics tasks plus scaling on image generators.
