pith · machine review for the scientific record

SLiC-HF: Sequence Likelihood Calibration with Human Feedback

7 Pith papers cite this work. Polarity classification is still indexing.
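
For orientation, the cited paper trains on human preference pairs with a rank-calibration loss on sequence log-likelihoods plus a regularizer toward the SFT behavior. Below is a minimal PyTorch-style sketch of an objective of that shape; the margin `delta`, weight `lam`, and the use of a cross-entropy regularizer are illustrative choices, not the paper's exact settings:

```python
import torch

def slic_hf_loss(logp_pos, logp_neg, logp_reg, delta=1.0, lam=0.5):
    """Rank-calibration objective in the SLiC style: a hinge on the
    gap between sequence log-likelihoods of the preferred (pos) and
    dispreferred (neg) responses, plus a cross-entropy regularizer
    toward a reference target (e.g. the SFT target). All three inputs
    are summed log-probs log P_theta(y|x), shape [batch]; delta and
    lam are illustrative hyperparameters."""
    calibration = torch.clamp(delta - logp_pos + logp_neg, min=0.0)
    regularizer = -logp_reg               # cross-entropy to the target
    return (calibration + lam * regularizer).mean()

# usage with per-example sequence log-probs
loss = slic_hf_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                    torch.tensor([-11.0]))
```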

7 Pith papers citing it

years: 2026 (5) · 2024 (2)

representative citing papers

Mind the Gap: Structure-Aware Consistency in Preference Learning

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

Standard DPO surrogates are inconsistent for equicontinuous neural networks; SA-DPO provides structure-aware H-consistency bounds by adapting margins to semantic distance, and shows, via the Margin-Capacity Profile, that heavy-tailed losses yield stronger guarantees for capacity-bounded models.
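
The standard DPO surrogate that this entry analyzes is well known: it scores a preference pair by the gap in policy-vs-reference log-ratios. A minimal sketch of that baseline follows (SA-DPO's structure-aware margins are not reproduced here; `beta` is an illustrative temperature):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO surrogate on a preference pair (y_w preferred to
    y_l): -log sigmoid of beta times the gap in policy-vs-reference
    log-ratios. All inputs are summed sequence log-probs, shape
    [batch]; beta is an illustrative temperature."""
    ratio_w = logp_w - ref_logp_w         # implicit reward for y_w
    ratio_l = logp_l - ref_logp_l         # implicit reward for y_l
    return -F.logsigmoid(beta * (ratio_w - ratio_l)).mean()
```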

KTO: Model Alignment as Prospect Theoretic Optimization

cs.LG · 2024-02-02 · conditional · novelty 7.0

KTO aligns LLMs by directly maximizing a prospect-theoretic utility over binary desirable/undesirable signals, and it matches or exceeds preference-based methods such as DPO at scales from 1B to 30B parameters.
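
The shape of the KTO objective is easy to sketch: a sigmoid of the policy-vs-reference reward relative to a reference point, with separate weights for desirable and undesirable examples. In the paper the reference point is a batch-level KL estimate; in this hedged sketch it is simply passed in, and the weights are illustrative:

```python
import torch

def kto_value(logp, ref_logp, desirable, z0, beta=0.1,
              lam_d=1.0, lam_u=1.0):
    """Prospect-theory-shaped value for one example with a binary
    desirable/undesirable label. `logp`, `ref_logp`, and `z0` are
    tensors: the sequence log-prob under the policy, under the frozen
    reference model, and the reference point (a batch KL estimate in
    the paper). Training minimizes (lam_y - value), which pushes
    desirable rewards above z0 and undesirable rewards below it."""
    reward = logp - ref_logp          # implicit reward log(pi/pi_ref)
    if desirable:
        return lam_d * torch.sigmoid(beta * (reward - z0))
    return lam_u * torch.sigmoid(beta * (z0 - reward))
```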

Self-Rewarding Language Models

cs.CL · 2024-01-18 · conditional · novelty 7.0

Iterative self-rewarding, in which the model scores its own generations as an LLM-as-Judge during DPO training on Llama 2 70B, improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
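
The loop this summary describes can be sketched directly; the `generate`, `judge`, and `dpo_update` callables below are hypothetical stand-ins, not the paper's code:

```python
def self_rewarding_round(model, prompts, generate, judge, dpo_update,
                         n_candidates=4):
    """One round of the self-rewarding recipe as summarized above:
    sample candidate responses, have the same model score them via an
    LLM-as-Judge prompt, pair the best- and worst-scored candidates,
    and run DPO on those self-labeled pairs. The `generate`, `judge`,
    and `dpo_update` callables are hypothetical stand-ins."""
    pairs = []
    for x in prompts:
        candidates = [generate(model, x) for _ in range(n_candidates)]
        scores = [judge(model, x, y) for y in candidates]  # self-scored
        best = candidates[scores.index(max(scores))]
        worst = candidates[scores.index(min(scores))]
        if best != worst:                 # keep only informative pairs
            pairs.append((x, best, worst))
    return dpo_update(model, pairs)       # preference optimization step
```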

Anomaly-Preference Image Generation

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

Anomaly Preference Optimization reformulates anomalous-image synthesis as preference learning, drawing implicit alignment from real anomalies and adding a time-aware capacity-allocation module for diffusion models that balances diversity against fidelity.

citing papers explorer

Showing 7 of 7 citing papers.

  • Mind the Gap: Structure-Aware Consistency in Preference Learning cs.LG · 2026-04-30 · unverdicted · none · ref 63

Standard DPO surrogates are inconsistent for equicontinuous neural networks; SA-DPO provides structure-aware H-consistency bounds by adapting margins to semantic distance, and shows, via the Margin-Capacity Profile, that heavy-tailed losses yield stronger guarantees for capacity-bounded models.

  • KTO: Model Alignment as Prospect Theoretic Optimization cs.LG · 2024-02-02 · conditional · none · ref 24

KTO aligns LLMs by directly maximizing a prospect-theoretic utility over binary desirable/undesirable signals, and it matches or exceeds preference-based methods such as DPO at scales from 1B to 30B parameters.

  • Self-Rewarding Language Models cs.CL · 2024-01-18 · conditional · none · ref 124

Iterative self-rewarding, in which the model scores its own generations as an LLM-as-Judge during DPO training on Llama 2 70B, improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.

  • CROP: Expert-Aligned Image Cropping via Compositional Reasoning and Optimizing Preference cs.CV · 2026-05-09 · unverdicted · none · ref 12

    CROP uses compositional reasoning and expert preference alignment in VLMs to produce aesthetic crops that match human experts more closely than previous methods.

  • Data-dependent Exploration for Online Reinforcement Learning from Human Feedback cs.LG · 2026-05-06 · unverdicted · none · ref 94

DEPO uses historical data to build a data-dependent uncertainty bonus for exploration in online RLHF, yielding an adaptive regret bound and stronger empirical performance than baselines (a generic sketch of such a bonus appears after this list).

  • Anomaly-Preference Image Generation cs.CV · 2026-05-04 · unverdicted · none · ref 24

Anomaly Preference Optimization reformulates anomalous-image synthesis as preference learning, drawing implicit alignment from real anomalies and adding a time-aware capacity-allocation module for diffusion models that balances diversity against fidelity.

  • Representation-Guided Parameter-Efficient LLM Unlearning cs.CL · 2026-04-19 · unverdicted · none · ref 209

REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization, outperforming prior methods on the forget-retain trade-off in LLM benchmarks.
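
DEPO's actual construction is not given on this page. Purely as a loosely analogous illustration of a "data-dependent uncertainty bonus," here is a standard elliptical-potential bonus built from historical feature vectors; every name and formula below is an assumption, not DEPO's method:

```python
import numpy as np

def uncertainty_bonus(history, x, lam=1.0):
    """Elliptical-potential exploration bonus sqrt(x^T A^{-1} x) with
    A = lam*I + sum of outer products of historical feature vectors.
    Directions well covered by history get a small bonus; novel
    directions get a large one. Illustrative only, not DEPO's method."""
    d = x.shape[0]
    A = lam * np.eye(d)
    for h in history:                     # accumulate historical data
        A += np.outer(h, h)
    return float(np.sqrt(x @ np.linalg.solve(A, x)))

# the bonus shrinks along directions where history is dense
hist = [np.array([1.0, 0.0])] * 10
print(uncertainty_bonus(hist, np.array([1.0, 0.0])))  # ~0.30, seen often
print(uncertainty_bonus(hist, np.array([0.0, 1.0])))  # 1.0, unexplored
```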