Canonical reference

Rank analysis of incomplete block designs: I. The method of paired comparisons

1952 · JSTOR stable/2334029

71% of classified citations from Pith papers cite this work as background.

17 Pith papers citing it



citation-role summary

background: 5 · method: 2

citation-polarity summary

background: 5 · use method: 2

representative citing papers

Pretraining Exposure Explains Popularity Judgments in Large Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MASS-DPO derives a Plackett-Luce-specific log-determinant Fisher information objective to select non-redundant negative samples, matching or exceeding multi-negative DPO performance with substantially fewer negatives across four benchmarks and three model families.

Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

cs.AI · 2026-05-05 · unverdicted · novelty 7.0

Agent Island is a new multiagent game environment that functions as a dynamic benchmark resistant to saturation and contamination, with Bayesian ranking showing OpenAI GPT-5.5 as the strongest performer among 49 models across 999 games.

Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

STOMP extends direct preference optimization to the multi-objective setting via smooth Tchebysheff scalarization and standardization of observed rewards, achieving highest hypervolume in eight of nine protein engineering evaluations.

Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

cs.CV · 2026-02-11 · unverdicted · novelty 7.0

DiNa-LRM introduces a diffusion-native latent reward model using a noise-calibrated Thurstone likelihood on noisy states, matching VLM performance at lower compute in image alignment and preference optimization.

Bayesian Preference Learning for Test-Time Steerable Reward Models

cs.LG · 2026-02-09 · unverdicted · novelty 7.0

ICRM casts reward modeling as amortized variational inference over a latent preference probability with a Beta prior, enabling test-time adaptation to unseen preferences and improving benchmark performance.

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.

Best Policy Learning from Trajectory Preference Feedback

cs.LG · 2025-01-31 · unverdicted · novelty 6.0

PSPL maintains posteriors over reward models and dynamics to deliver the first Bayesian simple regret guarantees for PbRL and outperforms baselines on simulation and image generation tasks.

Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization

cs.LG · 2024-10-25 · unverdicted · novelty 6.0

Diversity-regularized DPO fine-tuning of ProteinMPNN improves structural similarity scores by at least 8% over base model and sequence diversity by up to 20% over standard DPO for peptide inverse folding on OpenFold structures.

SPLC: Social Preference Learning for Crowd Robot Navigation

cs.RO · 2026-07-02 · unverdicted · novelty 5.0

SPLC uses social preference feedback to auto-generate preference data for offline RL, improving socially compliant crowd robot navigation over baselines.

Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

Presents a robust algorithm for learning any coordinate-wise non-decreasing evaluator preference function, with theoretical guarantees that it matches linear performance when linearity holds.

Failure Modes of Maximum Entropy RLHF

cs.LG · 2025-09-24 · unverdicted · novelty 5.0

Derives SimPO from MaxEnt RL and reports that MaxEnt RL in online RLHF exhibits frequent overoptimization and unstable KL dynamics across scales, unlike stable KL-constrained baselines.

Hallucination of Multimodal Large Language Models: A Survey

cs.CV · 2024-04-29 · accept · novelty 5.0

The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

cs.CL · 2023-11-09 · unverdicted · novelty 5.0

The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.

Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

cs.CL · 2026-04-23

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16

citing papers explorer


Pretraining Exposure Explains Popularity Judgments in Large Language Models

cs.CL · 2026-05-12 · unverdicted · none · ref 3

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16 · unreviewed · ref 72
