WETBench: A Benchmark for Detecting Task-Specific Machine-Generated Text on Wikipedia

10 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (10)

verdicts: UNVERDICTED (10)

representative citing papers

Misaligned by Reward: Socially Undesirable Preferences in LLMs

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.

citing papers explorer

Showing 8 of 8 citing papers after filters.