Alleviating attention hacking in discriminative reward modeling through interaction distillation
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Representative citing papers
About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.
BoostAPR boosts automated program repair by training a sequence-level assessor and line-level credit allocator from execution outcomes, then applying them in PPO to reach 40.7% on SWE-bench Verified.
MOSAIC combines frozen-LLM semantic embeddings with hierarchical consistency objectives to report up to 3.4% AUC gains on knowledge-tracing benchmarks including a new MOOC dataset.
citing papers explorer
- Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG
  FORCEBENCH shows model judges often violate expected ordering on evidence-calibrated vs force-raised claim pairs, with standard support prompting yielding 47.2% MVR and explicit warrant prompting reducing it to 24.5%.
- BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
  BoostAPR boosts automated program repair by training a sequence-level assessor and line-level credit allocator from execution outcomes, then applying them in PPO to reach 40.7% on SWE-bench Verified.