pith. sign in

Recent frontier models are reward hacking.https://metr.org/blog/2025-06-05-recent-reward-hacking/

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.AI 1

years

2026 1

verdicts

UNVERDICTED 1

roles

background 1

polarities

background 1

representative citing papers

Emotion Concepts and their Function in a Large Language Model

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.

citing papers explorer

Showing 1 of 1 citing paper.

  • Emotion Concepts and their Function in a Large Language Model cs.AI · 2026-04-09 · unverdicted · none · ref 44

    Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.