pith.

Monte MacDiarmid

Identifiers

  • name variant Monte MacDiarmid 0.60 · backfill

Papers (5)

  1. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet cs.AI · 2026 · author #14
  2. Alignment faking in large language models cs.AI · 2024 · author #5
  3. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024 · author #2
  4. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training cs.CR · 2024 · author #6
  5. Steering Language Models With Activation Engineering cs.CL · 2023 · author #7

Mentions

  • 2605.29358 #14 · arxiv_oai · confidence 0.70 Monte MacDiarmid
  • 2406.10162 #2 · arxiv_oai · confidence 0.70 Monte MacDiarmid

Frequent Coauthors