Monte MacDiarmid
Identifiers
- name variant Monte MacDiarmid 0.60 · backfill
Papers (5)
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet cs.AI · 2026 · author #14
- Alignment faking in large language models cs.AI · 2024 · author #5
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024 · author #2
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training cs.CR · 2024 · author #6
- Steering Language Models With Activation Engineering cs.CL · 2023 · author #7
Mentions
- 2605.29358 #14 · arxiv_oai · confidence 0.70 Monte MacDiarmid
- 2406.10162 #2 · arxiv_oai · confidence 0.70 Monte MacDiarmid
Frequent Coauthors
- Buck Shlegeris 3 shared papers
- Carson Denison 3 shared papers
- David Duvenaud 3 shared papers
- Ethan Perez 3 shared papers
- Evan Hubinger 3 shared papers
- Jared Kaplan 3 shared papers
- Samuel R. Bowman 3 shared papers
- Adam Jermyn 2 shared papers
- Alex Tamkin 2 shared papers
- Fazl Barez 2 shared papers
- Nicholas Schiefer 2 shared papers
- Ryan Greenblatt 2 shared papers
- Shauna Kravec 2 shared papers
- Sören Mindermann 2 shared papers
- Adam Pearce 1 shared paper
- Adly Templeton 1 shared paper
- Akbir Khan 1 shared paper
- Alexander Matt Turner 1 shared paper
- Amanda Askell 1 shared paper
- Andy Jones 1 shared paper