Smith, Yejin Choi, and Hannaneh Hajishirzi

The Art of Saying No: Contextual Noncompliance in Language Models · 2024 · arXiv 2407.12043

3 Pith papers cite this work. Polarity classification of these citations is still being indexed.

representative citing papers

Implicit Humanization in Everyday LLM Moral Judgments

cs.CY · 2026-03-23 · unverdicted · novelty 7.0

LLM responses to moral judgment queries reinforce implicit humanization, potentially exacerbating overreliance and misplaced trust.

Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.

Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules

cs.AI · 2026-04-03 · unverdicted · novelty 6.0

Language models refuse 75.4% of requests to evade defeated rules and do so even after recognizing reasons that undermine the rule's legitimacy.

citing papers explorer

Showing 3 of 3 citing papers.

Implicit Humanization in Everyday LLM Moral Judgments · cs.CY · 2026-03-23 · unverdicted · none · ref 6
LLM responses to moral judgment queries reinforce implicit humanization, potentially exacerbating overreliance and misplaced trust.
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts · cs.LG · 2026-04-20 · unverdicted · none · ref 3
BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.
Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules · cs.AI · 2026-04-03 · unverdicted · none · ref 5
Language models refuse 75.4% of requests to evade defeated rules and do so even after recognizing reasons that undermine the rule's legitimacy.
