The art of saying no: Contextual noncompliance in language models

Brahman, F · 2024 · arXiv 2407.12043

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

cs.SE · 2026-05-20 · conditional · novelty 8.0

RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.

Implicit Humanization in Everyday LLM Moral Judgments

cs.CY · 2026-03-23 · unverdicted · novelty 7.0

LLM responses to moral judgment queries reinforce implicit humanization, potentially exacerbating overreliance and misplaced trust.

Enhancing LLM Metacognition via Cognitive Pairwise Training

cs.LG · 2026-05-30 · unverdicted · novelty 6.0

CPT is introduced as a pairwise reasoning-trace comparison stage that improves the reasoning-metacognition trade-off over standard SFT+RL pipelines across model scales.

Quantifying and Mitigating Premature Closure in Frontier LLMs

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

Frontier LLMs exhibit premature closure by selecting answers at high rates on medical tasks where the correct choice was removed and on open-ended queries, with safety prompting reducing but not eliminating the behavior.

Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.

Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules

cs.AI · 2026-04-03 · unverdicted · novelty 6.0

Language models refuse 75.4% of requests to evade defeated rules and do so even after recognizing reasons that undermine the rule's legitimacy.

LLM-Safety Evaluations Lack Robustness

cs.CR · 2025-03-04 · unverdicted · novelty 4.0

LLM safety evaluations are hindered by noise in dataset curation, automated red-teaming, response generation, and LLM-judge evaluation, making fair comparisons difficult and slowing progress.

citing papers explorer

Quantifying and Mitigating Premature Closure in Frontier LLMs

cs.CL · 2026-05-14 · unverdicted · none · ref 24
