This method avoids the need to create and distribute the content, potentially reducing legal exposure

**Extortion Tactics**: - Threaten to release the content to extort money or silence

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

Reasoning models often determine refusal of harmful requests early in the chain-of-thought, with the opening sentence in distilled models fully controlling the outcome and patterns transferring across models from the same teacher.

citing papers explorer

Where Do Reasoning Models Refuse? cs.CL · 2025-07-03 · unverdicted · none · ref 21