OverThink: Slowdown Attacks on Reasoning LLMs

2025 · arXiv 2502.02542

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

cs.CR · 2026-06-12 · unverdicted · novelty 7.0

Attackers can force LLM guardrails into extended reasoning loops via optimized payloads, causing 13-63x token amplification and up to 148x latency in agent systems.

Overthink-Triggered Slowdown Attacks on LVLM-Based Robotic Systems

cs.CR · 2026-07-01 · unverdicted · novelty 6.0

Adversaries can use crafted scene text to trigger overthinking in LVLM-based robots, producing transferable slowdowns up to 6.96x latency amplification.

RecurGuard: Runtime Monitoring for Reasoning-Token Consumption Attacks

cs.CR · 2026-06-06 · unverdicted · novelty 6.0

RecurGuard monitors recurrence rate, volume growth, and query progress in exposed reasoning traces to terminate generation on token-consumption attacks, reporting 99% detection on OverThink and 92% on ExtendAttack with near-zero false positives.

Inference Cost Attacks for Retrieval-Augmented Large Language Models

cs.CR · 2026-05-31 · unverdicted · novelty 6.0

Poisoning external knowledge bases with LLM-agent-crafted documents can increase RAG inference token consumption by up to 13.12 times at over 90% success rate while preserving answer quality.

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

AdaptR1 uses fully RL-based training with a quality-gated efficiency reward for step-wise adaptive reasoning in multi-hop QA, reducing think tokens by 69.71% on average and 90.35% on HotpotQA with comparable or better performance.

ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving

cs.CR · 2026-05-27 · unverdicted · novelty 6.0

ReasonBreak demonstrates up to 89% attack success on reasoning and 72% on trajectories in NVIDIA Alpamayo VLA models via black-box textual perturbations, introducing a reasoning-aware evaluation framework and benchmark for autonomous driving.

CLORE: Content-Level Optimization for Reasoning Efficiency

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.

Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

An attention-guided RL reward combined with diverse persuasion strategies produces higher attack success rates against large reasoning models than prior jailbreak methods.

OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents

cs.LG · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

OTora is a two-stage framework that generates insertion-aware adversarial triggers and ICL-guided genetic payloads to induce reasoning-level denial-of-service in tool-augmented LLM agents across multiple backbones while preserving task correctness.

Conflicts Make Large Reasoning Models Vulnerable to Attacks

cs.CR · 2026-04-10 · conditional · novelty 6.0

Conflicts between alignment objectives or dilemmas increase attack success rates on LRMs by shifting and overlapping safety and functional neural representations.

Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models

cs.AI · 2026-03-26 · unverdicted · novelty 6.0

An external zero-shot monitor detects nine unsafe reasoning behaviors in LLMs at 87% step-level accuracy with low false positives and low latency.

SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

cs.RO · 2026-03-26 · unverdicted · novelty 6.0

SABER uses a trained ReAct agent to produce bounded adversarial edits to robot instructions, cutting task success by 20.6% and increasing execution length and violations on the LIBERO benchmark across six VLA models.

Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models

cs.CL · 2025-10-04 · unverdicted · novelty 6.0

Curtailing diversity in candidate pools for test-time scaling increases unsafe LLM outputs, as demonstrated by a reference-guided reduction protocol that evades standard safety classifiers across open and closed models.

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations

cs.AI · 2026-03-18 · unverdicted · novelty 5.0

CRAFT uses contrastive representation learning and RL on hidden states to align reasoning models for improved safety against jailbreaks, reporting 79% and 87.7% gains over base models.

Reinforcement Learning for Scalable and Trustworthy Intelligent Systems

cs.LG · 2026-05-08 · unverdicted · novelty 3.0

Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.

A Survey of Scaling in Large Language Model Reasoning

cs.AI · 2025-04-02 · unverdicted · novelty 3.0

A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

citing papers explorer

From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
cs.CR · 2026-06-12 · unverdicted · none · ref 33

Overthink-Triggered Slowdown Attacks on LVLM-Based Robotic Systems
cs.CR · 2026-07-01 · unverdicted · none · ref 5

RecurGuard: Runtime Monitoring for Reasoning-Token Consumption Attacks
cs.CR · 2026-06-06 · unverdicted · none · ref 5

Inference Cost Attacks for Retrieval-Augmented Large Language Models
cs.CR · 2026-05-31 · unverdicted · none · ref 16

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering
cs.CL · 2026-05-29 · unverdicted · none · ref 2

ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving
cs.CR · 2026-05-27 · unverdicted · none · ref 23

CLORE: Content-Level Optimization for Reasoning Efficiency
cs.AI · 2026-05-21 · unverdicted · none · ref 25

Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models
cs.AI · 2026-05-19 · unverdicted · none · ref 14

OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents
cs.LG · 2026-05-09 · unverdicted · none · ref 4 · 2 links

Conflicts Make Large Reasoning Models Vulnerable to Attacks
cs.CR · 2026-04-10 · conditional · none · ref 3

Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
cs.AI · 2026-03-26 · unverdicted · none · ref 13

SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models
cs.RO · 2026-03-26 · unverdicted · none · ref 21

Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models
cs.CL · 2025-10-04 · unverdicted · none · ref 21

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations
cs.AI · 2026-03-18 · unverdicted · none · ref 16

Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
cs.LG · 2026-05-08 · unverdicted · none · ref 203

A Survey of Scaling in Large Language Model Reasoning
cs.AI · 2025-04-02 · unverdicted · none · ref 87
