Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents a comprehensive empirical evaluation of LLM robustness to a structured taxonomy of five CoT perturbation types: MathError, UnitConversion, Sycophancy, SkippedSteps, and ExtraSteps. We evaluate 13 models spanning three orders of magnitude in parameter count, testing their ability to complete mathematical reasoning tasks despite perturbations injected into the reasoning chain. Our key findings reveal heterogeneous vulnerability patterns: MathError perturbations produce the most severe degradation in small models (50-60% accuracy loss) but show strong scaling benefits; UnitConversion remains challenging across all scales (>5% loss even for midsized models); ExtraSteps perturbations incur minimal accuracy degradation (0-6%) even for the smallest models; Sycophancy and SkippedSteps produce modest effects (~10% loss for small models) and improve slightly with scale. Scaling relationships show that model size serves as a protective factor against many perturbations, though not universally. These findings have direct implications for deploying LLMs in multi-stage reasoning pipelines and underscore the necessity of task-specific robustness assessments and mitigation strategies. The code and results are available at https://github.com/Mystic-Slice/CoTPerturbation
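The abstract's taxonomy can be illustrated with a minimal sketch of how each perturbation type might be injected into a reasoning chain. This is a hypothetical illustration, not the authors' implementation (see the linked repository for that); the function name, corruption rules, and inserted text are all assumptions chosen to convey the idea of each category.

```python
# Hypothetical sketch of CoT perturbation injection; corruption rules are
# illustrative stand-ins for the paper's actual perturbations.
import random

PERTURBATIONS = ("MathError", "UnitConversion", "Sycophancy", "SkippedSteps", "ExtraSteps")

def perturb_chain(steps, kind, rng=None):
    """Return a perturbed copy of a list of reasoning-step strings."""
    rng = rng or random.Random(0)
    steps = list(steps)
    if kind == "SkippedSteps":
        # Drop one intermediate step (keep the first and last).
        if len(steps) > 2:
            del steps[rng.randrange(1, len(steps) - 1)]
    elif kind == "ExtraSteps":
        # Insert a redundant restatement of an existing step.
        i = rng.randrange(len(steps))
        steps.insert(i + 1, "Restating: " + steps[i])
    elif kind == "Sycophancy":
        # Append an authority cue endorsing the (possibly wrong) result.
        steps.append("An expert confirms the result above is correct.")
    elif kind in ("MathError", "UnitConversion"):
        # Corrupt the digits in one step, a stand-in for an arithmetic
        # or unit-conversion error.
        i = rng.randrange(len(steps))
        steps[i] = "".join(
            str((int(c) + 1) % 10) if c.isdigit() else c for c in steps[i]
        )
    else:
        raise ValueError(f"unknown perturbation: {kind}")
    return steps
```

An evaluation harness would then feed the perturbed chain back to the model as a prompt prefix and check whether the final answer survives the corruption.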
Forward citations
Cited by 2 Pith papers
- WMF-AM: Probing LLM Working Memory via Depth-Parameterized Cumulative State Tracking
  A depth-parameterized benchmark that measures LLMs' cumulative state tracking ability without scratchpads, validated on 28 models across arithmetic and non-arithmetic tasks, with ablations confirming the construct.
- SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy
  Quantifies sycophancy in LLMs via answer shifts under linguistic pressure; a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.