Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
6 Pith papers cite this work.
citing papers explorer
-
From Noise to Diversity: Random Embedding Injection in LLM Reasoning
Random Soft Prompts (RSPs) sampled from the embedding distribution improve Pass@N on reasoning benchmarks by increasing early-stage token diversity without any training.
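The paper's exact sampling procedure isn't reproduced here; as a minimal sketch of the core idea, one can draw soft-prompt vectors from a Gaussian matched to the empirical mean and per-dimension spread of the model's token-embedding matrix (the function name and Gaussian assumption are illustrative):

```python
import numpy as np

def sample_random_soft_prompt(embedding_matrix, num_tokens, rng=None):
    """Sample soft-prompt vectors matched to the empirical mean and
    per-dimension standard deviation of the token-embedding matrix."""
    rng = np.random.default_rng(rng)
    mu = embedding_matrix.mean(axis=0)
    sigma = embedding_matrix.std(axis=0)
    shape = (num_tokens, embedding_matrix.shape[1])
    return mu + sigma * rng.standard_normal(shape)

# Toy stand-in for a real embedding table: 1000 "tokens", dimension 64.
E = np.random.default_rng(0).normal(size=(1000, 64))
prompt = sample_random_soft_prompt(E, num_tokens=4, rng=1)
# `prompt` would be prepended to the input embeddings before generation;
# resampling it per decode yields diverse early tokens with no training.
```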
-
The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies
Corruption studies on CoT chains detect the position of explicit answer statements rather than computational steps, as evidenced by format ablations that collapse suffix sensitivity 19-fold and by models following conflicting stated answers at high rates.
-
Training Large Language Models to Reason in a Continuous Latent Space
Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.
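Coconut's training recipe and architecture aren't shown here; as a minimal sketch of the recycling loop (with a toy nonlinear map standing in for the transformer forward pass), the last hidden state is fed back as the next input embedding instead of being decoded to a token:

```python
import numpy as np

def continuous_thought_rollout(step_fn, input_embedding, num_latent_steps):
    """Iterate a model in latent space: each step's last hidden state is
    recycled directly as the next input embedding (no token decoding)."""
    h = input_embedding
    trajectory = [h]
    for _ in range(num_latent_steps):
        h = step_fn(h)          # one forward pass returning a hidden state
        trajectory.append(h)    # fed back as-is on the next step
    return trajectory

# Toy stand-in for the model: a fixed linear map with tanh nonlinearity.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(16, 16))
traj = continuous_thought_rollout(lambda h: np.tanh(W @ h), rng.normal(size=16), 3)
```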
-
Measuring Faithfulness in Chain-of-Thought Reasoning
Chain-of-Thought reasoning in LLMs is often unfaithful: models rely on it to varying degrees across tasks, and less as they scale.
-
Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts
Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.
-
NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning
Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.
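The noise-injection and abstention machinery of the paper isn't reproduced here; as a minimal sketch of the consensus-as-confidence step (function name and threshold are illustrative), the answers from several noisy reasoning paths are compared and the model abstains unless a large enough fraction agree:

```python
from collections import Counter

def consensus_answer(answers, threshold=0.7):
    """Return the majority answer if enough noisy reasoning paths agree,
    otherwise abstain (None)."""
    top, count = Counter(answers).most_common(1)[0]
    return top if count / len(answers) >= threshold else None

# Eight of ten paths agree -> answer; four-way split -> abstain.
confident = consensus_answer(["42"] * 8 + ["7"] * 2)
uncertain = consensus_answer(["1", "2", "3", "4"])
```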