pith. sign in

super hub Mixed citations

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Mixed citation behavior. Most common role is background (65%).

258 Pith papers citing it
Background 65% of classified citations
abstract

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

hub tools

citation-role summary

background 50 method 13 baseline 5

citation-polarity summary

claims ledger

  • abstract We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. Th

authors

co-cited works

representative citing papers

Slot Machines: How LLMs Keep Track of Multiple Entities

cs.CL · 2026-04-22 · unverdicted · novelty 8.0

LLM activations encode current and prior entities in orthogonal slots, but models only use the current slot for explicit factual retrieval despite prior-slot information being linearly decodable.

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

GIANTS: Generative Insight Anticipation from Scientific Literature

cs.CL · 2026-04-10 · unverdicted · novelty 8.0

GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.

Instruction Tuning with GPT-4

cs.CL · 2023-04-06 · unverdicted · novelty 8.0

GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.

PAL: Program-aided Language Models

cs.CL · 2022-11-18 · conditional · novelty 8.0

PAL improves few-shot reasoning accuracy by having LLMs generate executable programs rather than text-based chains of thought, outperforming much larger models on math and logic benchmarks.

The Regularizing Power of Language-Training Deepfake Detectors

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

A dual-encoder deepfake detector pairs a frozen specialist with a LoRA-tuned MLLM, trained first via binary alignment then via RL to reward explain-then-classify behavior, yielding improved cross-dataset performance and interpretability.

Test-Time Hinting for Black-Box Vision-Language Models

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

Test-Time Hinting trains a hint generator to prepend contextual guidance to VLM prompts, improving accuracy on natural-image VQA benchmarks with generalization to unseen tasks and models.

citing papers explorer

Showing 50 of 258 citing papers.