Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
How many data points is a prompt worth?
6 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 6representative citing papers
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
citing papers explorer
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
-
OPT: Open Pre-trained Transformer Language Models
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
-
LIMO: Less is More for Reasoning
LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.
-
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
-
A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization
A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.