super hub

Language Models are Few-Shot Learners

Benjamin Mann, Jared Kaplan, Melanie Subbiah, Nick Ryder, Prafulla Dhariwal, Tom B. Brown · 2020 · cs.CL · arXiv 2005.14165

146 Pith papers cite this work. Polarity classification is still indexing.

146 Pith papers citing it

open full Pith review browse 146 citing papers more from Benjamin Mann arXiv PDF

abstract

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1

citation-polarity summary

unclear 1

claims ledger

abstract Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performan

authors

Benjamin Mann Jared Kaplan Melanie Subbiah Nick Ryder Prafulla Dhariwal Tom B. Brown

co-cited works

representative citing papers

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

Generative Agents: Interactive Simulacra of Human Behavior

cs.HC · 2023-04-07 · accept · novelty 8.0

Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Show Your Work: Scratchpads for Intermediate Computation with Language Models

cs.LG · 2021-11-30 · unverdicted · novelty 8.0

Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

cs.CL · 2020-12-31 · conditional · novelty 8.0

The Pile is a newly constructed 825 GiB dataset from 22 diverse sources that enables language models to achieve better performance on academic, professional, and cross-domain tasks than models trained on Common Crawl variants.

Measuring Massive Multitask Language Understanding

cs.CY · 2020-09-07 · accept · novelty 8.0

Introduces the MMLU benchmark of 57 tasks and shows that current models, including GPT-3, achieve low accuracy far below expert level across academic and professional domains.

The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

The global empirical NTK for finite-width networks has a universal Kronecker-core form that makes it structurally low-rank and biases gradient descent toward dominant modes of joint input-hidden activity.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

math.OC · 2026-05-08 · unverdicted · novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

Topology-Enhanced Alignment for Large Language Models: Trajectory Topology Loss and Topological Preference Optimization

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Topology-enhanced alignment via persistent homology on trajectories outperforms standard SFT and DPO baselines on preference metrics for LLMs.

Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages

cs.CL · 2026-05-05 · accept · novelty 7.0

Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.

Reconstructing conformal field theoretical compositions with Transformers

hep-th · 2026-05-01 · unverdicted · novelty 7.0

Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.

E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems

cs.CR · 2026-05-01 · unverdicted · novelty 7.0

E-MIA converts document details into four types of exam questions and aggregates the RAG's answers into a membership score that separates member and non-member documents better than prior similarity-based or probe-based attacks.

From Static Analysis to Audience Dissemination: A Training-Free Multimodal Controversy Detection Multi-Agent Framework

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

AuDisAgent reformulates multimodal controversy detection as a dynamic audience dissemination process using screening, panel discussion, and arbitration agents, plus comment bootstrapping, and reports outperforming prior static methods on a public dataset.

Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing

cs.CR · 2026-04-27 · unverdicted · novelty 7.0

Agentic Witnessing enables privacy-preserving auditing of semantic properties in private data by running an LLM auditor in a TEE that answers binary queries and produces cryptographic transcripts of its reasoning.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Evaluating Temporal Consistency in Multi-Turn Language Models

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

Language models frequently violate temporal scope stability in multi-turn dialogues by drifting toward present-day assumptions even when they possess the correct facts.

Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

KL regularization aligning model predictions with empirical transition patterns improves macro-F1 by 9-42% in next dialogue act prediction on German counselling data and transfers to other datasets.

On the Emergence of Syntax by Means of Local Interaction

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

cs.CR · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

ProtoCycle: Reflective Tool-Augmented Planning for Text-Guided Protein Design

q-bio.QM · 2026-04-18 · unverdicted · novelty 7.0

ProtoCycle improves text-guided protein design by coupling an LLM planner with tool feedback and reflection to achieve better language alignment and foldability than direct generation.

RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration

cs.CL · 2026-04-17 · unverdicted · novelty 7.0

RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.

Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport

cs.AI · 2026-04-14 · unverdicted · novelty 7.0

GCTM-OT extracts goal candidates with an LLM, then uses goal-prompted contrastive learning and optimal transport to discover topics that are more coherent, diverse, and aligned with human intent than prior methods on subreddit data.

LiveGesture Streamable Co-Speech Gesture Generation Model

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

LiveGesture introduces the first fully streamable zero-lookahead co-speech full-body gesture generation model using a causal vector-quantized tokenizer and hierarchical autoregressive transformers that matches offline SOTA on BEAT2.

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

q-bio.QM · 2026-04-09 · unverdicted · novelty 7.0

Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.

citing papers explorer

Showing 50 of 146 citing papers.

Large Language Diffusion Models cs.CL · 2025-02-14 · unverdicted · none · ref 4 · internal anchor
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
Generative Agents: Interactive Simulacra of Human Behavior cs.HC · 2023-04-07 · accept · none · ref 20 · internal anchor
Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.
Editing Models with Task Arithmetic cs.LG · 2022-12-08 · accept · none · ref 9 · internal anchor
Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.
Show Your Work: Scratchpads for Intermediate Computation with Language Models cs.LG · 2021-11-30 · unverdicted · none · ref 4 · internal anchor
Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.
The Pile: An 800GB Dataset of Diverse Text for Language Modeling cs.CL · 2020-12-31 · conditional · none · ref 93 · internal anchor
The Pile is a newly constructed 825 GiB dataset from 22 diverse sources that enables language models to achieve better performance on academic, professional, and cross-domain tasks than models trained on Common Crawl variants.
Measuring Massive Multitask Language Understanding cs.CY · 2020-09-07 · accept · none · ref 67 · internal anchor
Introduces the MMLU benchmark of 57 tasks and shows that current models, including GPT-3, achieve low accuracy far below expert level across academic and professional domains.
The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning cs.LG · 2026-05-09 · unverdicted · none · ref 37 · internal anchor
The global empirical NTK for finite-width networks has a universal Kronecker-core form that makes it structurally low-rank and biases gradient descent toward dominant modes of joint input-hidden activity.
Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits math.OC · 2026-05-08 · unverdicted · none · ref 58 · internal anchor
Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
Topology-Enhanced Alignment for Large Language Models: Trajectory Topology Loss and Topological Preference Optimization cs.CL · 2026-05-08 · unverdicted · none · ref 7 · internal anchor
Topology-enhanced alignment via persistent homology on trajectories outperforms standard SFT and DPO baselines on preference metrics for LLMs.
Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages cs.CL · 2026-05-05 · accept · none · ref 32 · internal anchor
Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.
Reconstructing conformal field theoretical compositions with Transformers hep-th · 2026-05-01 · unverdicted · none · ref 37 · internal anchor
Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.
E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems cs.CR · 2026-05-01 · unverdicted · none · ref 2 · internal anchor
E-MIA converts document details into four types of exam questions and aggregates the RAG's answers into a membership score that separates member and non-member documents better than prior similarity-based or probe-based attacks.
From Static Analysis to Audience Dissemination: A Training-Free Multimodal Controversy Detection Multi-Agent Framework cs.LG · 2026-05-01 · unverdicted · none · ref 1 · internal anchor
AuDisAgent reformulates multimodal controversy detection as a dynamic audience dissemination process using screening, panel discussion, and arbitration agents, plus comment bootstrapping, and reports outperforming prior static methods on a public dataset.
Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing cs.CR · 2026-04-27 · unverdicted · none · ref 3 · internal anchor
Agentic Witnessing enables privacy-preserving auditing of semantic properties in private data by running an LLM auditor in a TEE that answers binary queries and produces cryptographic transcripts of its reasoning.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 21 · internal anchor
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Evaluating Temporal Consistency in Multi-Turn Language Models cs.CL · 2026-04-24 · unverdicted · none · ref 6 · internal anchor
Language models frequently violate temporal scope stability in multi-turn dialogues by drifting toward present-day assumptions even when they possess the correct facts.
Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations cs.CL · 2026-04-20 · unverdicted · none · ref 6 · internal anchor
KL regularization aligning model predictions with empirical transition patterns improves macro-F1 by 9-42% in next dialogue act prediction on German counselling data and transfers to other datasets.
On the Emergence of Syntax by Means of Local Interaction cs.CL · 2026-04-20 · unverdicted · none · ref 1 · internal anchor
A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.
Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading cs.CR · 2026-04-19 · unverdicted · none · ref 119 · 2 links · internal anchor
Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
ProtoCycle: Reflective Tool-Augmented Planning for Text-Guided Protein Design q-bio.QM · 2026-04-18 · unverdicted · none · ref 2 · internal anchor
ProtoCycle improves text-guided protein design by coupling an LLM planner with tool feedback and reflection to achieve better language alignment and foldability than direct generation.
RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration cs.CL · 2026-04-17 · unverdicted · none · ref 1 · internal anchor
RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.
Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport cs.AI · 2026-04-14 · unverdicted · none · ref 5 · internal anchor
GCTM-OT extracts goal candidates with an LLM, then uses goal-prompted contrastive learning and optimal transport to discover topics that are more coherent, diverse, and aligned with human intent than prior methods on subreddit data.
LiveGesture Streamable Co-Speech Gesture Generation Model cs.CV · 2026-04-13 · unverdicted · none · ref 1 · internal anchor
LiveGesture introduces the first fully streamable zero-lookahead co-speech full-body gesture generation model using a causal vector-quantized tokenizer and hierarchical autoregressive transformers that matches offline SOTA on BEAT2.
Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings q-bio.QM · 2026-04-09 · unverdicted · none · ref 2 · internal anchor
Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
Sell More, Play Less: Benchmarking LLM Realistic Selling Skill cs.CL · 2026-04-08 · conditional · none · ref 4 · internal anchor
SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.
Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods cs.DC · 2026-04-02 · unverdicted · none · ref 22 · internal anchor
Simulation study shows cold TLB misses in reverse address translation dominate latency for small collectives in multi-GPU pods, causing up to 1.4x degradation, while larger ones see diminishing returns.
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs cs.CL · 2024-12-30 · unverdicted · none · ref 124 · internal anchor
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
Measuring Faithfulness in Chain-of-Thought Reasoning cs.AI · 2023-07-17 · conditional · none · ref 4 · internal anchor
Chain-of-Thought reasoning in LLMs is often unfaithful, with models relying on it variably by task and less so as models scale larger.
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning cs.CV · 2023-05-11 · conditional · none · ref 5 · internal anchor
Instruction tuning of BLIP-2 with an instruction-aware Query Transformer delivers state-of-the-art zero-shot performance on held-out vision-language datasets and strong finetuned results on downstream tasks.
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting cs.CV · 2023-01-02 · accept · none · ref 3 · internal anchor
Argoverse 2 introduces three new datasets with annotated sensor data, massive lidar collections, and challenging motion forecasting scenarios for autonomous driving research.
LAION-5B: An open large-scale dataset for training next generation image-text models cs.CV · 2022-10-16 · accept · none · ref 9 · internal anchor
LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
In-context Learning and Induction Heads cs.LG · 2022-09-24 · unverdicted · none · ref 1 · internal anchor
Induction heads, which implement pattern completion in attention, develop at the same training stage as a sudden rise in in-context learning, providing evidence they are the primary mechanism for in-context learning in transformers.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale cs.LG · 2022-08-15 · conditional · none · ref 121 · internal anchor
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
Hierarchical Text-Conditional Image Generation with CLIP Latents cs.CV · 2022-04-13 · accept · none · ref 4 · internal anchor
A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs cs.CV · 2021-11-03 · unverdicted · none · ref 9 · internal anchor
LAION-400M is a publicly released open dataset of 400 million CLIP-filtered image-text pairs with embeddings and kNN indices for efficient search.
LoRA: Low-Rank Adaptation of Large Language Models cs.CL · 2021-06-17 · accept · none · ref 7 · internal anchor
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
Diffusion Models Beat GANs on Image Synthesis cs.LG · 2021-05-11 · accept · none · ref 6 · internal anchor
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity cs.LG · 2021-01-11 · accept · none · ref 3 · internal anchor
Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.
Prefix-Tuning: Optimizing Continuous Prompts for Generation cs.CL · 2021-01-01 · conditional · none · ref 3 · internal anchor
Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.
Scaling Laws for Autoregressive Generative Modeling cs.LG · 2020-10-28 · accept · none · ref 1 · internal anchor
Autoregressive transformers follow power-law scaling laws for cross-entropy loss with nearly universal exponents relating optimal model size to compute budget across four domains.
DeBERTa: Decoding-enhanced BERT with Disentangled Attention cs.CL · 2020-06-05 · unverdicted · none · ref 3 · internal anchor
DeBERTa improves BERT-style models by separating content and relative position in attention and adding absolute positions to the decoder, yielding consistent gains on NLU and NLG tasks and the first single-model superhuman score on SuperGLUE.
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces cs.LG · 2026-05-12 · unverdicted · none · ref 241 · internal anchor
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Learning, Fast and Slow: Towards LLMs That Adapt Continually cs.LG · 2026-05-12 · unverdicted · none · ref 7 · internal anchor
Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.
MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces cs.DC · 2026-05-11 · unverdicted · none · ref 62 · internal anchor
Chakra introduces a portable, interoperable graph-based execution trace format for distributed ML workloads along with supporting tools to standardize performance benchmarking and software-hardware co-design.
Spectral Transformer Neural Processes cs.LG · 2026-05-10 · unverdicted · none · ref 8 · internal anchor
STNPs extend TNPs with a spectral aggregator that estimates context spectra, forms spectral mixtures, and injects task-adaptive frequency features to better handle periodicity.
LLM-Agnostic Semantic Representation Attack cs.CL · 2026-05-09 · unverdicted · none · ref 1 · internal anchor
SRA achieves 99.71% average attack success across 26 LLMs by optimizing for coherent malicious semantics via the SRHS algorithm, with claimed theoretical guarantees on convergence and transfer.
XPERT: Expert Knowledge Transfer for Effective Training of Language Models cs.CL · 2026-05-09 · unverdicted · none · ref 7 · internal anchor
XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.
Ensemble Distributionally Robust Bayesian Optimisation cs.LG · 2026-05-08 · unverdicted · none · ref 95 · internal anchor
A tractable ensemble distributionally robust Bayesian optimization method achieves improved sublinear regret bounds under context uncertainty.
Coupling Models for One-Step Discrete Generation cs.LG · 2026-05-08 · unverdicted · none · ref 15 · internal anchor
Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.
Query-efficient model evaluation using cached responses cs.LG · 2026-05-08 · unverdicted · none · ref 63 · internal anchor
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.

Language Models are Few-Shot Learners

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer