On First-Order Meta-Learning Algorithms

Alex Nichol, John Schulman, Joshua Achiam

classification cs.LG

keywords algorithms, first-order, meta-learning, task, derivatives, distribution, family, includes

This paper considers meta-learning problems, where there is a distribution of tasks, and we would like to obtain an agent that performs well (i.e., learns quickly) when presented with a previously unseen task sampled from this distribution. We analyze a family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates. This family includes and generalizes first-order MAML, an approximation to MAML obtained by ignoring second-order derivatives. It also includes Reptile, a new algorithm that we introduce here, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task. We expand on the results from Finn et al. showing that first-order meta-learning algorithms perform well on some well-established benchmarks for few-shot classification, and we provide theoretical analysis aimed at understanding why these algorithms work.
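
The Reptile loop described in the abstract (sample a task, train on it, move the initialization toward the trained weights) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy task family (quadratic losses with randomly drawn centers), the helper names `inner_sgd`, `reptile`, and `sample_task`, and all step sizes are hypothetical choices made for this example.

```python
import numpy as np


def inner_sgd(theta, center, inner_lr=0.1, inner_steps=5):
    """Train on one task, starting from the shared initialization theta.

    Assumption: the task loss is the toy quadratic 0.5 * ||w - center||^2,
    whose gradient is simply (w - center).
    """
    w = theta.copy()
    for _ in range(inner_steps):
        grad = w - center          # first-order information only
        w = w - inner_lr * grad    # plain gradient step on this task
    return w


def reptile(sample_task, theta0, meta_lr=0.1, meta_iters=2000, **inner_kwargs):
    """Repeatedly sample a task, train on it, and move theta toward the result."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(meta_iters):
        center = sample_task()                        # sample a task
        w = inner_sgd(theta, center, **inner_kwargs)  # train on it
        theta = theta + meta_lr * (w - theta)         # move toward trained weights
    return theta


# Hypothetical task distribution: centers scattered around task_mean.
rng = np.random.default_rng(0)
task_mean = np.array([1.0, -2.0])


def sample_task():
    return task_mean + rng.normal(scale=0.5, size=2)


theta = reptile(sample_task, theta0=np.zeros(2))
print("learned initialization:", theta)
```

In this toy setting the learned initialization settles near the mean of the task centers, the point from which a few gradient steps reach any sampled task most quickly on average. No second-order derivatives are computed: the meta-update uses only the difference between the trained weights and the current initialization.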

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Meta-learning-enhanced implicit full waveform inversion
physics.geo-ph 2026-04 unverdicted novelty 7.0

Meta-IFWI pretrains a SIREN implicit neural network via meta-learning across velocity models to achieve faster convergence, higher accuracy, and better generalization than standard implicit full waveform inversion.

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
cs.CL 2026-05 unverdicted novelty 6.0

FocuSFT uses an inner optimization loop to adapt fast-weight parameters into a parametric memory that sharpens attention on relevant content, then conditions outer-loop supervised fine-tuning on this representation, y...

HARMONY: Bridging the Personalization-Generalization Gap by Mitigating Representation Skew in Heterogeneous Split Federated Learning
cs.LG 2026-05 unverdicted novelty 6.0

HARMONY mitigates representation skew in heterogeneous hybrid split federated learning via meta-learning to simulate diverse extractors and server-side contrastive learning to align features, delivering up to 43% accu...

RFPrompt: Prompt-Based Expert Adaptation of the Large Wireless Model for Modulation Classification
cs.LG 2026-05 unverdicted novelty 6.0

RFPrompt adapts the Large Wireless Model via deep prompt tokens to improve out-of-distribution robustness in modulation classification while training only a small number of parameters.

Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation
cs.LG 2026-04 unverdicted novelty 6.0

BinomMAML uses a binomial expansion to estimate meta-gradients more accurately than prior approximations, with error bounds that improve on existing methods and decay super-exponentially under mild conditions.

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
cs.LG 2026-04 unverdicted novelty 6.0

A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.

Titans: Learning to Memorize at Test Time
cs.LG 2024-12 unverdicted novelty 6.0

Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates
cs.LG 2026-04 unverdicted novelty 5.0

LGD reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning hyperparameters on convex regression tasks.

Where to Bind Matters: Hebbian Fast Weights in Vision Transformers for Few-Shot Character Recognition
cs.NE 2026-04 unverdicted novelty 4.0

Placing one Hebbian fast-weight module after the final stage of Swin-Tiny achieves 96.2% accuracy on 5-way 1-shot Omniglot classification, outperforming the non-Hebbian baseline by 0.3 points.