arxiv: 2301.13105 · v3 · pith:5XLGU33Hnew · submitted 2023-01-30 · 💻 cs.LG · stat.ML

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

classification cs.LG · stat.ML
keywords generalization · data · functions · GOTU · learning · models · reasoning · unseen

This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for sparse functions and a class of network models including instances of Transformers, random features models, and linear networks, a min-degree-interpolator is learned on the unseen. More specifically, this means an interpolator of the training data that has minimal Fourier mass on the higher-degree basis elements. These findings lead to two implications: (1) we provide an explanation for the length generalization problem for Boolean functions (e.g., Anil et al., 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports. Finally, we discuss extensions to other models or non-sparse regimes where the min-degree bias may still occur or fade, as well as how it can potentially be corrected when undesirable.
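
To make the notion of a min-degree interpolator concrete, the following is a minimal sketch and not the paper's code. It assumes a toy setting chosen only for illustration: inputs in {-1, +1}^n, a target given by its Fourier (monomial) expansion, and training data in which one coordinate is frozen to +1, so that every input with that coordinate equal to -1 is unseen. Under that assumption, substituting the frozen value into each monomial yields an interpolator of the training data with no Fourier mass left on the higher-degree terms involving the frozen coordinate. The helper names `evaluate`, `freeze_coordinate`, and `degree_profile` are hypothetical.

```python
from itertools import product

def evaluate(coeffs, x):
    """Evaluate a Fourier expansion {frozenset(indices): coefficient} at x in {-1,+1}^n."""
    total = 0.0
    for subset, c in coeffs.items():
        term = c
        for i in subset:
            term *= x[i]
        total += term
    return total

def freeze_coordinate(coeffs, k, value=1):
    """Substitute x_k = value into every monomial (illustrative assumption:
    the training set only contains inputs with x_k == value)."""
    out = {}
    for subset, c in coeffs.items():
        if k in subset:
            new_subset, new_c = subset - {k}, c * value
        else:
            new_subset, new_c = subset, c
        out[new_subset] = out.get(new_subset, 0.0) + new_c
    # Drop monomials whose coefficients cancelled out.
    return {s: c for s, c in out.items() if c != 0}

def degree_profile(coeffs, n):
    """Squared Fourier mass at each degree 0..n."""
    profile = [0.0] * (n + 1)
    for subset, c in coeffs.items():
        profile[len(subset)] += c * c
    return profile

# Toy target: f(x) = x_0 * x_1, with x_1 frozen to +1 during training.
target = {frozenset({0, 1}): 1.0}
interpolator = freeze_coordinate(target, k=1, value=1)

# The interpolator agrees with the target on every seen input (x_1 == +1).
seen = [x for x in product([-1, 1], repeat=2) if x[1] == 1]
agrees_on_seen = all(evaluate(target, x) == evaluate(interpolator, x) for x in seen)
```

In this toy case the interpolator is the degree-1 function x_0, whereas the target x_0 * x_1 has all of its mass at degree 2. The two agree on the seen inputs and disagree on the unseen ones (x_1 = -1), which is the kind of gap between fitting the training data and generalizing on the unseen that the abstract describes.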

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Stabilizing Extrapolation in Looped Transformers via Learned Stochastic Stopping

    cs.LG 2026-06 unverdicted novelty 6.0

    Stochastic loop counts during training of looped transformers reduce OOD variance on binary addition, Dyck-1, Unique Set and Copy tasks, with learned RL-Halting further improving the accuracy-stability trade-off.