Sequential Attention for Feature Selection

Gang Fu; Lin Chen; Matthew Fahrbach; MohammadHossein Bateni; Taisuke Yasuda; Vahab Mirrokni

arXiv: 2209.14881 · v3 · pith:4FVA2KWM · submitted 2022-09-29 · 💻 cs.LG · stat.ML

keywords: feature, attention, selection, algorithm, features, empirical, model, networks

Abstract

Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations towards the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
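
The abstract reports that, for linear regression, an adaptation of Sequential Attention is equivalent to Orthogonal Matching Pursuit (OMP). As a point of reference, here is a minimal pure-Python sketch of classical OMP. It is not Sequential Attention itself and not the authors' code. Each round adds the feature with the largest normalized correlation with the current residual, a proxy for its marginal value given the features already selected, and then refits least squares on all selected features. The helper names `dot`, `solve`, and `omp` are hypothetical. The sketch assumes nonzero columns that stay linearly independent as they are selected.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp(X: List[List[float]], y: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Classical Orthogonal Matching Pursuit: greedily pick k columns of X.

    Assumes nonzero, linearly independent selected columns and k <= number of
    columns. Returns the selected column indices (in selection order) and the
    least-squares coefficients fitted on those columns.
    """
    d = len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]  # column view of X
    selected: List[int] = []
    beta: List[float] = []
    residual = list(y)
    for _ in range(k):
        # Greedy step: unselected column with the largest normalized
        # correlation with the current residual.
        best = max(
            (j for j in range(d) if j not in selected),
            key=lambda j: abs(dot(cols[j], residual)) / dot(cols[j], cols[j]) ** 0.5,
        )
        selected.append(best)
        # Orthogonal step: refit least squares on all selected columns
        # (normal equations), so the residual is orthogonal to each of them.
        chosen = [cols[j] for j in selected]
        gram = [[dot(a, c) for c in chosen] for a in chosen]
        beta = solve(gram, [dot(a, y) for a in chosen])
        fitted = [sum(b * col[t] for b, col in zip(beta, chosen)) for t in range(len(y))]
        residual = [yt - ft for yt, ft in zip(y, fitted)]
    return selected, beta


# Toy data where y = 3 * x0 - 2 * x2 exactly.
X = [[1, 0, 1, 1], [2, 1, 0, 1], [0, 1, 2, 1], [1, 0, 1, 1], [3, 0, 0, 1], [0, 1, 1, 1]]
y = [1, 6, -4, 1, 9, -2]
selected, beta = omp(X, y, k=2)  # selected == [0, 2], beta close to [3.0, -2.0]
```

Refitting on the whole selected set, rather than fitting only the newest coefficient, is what keeps the residual orthogonal to every chosen column; that refit is the step that distinguishes OMP from plain matching pursuit.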

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AutoNFS: Automatic Neural Feature Selection
cs.LG · 2025-03 · unverdicted · novelty 6.0

AutoNFS is an end-to-end differentiable feature selection method that uses Gumbel-Sigmoid sampling to automatically determine the minimal feature set for classification and regression tasks.
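
The AutoNFS summary mentions Gumbel-Sigmoid sampling. Below is a minimal sketch of the standard Gumbel-Sigmoid (binary concrete) relaxation, assuming one learnable logit per feature and a temperature `tau`. It is not AutoNFS's implementation, and the function names are hypothetical. Each feature's soft mask is sigmoid((logit + g1 - g2) / tau) with g1, g2 independent Gumbel(0, 1) draws. Written in an autodiff framework, this expression is differentiable with respect to the logits.

```python
import math
import random
from typing import List


def sample_gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel(0, 1) sample by inverse transform: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def stable_sigmoid(z: float) -> float:
    """Logistic sigmoid that avoids overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gumbel_sigmoid(logits: List[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed per-feature keep/drop mask: sigmoid((logit + g1 - g2) / tau).

    g1 - g2 (two independent Gumbel draws) is standard logistic noise, so
    P(mask > 0.5) = sigmoid(logit); a smaller tau pushes values toward 0 or 1.
    """
    mask = []
    for logit in logits:
        noise = sample_gumbel(rng) - sample_gumbel(rng)
        mask.append(stable_sigmoid((logit + noise) / tau))
    return mask


rng = random.Random(0)
soft = gumbel_sigmoid([2.0, -2.0, 0.0], tau=0.5, rng=rng)
hard = [m > 0.5 for m in soft]  # hard keep/drop decision per feature
```

Thresholding the soft mask at 0.5 gives a hard keep/drop decision, and as `tau` shrinks the soft values move closer to that hard mask.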

Knockoffs-based False Discovery Rate Control and Simplification for Deep Neural Networks
stat.ML · 2026-06 · unverdicted · novelty 4.0

Introduces three knockoff filters for FDR-controlled variable screening in regularized DNNs and reports satisfactory empirical performance versus existing algorithms.
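
The knockoff summary refers to FDR-controlled variable screening. The paper's three filters are not described here. As background only, here is a sketch of the standard knockoff+ selection rule of Barber and Candès. Given one statistic W_j per feature, where a large positive value means the original feature beats its knockoff copy, it selects every j with W_j >= T for the threshold T computed below. Computing the W_j themselves (for example from a regularized DNN) is assumed to be done elsewhere. The function names are hypothetical.

```python
import math
from typing import List


def knockoff_plus_threshold(w: List[float], q: float) -> float:
    """Knockoff+ threshold: smallest t in {|W_j| : W_j != 0} with estimated FDP <= q.

    The estimated FDP at t is (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}).
    Returns math.inf when no candidate t qualifies, so nothing is selected.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        est_false = 1 + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if est_false / n_selected <= q:
            return t
    return math.inf


def knockoff_select(w: List[float], q: float) -> List[int]:
    """Indices j whose statistic W_j reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Illustrative statistics only (not from any paper).
w = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, -1.0, 0.5, -0.2, 0.0]
chosen = knockoff_select(w, q=0.2)  # threshold 1.5, so features 0 through 5
```

The "+1" in the numerator is what makes this the knockoff+ variant; it makes the rule more conservative, and with a strict target such as q = 0.1 on the statistics above no threshold qualifies and nothing is selected.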