LeAP: Learnable Adaptive Permutation for Feature Selection in Heterogeneous and Sparse Recommender Systems
Pith reviewed 2026-06-28 17:35 UTC · model grok-4.3
The pith
LeAP transforms random permutation into a learnable mechanism with adaptive regularization to rank features in heterogeneous and sparse recommender systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LeAP turns permutation-based feature importance evaluation into a learnable module and adds adaptive regularization suited to heterogeneous dimensions and extreme sparsity. The result is a model-agnostic feature selector that outperforms prior approaches on four public datasets and removes over 3,600 redundant dimensions from a production model with more than 12,000 total feature dimensions, with no drop in performance.
What carries the argument
Learnable adaptive permutation, which replaces random shuffling with a trainable evaluation of feature contributions, supported by regularization that adapts to different feature dimensionalities and sparsity levels.
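For context, the random shuffling that LeAP replaces can be sketched as classical permutation importance. This is a minimal NumPy illustration of that well-known baseline, not of LeAP itself; the function name `permutation_importance`, the `predict` and `loss` callables, and the toy data are assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, loss, n_repeats=5, seed=0):
    """Classical (non-learnable) permutation importance, one score per column.

    predict: callable mapping an (n, d) array to (n,) predictions (hypothetical model).
    loss:    callable (y_true, y_pred) -> float, lower is better.
    """
    rng = np.random.default_rng(seed)
    baseline = loss(y, predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j across rows to break its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(loss(y, predict(X_perm)) - baseline)
        # Importance = average loss increase caused by shuffling the column.
        scores[j] = np.mean(increases)
    return scores

# Toy check: the target depends only on column 0, so column 1 should score 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0]
mse = lambda y_true, y_pred: float(np.mean((y_true - y_pred) ** 2))
scores = permutation_importance(lambda A: 3.0 * A[:, 0], X, y, mse)
```

The nested loop needs one full evaluation pass per feature per repeat, which is the cost the abstract describes as prohibitive at large scale.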
If this is right
- State-of-the-art results on four public recommendation datasets.
- Successful deployment in a large-scale industrial model handling over a billion daily requests.
- Removal of 2 to 10 times as many redundant dimensions as the compared baseline methods.
- No performance degradation after removing the selected redundant features.
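The headline numbers behind the list above can be put in proportion with a few lines of arithmetic. Both counts are stated as lower bounds in the abstract ("over 3,600", "12,000+"), so the results are approximate, and reading "2 to 10 times" as a ratio of removed dimensions is an assumption of this sketch.

```python
# Figures quoted from the abstract; both are lower bounds, not exact counts.
removed_dims = 3_600
total_dims = 12_000

# Share of the production model's feature dimensions that were removed.
removed_fraction = removed_dims / total_dims

# Assumption: "2 to 10 times" is a ratio of removed dimensions, so the
# compared baselines would have removed roughly this many dimensions.
baseline_low = removed_dims / 10
baseline_high = removed_dims / 2

print(f"removed fraction ~ {removed_fraction:.0%}")
print(f"implied baseline removals ~ {baseline_low:.0f} to {baseline_high:.0f}")
```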
Where Pith is reading between the lines
- The method may extend to other machine learning tasks involving high-dimensional sparse heterogeneous inputs, such as in natural language or graph models.
- It suggests that learnable versions of traditional importance measures could replace many hand-tuned feature selection pipelines in industry.
- Further scaling might show how the adaptive strategy behaves when sparsity patterns change across different user segments.
Load-bearing premise
The transformation of random permutation into a learnable mechanism with adaptive regularization enables more accurate feature importance ranking in spaces with asymmetric dimensions and extreme sparsity.
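One way to picture this premise is a per-feature gate that blends each feature with a shuffled copy of itself, plus a penalty that accounts for feature width. The sketch below is only an illustration under stated assumptions: this page does not reproduce the paper's equations, so the gate form, the function names, and the square-root width weighting (borrowed from group lasso) are hypothetical stand-ins rather than LeAP's actual formulation.

```python
import numpy as np

def gated_permutation_mix(groups, gates, rng):
    """Blend each feature group with a row-shuffled copy of itself.

    groups: list of (n, d_i) arrays, one per feature, with differing widths d_i.
    gates:  per-feature values in [0, 1]; hypothetical trainable parameters.
            A gate of 1 keeps the real feature, 0 replaces it with shuffled values.
    """
    mixed = []
    for x, g in zip(groups, gates):
        # Shuffle rows only, so each embedding vector stays intact.
        shuffled = x[rng.permutation(x.shape[0])]
        mixed.append(g * x + (1.0 - g) * shuffled)
    return np.concatenate(mixed, axis=1)

def dimension_weighted_penalty(groups, gates, strength=1.0):
    """Sparsity penalty on the gates, weighted by sqrt(width) as in group lasso."""
    widths = np.array([x.shape[1] for x in groups], dtype=float)
    return strength * float(np.sum(np.sqrt(widths) * np.abs(gates)))

rng = np.random.default_rng(0)
scalar_feature = rng.normal(size=(8, 1))       # low-dimensional scalar
embedding_feature = rng.normal(size=(8, 16))   # high-dimensional embedding
groups = [scalar_feature, embedding_feature]

open_gates = np.array([1.0, 1.0])
mixed = gated_permutation_mix(groups, open_gates, rng)
penalty = dimension_weighted_penalty(groups, open_gates)
```

In a trainable version the gates would be optimized jointly with the task loss, and a gate driven toward 0 would mark its feature as a removal candidate. Whether LeAP works this way is not something this summary can confirm.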
What would settle it
The claim would be undermined if removing the features that LeAP flags as redundant in the industrial model caused a measurable drop in ranking quality or prediction accuracy, while keeping all features, or applying the smaller removals proposed by other methods, did not.
Original abstract
Modern industrial recommender systems rely on thousands of heterogeneous features -- ranging from low-dimensional scalars (e.g., statistical value) to high-dimensional embeddings (e.g., user-id embeddings, MLP representations) -- to achieve high-precision predictions. Given the immense computational costs associated with training, efficient feature selection is critical. However, existing methods encounter three primary bottlenecks: (1) they typically assume uniform feature dimensions or require costly mapping to a fixed size; (2) they struggle with extreme sparsity, where the majority of features (e.g., 99%+) remain at default values; and (3) traditional permutation-based approaches are computationally prohibitive in large-scale settings. To address these challenges, we propose LeAP (Learnable Adaptive Permutation), a novel, model-agnostic plug-in module for feature selection. LeAP transforms the inefficient random permutation process into a learnable mechanism, significantly accelerating the evaluation of feature importance. In addition, we introduce an adaptive regularization strategy tailored for heterogeneous dimensions and extreme sparsity, enabling superior feature importance ranking results across asymmetric input spaces. Experiments on four public recommendation datasets demonstrate that LeAP achieves state-of-the-art performance. Furthermore, LeAP has been deployed in a large-scale industrial search ranking model with over a billion daily requests and a 2TB model parameter scale. In this real-world scenario involving 12,000+ total feature dimensions, LeAP successfully identified and removed over 3,600 redundant dimensions without performance degradation, which is 2 to 10 times the ability of compared baseline methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces LeAP, a model-agnostic plug-in module that converts the random permutation process into a learnable mechanism augmented by an adaptive regularization strategy. It targets feature selection challenges in recommender systems with heterogeneous feature dimensions (scalars to embeddings) and extreme sparsity (99%+ of features at default values). The work claims state-of-the-art results on four public recommendation datasets and reports a production deployment in a search ranking model serving over a billion daily requests with a 2 TB parameter scale. In that deployment, LeAP removed more than 3,600 redundant dimensions out of over 12,000 total feature dimensions without performance degradation, which is 2 to 10 times the removal capacity of the compared baselines.
Significance. If the empirical claims hold, the contribution is significant for large-scale industrial recommender systems, where feature selection directly affects training cost and latency under heterogeneity and sparsity. The explicit industrial deployment at billion-request scale with concrete removal counts provides practical validation beyond academic benchmarks, strengthening the case for model-agnostic plug-in approaches in production environments.
minor comments (2)
- The abstract states SOTA performance on four public datasets but does not name the datasets, metrics, or exact baselines; adding these specifics in the abstract or early results section would improve immediate readability.
- The industrial result (removal of >3600 dimensions from 12k+) is presented without reference to the specific model architecture, evaluation metric used to confirm 'no performance degradation,' or statistical significance testing; a brief methods or results subsection citation would strengthen the claim.
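The second comment asks how "no performance degradation" was verified. One common option is a paired bootstrap confidence interval on the metric difference between the full and pruned models. The sketch below assumes per-unit metric values (for example per-query scores) are available for both models; the function name `paired_bootstrap_ci` and the synthetic scores are illustrative and do not come from the paper.

```python
import numpy as np

def paired_bootstrap_ci(metric_full, metric_pruned, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(metric_pruned - metric_full).

    Both inputs hold per-unit metric values measured on the same units
    (for example the same queries) for the two models.
    """
    diffs = np.asarray(metric_pruned, dtype=float) - np.asarray(metric_full, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample units with replacement; each row is one bootstrap replicate.
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    low, high = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diffs.mean()), float(low), float(high)

# Synthetic per-query scores, purely illustrative (not data from the paper).
rng = np.random.default_rng(42)
full = rng.normal(loc=0.80, scale=0.05, size=400)
pruned = full + rng.normal(loc=0.0, scale=0.01, size=400)
mean_diff, low, high = paired_bootstrap_ci(full, pruned)
```

An interval that sits entirely above a pre-agreed tolerance (say, a small allowed drop) would support the "no degradation" claim. An interval that merely includes zero shows no detected difference, which is weaker evidence.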
Simulated Author's Rebuttal
We thank the referee for the positive summary, recognition of the industrial deployment results, and recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity identified
full rationale
The paper introduces LeAP as a novel model-agnostic module that converts random permutation into a learnable process plus adaptive regularization for heterogeneous sparse features. No derivation chain, equations, or self-citations are shown that reduce any claimed prediction or result to fitted inputs or prior author work by construction. Central claims rest on empirical results from public datasets and a production deployment removing >3600 dimensions, which are independent of any internal definitional loop.