
arxiv: 2606.01111 · v2 · pith:CBHMO33Qnew · submitted 2026-05-31 · 💻 cs.LG

LeAP: Learnable Adaptive Permutation for Feature Selection in Heterogeneous and Sparse Recommender Systems

Pith reviewed 2026-06-28 17:35 UTC · model grok-4.3

classification 💻 cs.LG
keywords feature selection · recommender systems · permutation · sparse data · heterogeneous features · adaptive regularization · model agnostic · industrial deployment

The pith

LeAP transforms random permutation into a learnable mechanism with adaptive regularization to rank features in heterogeneous and sparse recommender systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern recommender systems use thousands of features of different types and sizes, many of which are sparse, so feature selection is essential for reducing training costs. LeAP addresses the limitations of existing methods by converting the standard random permutation test into a learnable process that can be trained efficiently. An adaptive regularization strategy helps it handle varying dimensions and sparsity, which leads to better identification of the features that can be dropped without hurting predictions, as reported on both public benchmarks and a real industrial deployment.
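For context, the standard random permutation test that LeAP starts from can be sketched as follows. This is a minimal illustration of the classic baseline, not of LeAP itself; the toy linear model, the weights `w`, and the function name `permutation_importance` are assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each column of X is shuffled across rows."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            # Break the link between column j and the target.
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats

# Toy setup (hypothetical): a fixed linear model whose second weight is zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
w = np.array([2.0, 0.0, 0.5])
y = X @ w
predict = lambda Z: Z @ w

scores = permutation_importance(predict, X, y)
ranking = np.argsort(-scores)  # most important feature first
print(scores, ranking)
```

The loop needs one forward pass per feature per repeat, which is the cost the abstract describes as prohibitive when a model has thousands of features.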

Core claim

By turning the permutation-based feature importance evaluation into a learnable module and applying adaptive regularization suited to heterogeneous dimensions and extreme sparsity, LeAP provides a model-agnostic way to select features that outperforms prior approaches on public datasets and removes over 3,600 redundant dimensions from a production model with more than 12,000 total feature dimensions without any drop in performance.

What carries the argument

Learnable adaptive permutation, which replaces random shuffling with a trainable evaluation of feature contributions, supported by regularization that adapts to different feature dimensionalities and sparsity levels.
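This page does not reproduce the paper's equations, so the following is only one plausible reading of a "learnable permutation" with a dimension-aware penalty, shown as a forward pass. Everything in it is an assumption made for illustration: the sigmoid gate per field, the blend with a batch-shuffled copy, the penalty weights proportional to field width, and the names `gated_permutation` and `dimension_weighted_penalty`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_permutation(fields, alpha, rng):
    """Blend each field with a batch-shuffled copy of itself.

    fields: list of arrays, field i has shape (batch, d_i)
    alpha:  array of shape (F,), one logit per field (hypothetical parameter)
    """
    gates = sigmoid(alpha)
    perm = rng.permutation(fields[0].shape[0])
    # gate near 1 keeps the real feature, gate near 0 feeds shuffled values
    mixed = [g * x + (1.0 - g) * x[perm] for g, x in zip(gates, fields)]
    return mixed, gates

def dimension_weighted_penalty(gates, dims, lam=1e-3):
    """Assumed regularizer: keeping a wide field costs more than a scalar."""
    weights = np.asarray(dims, dtype=float) / np.sum(dims)
    return lam * np.sum(weights * gates)

rng = np.random.default_rng(0)
dims = [1, 8, 64]  # a scalar, a small embedding, a wide embedding
fields = [rng.normal(size=(32, d)) for d in dims]
alpha = np.array([10.0, 0.0, -10.0])  # illustrative logits, not trained

mixed, gates = gated_permutation(fields, alpha, rng)
penalty = dimension_weighted_penalty(gates, dims)
print(gates, penalty)
```

In an autodiff framework the logits would be trained jointly with the task loss plus the penalty, and fields whose gates settle near zero would be candidates for removal. Whether LeAP uses this exact gating or weighting is not stated in the material available here.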

If this is right

  • State-of-the-art results on four public recommendation datasets.
  • Successful deployment in a large-scale industrial model handling over a billion daily requests.
  • Ability to remove 2 to 10 times as many redundant dimensions as the compared baseline methods.
  • No performance degradation after removing the selected redundant features.
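The deployment figures above can be put in proportion with a short calculation. The inputs are the abstract's lower bounds ("over 3,600" and "12,000+"), so the results are approximate, and reading "2 to 10 times" as a ratio of removed dimensions is an assumption.

```python
removed = 3600   # lower bound from the abstract: "over 3,600"
total = 12000    # lower bound from the abstract: "12,000+"

fraction_removed = removed / total

# Assumed reading: baselines removed between 1/10 and 1/2 as many dimensions.
baseline_low = removed / 10
baseline_high = removed / 2

print(f"removed about {fraction_removed:.0%} of feature dimensions")
print(f"implied baseline removals: {baseline_low:.0f} to {baseline_high:.0f}")
```

Under these assumptions roughly 30% of the input dimensions were dropped, against an implied 360 to 1,800 dimensions for the baselines.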

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may extend to other machine learning tasks involving high-dimensional sparse heterogeneous inputs, such as in natural language or graph models.
  • It suggests that learnable versions of traditional importance measures could replace many hand-tuned feature selection pipelines in industry.
  • Further scaling might show how the adaptive strategy behaves when sparsity patterns change across different user segments.

Load-bearing premise

The transformation of random permutation into a learnable mechanism with adaptive regularization enables more accurate feature importance ranking in spaces with asymmetric dimensions and extreme sparsity.

What would settle it

The claim would be undermined if removing the features that LeAP flags as redundant in the industrial model caused a measurable drop in ranking quality or prediction accuracy, while keeping all features, or removing the smaller sets chosen by other methods, did not.

Figures

Figures reproduced from arXiv: 2606.01111 by Chen Chu, Fei Chen, Ruiduan Li, Yihong Huang, Yu Lin, Zhihao Li.

Figure 1: The overview of the LeAP module.

3.1 Learnable Permutation Module. In deep recommender systems, model inputs typically consist of F feature fields. Due to the diversity of feature types, these features exhibit severe dimensional heterogeneity when mapped into the latent space. Let the representation of the i-th feature be denoted as x_i ∈ R^{d_i}, where d_i varies drastically (e.g., d_i = 1 for scalar statistical fe…
read the original abstract

Modern industrial recommender systems rely on thousands of heterogeneous features -- ranging from low-dimensional scalars (e.g., statistical value) to high-dimensional embeddings (e.g., user-id embeddings, MLP representations) -- to achieve high-precision predictions. Given the immense computational costs associated with training, efficient feature selection is critical. However, existing methods encounter three primary bottlenecks: (1) they typically assume uniform feature dimensions or require costly mapping to a fixed size; (2) they struggle with extreme sparsity, where the majority of features (e.g., 99%+) remain at default values; and (3) traditional permutation-based approaches are computationally prohibitive in large-scale settings. To address these challenges, we propose LeAP (Learnable Adaptive Permutation), a novel, model-agnostic plug-in module for feature selection. LeAP transforms the inefficient random permutation process into a learnable mechanism, significantly accelerating the evaluation of feature importance. In addition, we introduce an adaptive regularization strategy tailored for heterogeneous dimensions and extreme sparsity, enabling superior feature importance ranking results across asymmetric input spaces. Experiments on four public recommendation datasets demonstrate that LeAP achieves state-of-the-art performance. Furthermore, LeAP has been deployed in a large-scale industrial search ranking model with over a billion daily requests and a 2TB model parameter scale. In this real-world scenario involving 12,000+ total feature dimensions, LeAP successfully identified and removed over 3,600 redundant dimensions without performance degradation, which is 2 to 10 times the ability of compared baseline methods.
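One way to see why extreme sparsity could weaken plain random permutation is to count how many rows a shuffle actually changes. The sketch below uses synthetic data with an assumed 1% non-default rate and an assumed default value of 0.0; it illustrates the sparsity issue named in bottleneck (2) and is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
default_value = 0.0
active_rate = 0.01  # assumption: 99% of rows hold the default value

is_active = rng.random(n) < active_rate
column = np.where(is_active, rng.normal(loc=5.0, size=n), default_value)

shuffled = rng.permutation(column)
changed_fraction = np.mean(shuffled != column)

# A row changes mainly when a default and a non-default value swap places.
approx_expected = 2 * active_rate * (1 - active_rate)
print(changed_fraction, approx_expected)
```

With these settings only about 2% of rows see a different value after shuffling, so the measured change in loss for such a feature rests on a small slice of the data.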

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces LeAP, a model-agnostic plug-in module that converts the random permutation process into a learnable mechanism augmented by an adaptive regularization strategy. It targets feature selection challenges in recommender systems with heterogeneous feature dimensions (scalars to embeddings) and extreme sparsity (99%+ defaults). The work claims state-of-the-art results on four public recommendation datasets and reports a production deployment in a search ranking model serving over a billion daily requests with a 2 TB parameter scale, where LeAP removed more than 3,600 redundant dimensions from over 12,000 total feature dimensions without performance degradation, 2 to 10 times the removal capacity of the compared baselines.

Significance. If the empirical claims hold, the contribution is significant for large-scale industrial recommender systems, where feature selection directly affects training cost and latency under heterogeneity and sparsity. The explicit industrial deployment at billion-request scale with concrete removal counts provides practical validation beyond academic benchmarks, strengthening the case for model-agnostic plug-in approaches in production environments.

minor comments (2)
  1. The abstract states SOTA performance on four public datasets but does not name the datasets, metrics, or exact baselines; adding these specifics in the abstract or early results section would improve immediate readability.
  2. The industrial result (removal of >3600 dimensions from 12k+) is presented without reference to the specific model architecture, evaluation metric used to confirm 'no performance degradation,' or statistical significance testing; a brief methods or results subsection citation would strengthen the claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the industrial deployment results, and recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper introduces LeAP as a novel model-agnostic module that converts random permutation into a learnable process plus adaptive regularization for heterogeneous sparse features. No derivation chain, equations, or self-citations are shown that reduce any claimed prediction or result to fitted inputs or prior author work by construction. Central claims rest on empirical results from public datasets and a production deployment removing >3600 dimensions, which are independent of any internal definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no information on free parameters, axioms, or invented entities is available.

pith-pipeline@v0.9.1-grok · 5826 in / 1077 out tokens · 31464 ms · 2026-06-28T17:35:01.163974+00:00 · methodology

