
arxiv: 2605.21561 · v1 · pith:KSJRD246new · submitted 2026-05-20 · 💻 cs.LG

Objective-Induced Bias and Search Dynamics in Multiobjective Unsupervised Feature Selection

Pith reviewed 2026-05-22 09:53 UTC · model grok-4.3

classification 💻 cs.LG
keywords unsupervised feature selection · multiobjective optimization · PCA reconstruction loss · Pareto front · search dynamics · silhouette score · synthetic dataset · feature subset selection

The pith

A PCA reconstruction loss objective in multiobjective unsupervised feature selection produces compact subsets with test accuracy comparable to direct supervised optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how the choice of evaluation objective and subset-size regularisation in multiobjective unsupervised feature selection shapes both the search process and the quality of the resulting feature subsets. Using a synthetic dataset that labels features as informative, redundant, or irrelevant, the authors test six formulations that combine accuracy, silhouette score, or PCA reconstruction loss with either minimising or maximising subset size. Silhouette-based versions strongly favour trivial low-cardinality solutions that perform poorly on predictive tasks, while the PCA loss version yields compact subsets whose test accuracy matches subsets found by optimising supervised accuracy directly. Readers would care because unsupervised feature selection is often required when labels are absent, making reliable objective design a practical necessity rather than a detail.

Core claim

The formulation of the multiobjective problem, including the evaluation objective and the direction of subset-size regularisation, strongly affects search dynamics and the quality of the Pareto front. In particular, the PCA reconstruction loss objective produces compact subsets whose test accuracy is comparable to subsets obtained by directly optimising supervised accuracy, whereas silhouette-score formulations exhibit a strong bias toward trivial low-cardinality solutions that remain weak proxies for predictive performance.
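
A PCA reconstruction loss for a candidate subset might look like the sketch below. It rests on assumptions: the summary only says the loss is a mean squared error between Z and Ẑ, so here Z is taken to be the projection of the centred full data onto the leading principal components, and Ẑ a least-squares reconstruction of Z from the selected columns. The function name `pca_reconstruction_loss` and the parameter `n_components` are hypothetical, and the paper's actual definition may differ.

```python
import numpy as np

def pca_reconstruction_loss(X, mask, n_components=2):
    """MSE between PCA scores Z of the full data and their
    least-squares reconstruction from the selected features.

    Assumed definition for illustration, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    Xc = X - X.mean(axis=0)  # centre every feature

    # Principal axes of the full data via SVD (rows of Vt).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T  # scores on the leading components

    if not mask.any():
        # Empty subset: the only available reconstruction is zero.
        return float(np.mean(Z ** 2))

    Xs = Xc[:, mask]
    coef, *_ = np.linalg.lstsq(Xs, Z, rcond=None)  # linear map Xs -> Z
    Z_hat = Xs @ coef
    return float(np.mean((Z - Z_hat) ** 2))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))  # toy data, not the paper's dataset
loss_all = pca_reconstruction_loss(X_demo, [1, 1, 1, 1, 1])
loss_two = pca_reconstruction_loss(X_demo, [1, 1, 0, 0, 0])
loss_one = pca_reconstruction_loss(X_demo, [1, 0, 0, 0, 0])
```

Under this assumed definition the loss is zero when every feature is selected and never increases as features are added, so it needs the size objective to push toward compact subsets.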

What carries the argument

The six multiobjective formulations that pair one of three evaluation objectives (accuracy, silhouette score, PCA reconstruction loss) with either subset-size minimisation or maximisation, compared on a controlled synthetic dataset with known informative, redundant, and irrelevant features.
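
The six formulations are the Cartesian product of three evaluation objectives and two size directions. A minimal sketch of that enumeration follows; the string labels and the helper `size_objective` are illustrative names, and writing size maximisation as minimisation of the negated count is an assumed convention for minimisation-only solvers rather than something the summary states.

```python
from itertools import product

# Illustrative labels, not identifiers from the paper.
EVALUATION_OBJECTIVES = ("accuracy", "silhouette", "pca_loss")
SIZE_DIRECTIONS = ("min_size", "max_size")

def enumerate_formulations():
    """Return every (evaluation objective, size direction) pairing."""
    return list(product(EVALUATION_OBJECTIVES, SIZE_DIRECTIONS))

def size_objective(mask, direction):
    """Subset size expressed as a quantity to minimise.

    Maximising size is encoded as minimising the negated count
    (assumed convention).
    """
    k = sum(mask)
    return k if direction == "min_size" else -k

formulations = enumerate_formulations()
for objective, direction in formulations:
    print(objective, direction)
```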

If this is right

  • Objective choice determines whether the Pareto front contains useful compact subsets or collapses to trivial solutions.
  • Silhouette score serves as a weak proxy for predictive performance in this setting.
  • PCA loss functions as a viable unsupervised surrogate that can match the accuracy of supervised feature selection.
  • Both initialisation strategy and the direction of size regularisation further shape the search trajectory and final front.
  • Effective multiobjective unsupervised feature selection requires deliberate design of the evaluation objective.
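
For readers less familiar with the Pareto front referred to above, the following sketch filters a set of candidate subsets down to its non-dominated members, with both objectives minimised. The candidate values are made up for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Toy (loss, subset size) pairs, invented for this example.
candidates = [(0.40, 1), (0.10, 3), (0.12, 5), (0.05, 8), (0.30, 2), (0.30, 4)]
front = pareto_front(candidates)
```

In this toy set, `(0.12, 5)` and `(0.30, 4)` drop out because `(0.10, 3)` beats each of them on both loss and size.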

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The synthetic benchmark could serve as a controlled testbed for evaluating new search algorithms in feature selection.
  • Objective-induced biases observed here may appear in other multiobjective combinatorial tasks in machine learning.
  • When labels are unavailable, trying a PCA-based objective offers a concrete starting point for unsupervised selection.
  • Hybrid objectives that blend PCA loss with other unsupervised metrics might reduce the bias seen in single-objective versions.

Load-bearing premise

Search dynamics and quality outcomes observed on the synthetic dataset with explicitly labeled feature types generalize to real high-dimensional data where such ground truth is unavailable.
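
To make the premise concrete, here is one way a dataset with labelled informative, redundant, and irrelevant features could be generated. This is a sketch under stated assumptions, not the paper's generator: the function `make_synthetic`, the class-dependent mean shift, the noise level, and all default sizes are hypothetical choices.

```python
import numpy as np

def make_synthetic(n_samples=200, n_informative=3, n_redundant=2,
                   n_irrelevant=4, seed=0):
    """Toy binary-classification data with known feature types
    (illustrative generator, not the one used in the paper)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)

    # Informative: Gaussian features whose mean shifts with the class.
    informative = rng.normal(size=(n_samples, n_informative)) + 2.0 * y[:, None]

    # Redundant: noisy linear combinations of the informative features.
    weights = rng.normal(size=(n_informative, n_redundant))
    redundant = informative @ weights + 0.05 * rng.normal(size=(n_samples, n_redundant))

    # Irrelevant: pure noise, independent of the class.
    irrelevant = rng.normal(size=(n_samples, n_irrelevant))

    X = np.hstack([informative, redundant, irrelevant])
    feature_types = (["informative"] * n_informative
                     + ["redundant"] * n_redundant
                     + ["irrelevant"] * n_irrelevant)
    return X, y, feature_types

X_syn, y_syn, types_syn = make_synthetic()
```

Because the type of every column is known, a selected subset can be scored by how many informative, redundant, and irrelevant features it contains, which is exactly the ground truth that real data lacks.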

What would settle it

Applying the PCA loss formulation to a real high-dimensional dataset and measuring test accuracy of the resulting subsets against those from supervised optimization would falsify the comparability claim if the unsupervised subsets perform substantially worse.

Figures

Figures reproduced from arXiv: 2605.21561 by Anna V. Kononova, Martijn R. Tannemaat, Mathieu Cherpitel, Thomas Bäck.

Figure 1: Correlation heatmap of the synthetic dataset used in this study.
Figure 2: Search history, initial population location and found Pareto Front …
Figure 3: Search history, initial population location and found Pareto Front …
Figure 4: Analysis of Pareto-optimal solutions for …
Figure 5: Analysis of Pareto-optimal solutions for …
Figure 6: Pareto fronts of feature subsets under different MOFS formula…
Original abstract

Unsupervised feature selection is commonly formulated as a multiobjective optimisation problem that jointly optimises subset quality and subset size. Yet the behaviour of this formulation depends critically on the choice of evaluation objective, the direction of subset-size regularisation, and the initialisation strategy. We study these factors in a controlled setting using a synthetic dataset with known informative, redundant, and irrelevant feature types. Six formulations are compared by combining three evaluation objectives: accuracy, silhouette score, and PCA reconstruction loss with subset-size minimisation or maximisation. The results show that formulation strongly affects both search dynamics and the quality of the resulting Pareto front. Silhouette-based formulations exhibit a strong bias toward trivial low-cardinality solutions and remain weak proxies for predictive performance. In contrast, the proposed PCA loss objective produces compact subsets with test accuracy comparable to subsets obtained by directly optimising supervised accuracy. Overall, the study shows that objective design is central to effective multiobjective unsupervised feature selection.
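
The silhouette objective named in the abstract can be computed for a candidate subset roughly as follows. This is a NumPy sketch that assumes cluster labels are supplied by some clustering step run on the selected columns; the summary does not describe the paper's clustering procedure, and the function name `mean_silhouette` is hypothetical.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances.

    Expects at least two distinct cluster labels. Points that are
    alone in their cluster receive a score of 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = int(own.sum())
        if n_own == 1:
            continue  # singleton cluster: score stays 0
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

# Toy one-feature example with two tight, well-separated groups.
X_toy = np.array([[0.0], [0.1], [10.0], [10.1]])
good = mean_silhouette(X_toy, [0, 0, 1, 1])
bad = mean_silhouette(X_toy, [0, 1, 0, 1])
```

To score a subset, one would pass `X[:, mask]` together with labels obtained by clustering those columns. A single feature that happens to split the data cleanly can already score close to 1, which is consistent with the low-cardinality bias the abstract reports.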

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies how the choice of evaluation objective (accuracy, silhouette score, or PCA reconstruction loss) and the direction of subset-size regularization (min or max) influence search dynamics and Pareto-front quality in multiobjective unsupervised feature selection. Experiments are conducted on a synthetic dataset whose features are explicitly partitioned into informative, redundant, and irrelevant types; six formulations are compared, with downstream supervised test accuracy used to assess the utility of the selected subsets. The central empirical result is that PCA-loss formulations yield compact subsets whose test accuracy is comparable to subsets obtained by directly optimizing supervised accuracy, whereas silhouette-based formulations exhibit a strong bias toward trivial low-cardinality solutions.

Significance. If the reported behavior holds beyond the synthetic setting, the work supplies concrete guidance on objective design for multiobjective unsupervised feature selection and demonstrates that PCA reconstruction loss can act as a label-free proxy that preserves predictive utility while promoting compactness. The controlled synthetic construction with known feature types is a methodological strength that permits direct measurement of bias and search dynamics.

major comments (2)
  1. [§4, Experimental Results] The claim that the PCA-loss objective produces subsets with test accuracy comparable to direct supervised optimization is demonstrated only on synthetic data whose informative/redundant/irrelevant partition is known a priori. Because both the reconstruction loss and the evolutionary search can exploit this explicit partition, the observed Pareto-front quality and compactness may not transfer to real high-dimensional data where the partition is latent and feature interactions are denser; the manuscript should either qualify the transfer claim or add at least one real-world dataset experiment.
  2. [§3.2, Objective Formulations] The paper does not report the number of independent runs, statistical tests for differences in Pareto-front quality, or sensitivity analysis to the evolutionary algorithm hyperparameters. Without these, it is difficult to assess whether the reported superiority of PCA loss over silhouette score is robust or could be an artifact of a single run or particular hyperparameter setting.
minor comments (2)
  1. [Abstract] The abstract states that silhouette-based formulations 'remain weak proxies for predictive performance' but does not quantify this weakness with a specific metric (e.g., average test accuracy gap or hypervolume difference) that could be compared across tables.
  2. [§2] Notation for the multiobjective formulation (e.g., the precise definition of the PCA reconstruction loss inside the evolutionary loop) should be introduced earlier and used consistently when describing the six formulations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the constructive comments, which help clarify the scope and robustness of our findings. We address each major comment below.

Point-by-point responses
  1. Referee: [§4, Experimental Results] The claim that the PCA-loss objective produces subsets with test accuracy comparable to direct supervised optimization is demonstrated only on synthetic data whose informative/redundant/irrelevant partition is known a priori. Because both the reconstruction loss and the evolutionary search can exploit this explicit partition, the observed Pareto-front quality and compactness may not transfer to real high-dimensional data where the partition is latent and feature interactions are denser; the manuscript should either qualify the transfer claim or add at least one real-world dataset experiment.

    Authors: We thank the referee for this observation and note that the controlled synthetic construction with known feature types is explicitly identified as a methodological strength in the report. The study was deliberately designed to enable direct measurement of bias and search dynamics, which would not be possible with latent partitions in real data. We agree that transferability is not demonstrated and will revise the manuscript to qualify all claims about PCA-loss comparability by adding explicit statements that the results hold for the synthetic dataset with an a priori known informative/redundant/irrelevant partition. This qualification will be inserted in §4 and the abstract without adding new real-world experiments, as the current design prioritizes controlled analysis over broad empirical claims. revision: yes

  2. Referee: [§3.2, Objective Formulations] The paper does not report the number of independent runs, statistical tests for differences in Pareto-front quality, or sensitivity analysis to the evolutionary algorithm hyperparameters. Without these, it is difficult to assess whether the reported superiority of PCA loss over silhouette score is robust or could be an artifact of a single run or particular hyperparameter setting.

    Authors: We agree that greater transparency on experimental protocol would strengthen the presentation. The reported results were obtained from multiple independent runs of the evolutionary algorithm to mitigate stochasticity, although this detail was omitted from the original text. In the revision we will specify the number of runs, report mean and standard deviation of key metrics (e.g., Pareto-front hypervolume and test accuracy), and include statistical comparisons such as Wilcoxon rank-sum tests between formulations. We will also add a short sensitivity discussion in §3.2 that examines the effect of the main hyperparameters (population size, mutation rate, and number of generations) based on the values used throughout the study. revision: yes
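
The per-run reporting promised in the second response could be organised along these lines. The sketch computes the hypervolume of a two-objective minimisation front and then the mean and standard deviation across runs. The fronts and the reference point are made-up toy values, not results from the paper, and the paper's own hypervolume setup is not known from this summary.

```python
from statistics import mean, stdev

def hypervolume_2d(front, reference):
    """Area dominated by a 2-D minimisation front, bounded by `reference`."""
    # Only points strictly better than the reference contribute.
    pts = sorted(p for p in front if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_f2 = reference[1]
    for f1, f2 in pts:  # ascending in the first objective
        if f2 < best_f2:  # skip points dominated by an earlier one
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

# Toy (loss, subset size) fronts for three hypothetical runs.
toy_runs = [
    [(0.10, 3), (0.30, 2), (0.40, 1)],
    [(0.12, 3), (0.28, 2), (0.45, 1)],
    [(0.09, 4), (0.31, 2), (0.40, 1)],
]
reference = (1.0, 10)  # assumed worst-case corner

hv_per_run = [hypervolume_2d(f, reference) for f in toy_runs]
hv_mean, hv_sd = mean(hv_per_run), stdev(hv_per_run)
```

Two such lists of per-run hypervolumes, one per formulation, are the kind of input a rank-sum test (for example `scipy.stats.ranksums`) would take for the comparison the authors mention.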

Circularity Check

0 steps flagged

No circularity: purely empirical comparison on synthetic data


The paper conducts an empirical study comparing six multiobjective formulations for unsupervised feature selection using accuracy, silhouette score, and PCA reconstruction loss on a synthetic dataset with known feature types. Results are measured directly against held-out supervised test accuracy and subset cardinality, with no mathematical derivations, predictions, or self-referential definitions that reduce to fitted inputs. No self-citations are invoked as load-bearing premises, and the central claims rest on observable search dynamics and Pareto fronts rather than any construction that equates outputs to inputs by definition. The study is self-contained against its external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that the synthetic data generator faithfully reproduces the statistical properties that drive real feature-selection behavior; no free parameters or invented entities are introduced beyond standard multiobjective optimization components.

axioms (1)
  • Domain assumption: The synthetic dataset with known informative, redundant, and irrelevant feature types accurately reflects the feature interactions encountered in real high-dimensional data.
    Invoked to interpret the observed biases and quality differences as generalizable.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    "We introduce an unsupervised objective, PCA loss, which to the best of our knowledge has not been applied in this context, and analyse its behaviour under subset-size regularisation... The objective function then tries to minimise the Mean Square Error (loss) between Z and Ẑ"

  • IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

    Relation between the paper passage and the cited Recognition theorem.

    "Six formulations are compared by combining three evaluation objectives: accuracy, silhouette score, and PCA reconstruction loss with subset-size minimisation or maximisation."

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
