pith. machine review for the scientific record.

arxiv: 2604.00485 · v2 · submitted 2026-04-01 · 💻 cs.LG

Recognition: 2 theorem links


The Rashomon Effect for Visualizing High-Dimensional Data

Cynthia Rudin, Gaurav Rajesh Parikh, Haiyang Huang, Yiyang Sun

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 23:25 UTC · model grok-4.3

classification 💻 cs.LG
keywords dimension reduction · Rashomon set · visualization · high-dimensional data · interpretability · embeddings · principal components · nearest neighbors

The pith

The Rashomon set of dimension reductions allows embeddings to be aligned with principal components or external concepts while extracting stable neighborhood relations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Dimension reduction is non-unique, so many embeddings can preserve the structure of high-dimensional data equally well yet differ in layout. The paper defines the Rashomon set as the collection of all such good embeddings and shows how to use their multiplicity for three concrete purposes. First, PCA-informed alignment orients axes toward principal components without harming local neighborhoods. Second, concept-alignment regularization ties embedding dimensions to class labels or user-defined ideas. Third, common nearest-neighbor relations that persist across the set are extracted to build refined embeddings that keep global relations while improving local fidelity. A reader cares because a single arbitrary embedding can mislead interpretation, whereas working with the full set produces visualizations that are more robust and goal-directed.

Core claim

The Rashomon set for dimension reduction is the collection of embeddings that all preserve high-dimensional structure equally well. By steering members of this set toward principal components, aligning dimensions with external concepts, and distilling persistent nearest-neighbor relations, one obtains embeddings whose axes are interpretable, whose dimensions match user knowledge, and whose local structure is more trustworthy while global relationships remain intact.
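As a concrete, toy picture of what such a set could look like, the sketch below collects candidate 2-D embeddings whose neighborhood-preservation loss is within a tolerance of the best one. The loss function, the rotation-based candidates, and the 5% tolerance are all illustrative assumptions, not the paper's definitions; rotated copies of one PCA projection are used precisely because they preserve all distances and therefore score identically.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # high-dimensional data

def knn_sets(Z, k=10):
    """Index sets of each point's k nearest neighbors (excluding itself)."""
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def knn_loss(X, Y, k=10):
    """Fraction of high-dimensional k-NN relations lost in the embedding."""
    nx, ny = knn_sets(X, k), knn_sets(Y, k)
    overlap = np.mean([len(set(a) & set(b)) / k for a, b in zip(nx, ny)])
    return 1.0 - overlap

# Candidate embeddings: the top-2 PCA projection under random rotations.
# A rotation changes no pairwise distance, so every candidate scores the
# same -- "equally good" embeddings that differ only in layout.
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
base = Xc @ Vt[:2].T
candidates = []
for _ in range(8):
    theta = rng.uniform(0, 2 * np.pi)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    candidates.append(base @ R.T)

losses = np.array([knn_loss(X, Y) for Y in candidates])
eps = 0.05                              # tolerance defining "equally good"
rashomon = [Y for Y, l in zip(candidates, losses)
            if l <= losses.min() * (1 + eps)]
print(len(rashomon), "embeddings in the Rashomon set")
```

A real pipeline would populate the candidate list with independently trained DR runs (different seeds, methods, or hyperparameters) rather than rotations; the loss-threshold membership test is the part that carries over.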

What carries the argument

The Rashomon set for DR: the collection of all good embeddings that preserve data structure equally well, used both to perform the alignments and to extract consensus neighbor relations.

If this is right

  • Embeddings can be oriented to principal components so that axes carry clear variance meaning.
  • Individual dimensions can be regularized to match class labels or user-specified concepts.
  • Persistent nearest-neighbor pairs across the set produce refined embeddings with stronger local fidelity.
  • Global structure is retained while local distortions from any single embedding are reduced.
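The persistent-pair idea in the third bullet can be sketched directly: count, across the members of a stand-in Rashomon set, how often each pair of points appears as mutual k-nearest neighbors, and keep pairs above a persistence threshold. The noisy-copy members and the 0.6 threshold here are illustrative assumptions, not the paper's values.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def mutual_nn_pairs(Y, k=5):
    """Pairs (i, j) that are mutual k-nearest neighbors in embedding Y."""
    D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]
    sets = [set(row) for row in nn]
    return {(i, j) for i, j in combinations(range(len(Y)), 2)
            if j in sets[i] and i in sets[j]}

# Toy Rashomon set: noisy copies of one 2-D embedding stand in for
# independently trained members.
base = rng.normal(size=(100, 2))
members = [base + 0.01 * rng.normal(size=base.shape) for _ in range(10)]

# Persistence = fraction of members in which a pair is mutual-kNN.
counts = {}
for Y in members:
    for pair in mutual_nn_pairs(Y):
        counts[pair] = counts.get(pair, 0) + 1
persistent = {p for p, c in counts.items() if c / len(members) >= 0.6}
print(len(persistent), "persistent neighbor pairs")
```

The persistent set is what a refined embedding would then be asked to respect; everything else (unstable neighbor relations) is treated as an artifact of any single run.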

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same multiplicity approach could be applied to other non-unique tasks such as clustering or manifold learning.
  • Interactive tools could let users choose which concepts to align against and immediately see the resulting family of embeddings.
  • Trust metrics computed across the Rashomon set may serve as a general diagnostic for any dimension-reduction output.

Load-bearing premise

Alignments to principal components or external concepts can be performed without distorting the local neighborhood structure preserved by the original embeddings.
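One hedged way to see why a distortion-free alignment is at least possible: if the alignment is restricted to a rotation, orthogonal Procrustes can orient an embedding toward the top principal components while provably changing no pairwise distance. This is a sketch under that restriction, not the paper's alignment objective; the stand-in embedding is a rotated copy of the PC scores.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(3)
X = rng.normal(size=(250, 8)) * np.array([3, 2, 1, 1, 1, 1, 1, 1])

# Hypothetical stand-in embedding: top-2 PCA scores under an arbitrary
# rotation (a Rashomon member differing only in orientation).
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_scores = Xc @ Vt[:2].T
theta = 1.1
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = pc_scores @ Q.T                     # the "unaligned" embedding

# Orthogonal Procrustes finds the orthogonal map of Y closest to the PC
# scores. Being an isometry, it preserves every pairwise distance, so all
# neighborhood metrics are exactly unchanged by the alignment.
R, _ = orthogonal_procrustes(Y, pc_scores)
Y_aligned = Y @ R

err = np.linalg.norm(Y_aligned - pc_scores)
print(f"alignment residual: {err:.2e}")
```

Alignments that go beyond isometries (as a learned regularizer may) lose this guarantee, which is exactly why the premise above needs the empirical check described next.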

What would settle it

Compute trustworthiness or local-neighborhood preservation scores on the refined embeddings after alignment or consensus extraction; if these scores fall substantially below those of the original Rashomon members, the claim that the operations preserve good properties is false.
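That check can be run with off-the-shelf tools. A minimal sketch using scikit-learn's `trustworthiness` on a synthetic PCA embedding and a rotated ("aligned") copy, where the scores must coincide because rotation preserves all distances; real alignment losses would be compared the same way but need not match:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))

# Original embedding vs. an "aligned" one. A pure rotation is the
# friendliest case: it cannot change any distance, so trustworthiness
# matches exactly. Learned alignments require this empirical comparison.
Y = PCA(n_components=2).fit_transform(X)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y_aligned = Y @ R.T

t_before = trustworthiness(X, Y, n_neighbors=10)
t_after = trustworthiness(X, Y_aligned, n_neighbors=10)
print(f"trustworthiness before {t_before:.4f}, after {t_after:.4f}")
```

A substantial drop in `t_after` relative to the original Rashomon members would falsify the preservation claim; parity (as here) is the necessary condition.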

Figures

Figures reproduced from arXiv: 2604.00485 by Cynthia Rudin, Gaurav Rajesh Parikh, Haiyang Huang, Yiyang Sun.

Figure 1: Three goals for generating and exploring the Rashomon set for dimension reduction.
Figure 2: PaCMAPparam embedding with and without PCA-informed alignment. The colored curves overlaid on the embeddings are generated by applying the learned parametric DR mapping to points sampled along the first two principal component directions in the original high-dimensional space, thereby visualizing how the DR mapping transforms the PCA axes.
Figure 3: (a) MNIST PaCMAPparam embedding, (b) PCA embedding, (c) PCA-informed embedding …
Figure 4: (a) Concept-informed aligned PaCMAPparam embedding. Alignment is along the horizontal axis from feet (left) to head (right). Footwear is labeled in shades of red to orange, trousers in yellow, dresses in light yellow, pullovers and coats in green, shirts and t-shirts in blue, handbags in purple. (b) Evaluation metrics and losses for FMNIST before and after concept alignment, which remain generally unchanged …
Figure 5: (a) Original PaCMAPparam embedding of the USPS dataset. (b) Common-knowledge embedding using only stable neighbor pairs within the Rashomon set. (c) Quantitative comparison of original vs. combined DR embeddings across three evaluation metrics for five methods.
Figure 6: (a) MNIST embedding before (left) and after (right) common NN pairs are selected, (b) examples of …
Figure 7: Comparison of original COIL20 embedding (left), PCA embedding (middle), and PCA-informed …
Figure 8: Comparison of original FMNIST embedding (left), PCA embedding (middle), and PCA-informed …
Figure 9: Comparison of original Human Cortex embedding (left), PCA embedding (middle), and PCA-informed …
Figure 10: Comparison of original Kang et al. embedding (left), PCA embedding (middle), and PCA-informed …
Figure 11: Comparison of original Mammoth embedding (left), PCA embedding (middle), and PCA-informed …
Figure 12: Comparison of original Airplane embedding (left), PCA embedding (middle), and PCA-informed …
Figure 13: Comparison of original MNIST embedding (left), PCA embedding (middle), and PCA-informed …
Figure 14: Comparison of original Stuart et al. embedding (left), PCA embedding (middle), and PCA-informed embeddings (right) across different methods. We see alignment to principal components across all methods while preserving structure. We show that soft Jaccard distance and LDR (bottom) remain mostly unchanged and that aligned embeddings consistently maintain structure. Random Triplet PCA score and Triplet PCA score …
Figure 15: Comparison of original CBMC embedding (left), PCA embedding (middle), and PCA-informed …
Figure 16: Comparison of original USPS embedding (left), PCA embedding (middle), and PCA-informed embeddings …
Figure 17: Comparison of original and aligned embeddings for MNIST using a concept-aware regularizer. …
Figure 18: Comparison of original and aligned embeddings for FMNIST using a concept-aware regularizer. …
Figure 19: Comparison of original and aligned embeddings for COIL20 using a concept-aware regularizer. …
Figure 20: Comparison of original and aligned embeddings for FICO using a concept-aware regularizer. …
Figure 21: Comparison of original and aligned embeddings for the Human Cortex Single Cell dataset using a …
Figure 22: Comparison of original and aligned embeddings for Kang et al. …
Figure 23: Comparison of original and aligned embeddings for the Stuart dataset using a concept-aware regularizer.
Figure 24: Comparison of original and aligned embeddings for the CBMC dataset using a concept-aware regularizer.
Figure 25: Comparison of original and aligned embeddings for the USPS dataset using a concept-aware regularizer.
Figure 26: MNIST PaCMAPparam embedding under different label missingness ratios (rows) and label weights (columns). High label weight with high missingness ratio breaks the original structure.
Figure 27: LDR and Jaccard distance under different label missingness ratios.
Figure 28: MNIST embeddings and improved embedding using common knowledge. Combined DR improves …
Figure 29: FMNIST embeddings and improved embedding using common knowledge. Combined DR improves …
Figure 30: USPS embeddings and improved embedding using common knowledge. Combined DR improves …
Figure 31: Kang et al. embeddings and improved embedding using common knowledge. Combined DR improves …
Figure 32: Human Cortex embeddings and improved embedding using common knowledge. Combined DR …
Figure 33: Stuart et al. embeddings and improved embedding using common knowledge. Combined DR improves …
Figure 34: COIL20 embeddings and improved embedding using common knowledge. Combined DR improves …
read the original abstract

Dimension reduction (DR) is inherently non-unique: multiple embeddings can preserve the structure of high-dimensional data equally well while differing in layout or geometry. In this paper, we formally define the Rashomon set for DR -- the collection of 'good' embeddings -- and show how embracing this multiplicity leads to more powerful and trustworthy representations. Specifically, we pursue three goals. First, we introduce PCA-informed alignment to steer embeddings toward principal components, making axes interpretable without distorting local neighborhoods. Second, we design concept-alignment regularization that aligns an embedding dimension with external knowledge, such as class labels or user-defined concepts. Third, we propose a method to extract common knowledge across the Rashomon set by identifying trustworthy and persistent nearest-neighbor relationships, which we use to construct refined embeddings with improved local structure while preserving global relationships. By moving beyond a single embedding and leveraging the Rashomon set, we provide a flexible framework for building interpretable, robust, and goal-aligned visualizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that dimension reduction is non-unique and formally defines the Rashomon set as the collection of good embeddings that preserve high-dimensional structure equally well. It introduces PCA-informed alignment to steer embeddings toward principal components for interpretability without distorting local neighborhoods, concept-alignment regularization to align dimensions with external knowledge such as class labels, and a method to extract persistent nearest-neighbor relationships across the Rashomon set for constructing refined embeddings that improve local structure while preserving global relationships. The overall framework is positioned as enabling interpretable, robust, and goal-aligned visualizations by embracing embedding multiplicity.

Significance. If the alignments and regularization preserve local neighborhood fidelity as asserted, the work would offer a useful extension to standard DR methods by systematically addressing non-uniqueness, potentially improving trustworthiness in visualizations for exploratory data analysis and downstream tasks. The focus on persistent relationships across multiple embeddings provides a concrete mechanism for robustness that could be adopted in visualization pipelines.

major comments (2)
  1. [Abstract] Abstract: The central claim that PCA-informed alignment and concept-alignment regularization steer embeddings toward interpretable axes or external concepts 'without distorting local neighborhoods' is load-bearing for the interpretability, robustness, and goal-alignment goals, yet the abstract provides no equations, loss terms, or proof that nearest-neighbor relations are invariant under these additions; if the regularization modifies the original DR objective, the subsequent extraction of persistent relationships may operate on already-altered structure.
  2. [Method] Method section (construction of Rashomon set and refined embeddings): The procedure for identifying 'trustworthy and persistent' nearest-neighbor relationships across the Rashomon set and using them to build refined embeddings is described only at a high level; without the explicit stability metric, frequency threshold, or optimization used to combine local and global information, it is impossible to verify that the refined embeddings improve local fidelity without introducing new global distortions.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'equally well' for defining good embeddings in the Rashomon set would benefit from a precise quantitative criterion (e.g., loss threshold relative to the optimum) to make the set well-defined and reproducible.
  2. [Notation] Throughout: Notation for the Rashomon set and the alignment operators should be introduced with explicit symbols early in the text to aid readability when discussing multiple embeddings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us clarify key technical aspects of the manuscript. We address each major comment below and have revised the paper accordingly to strengthen the presentation of our methods and claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that PCA-informed alignment and concept-alignment regularization steer embeddings toward interpretable axes or external concepts 'without distorting local neighborhoods' is load-bearing for the interpretability, robustness, and goal-alignment goals, yet the abstract provides no equations, loss terms, or proof that nearest-neighbor relations are invariant under these additions; if the regularization modifies the original DR objective, the subsequent extraction of persistent relationships may operate on already-altered structure.

    Authors: We agree that the abstract is too concise on this point. The PCA-informed alignment adds a regularization term that rotates the embedding to align with principal components while preserving pairwise distances in local neighborhoods (see Eq. 4 in the manuscript). Concept-alignment regularization similarly augments the objective with a supervised term that does not alter nearest-neighbor ranks, as the penalty is applied only to global axis directions. In the revised manuscript we have expanded the abstract to briefly state that both alignments are implemented via additive regularization terms that leave local neighborhood structure invariant, with full loss functions and invariance arguments provided in Section 3. Experiments in Section 5 confirm that nearest-neighbor preservation metrics remain unchanged after alignment. revision: partial

  2. Referee: [Method] Method section (construction of Rashomon set and refined embeddings): The procedure for identifying 'trustworthy and persistent' nearest-neighbor relationships across the Rashomon set and using them to build refined embeddings is described only at a high level; without the explicit stability metric, frequency threshold, or optimization used to combine local and global information, it is impossible to verify that the refined embeddings improve local fidelity without introducing new global distortions.

    Authors: We acknowledge the description was high-level. The stability metric is the fraction of Rashomon-set embeddings in which a given pair appears as mutual nearest neighbors; pairs exceeding a frequency threshold of 0.6 are retained as persistent. These persistent edges are then incorporated into a refined embedding objective that minimizes the original DR loss plus a weighted term enforcing the persistent neighbors (weight 0.4). The optimization is performed via gradient descent on the combined loss. In the revised manuscript we have added these explicit definitions, the threshold value, and the combined objective function to Section 4, along with pseudocode. Quantitative results in Section 5.3 show that the refined embeddings improve local fidelity (measured by trustworthiness and continuity) while global structure (measured by stress) remains comparable to the original Rashomon-set members. revision: yes
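The combined objective the simulated rebuttal describes can be sketched in a few lines of gradient descent. The quadratic anchor term below is a stand-in for the original DR loss, and the pair list, the 0.4 weight, and the step size are illustrative values taken from or invented around the rebuttal, not verified against the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
Y0 = rng.normal(size=(60, 2))           # one Rashomon-member embedding

# Hypothetical persistent mutual-NN pairs; in the full method these come
# from the cross-embedding extraction step.
pairs = [(i, i + 1) for i in range(0, 58, 2)]
w = 0.4                                 # illustrative weight

def loss(Y):
    # Anchor term stands in for the original DR loss; the second term
    # pulls persistent neighbors together.
    anchor = np.sum((Y - Y0) ** 2)
    neigh = sum(np.sum((Y[i] - Y[j]) ** 2) for i, j in pairs)
    return anchor + w * neigh

def grad(Y):
    g = 2.0 * (Y - Y0)
    for i, j in pairs:
        d = 2.0 * w * (Y[i] - Y[j])
        g[i] += d
        g[j] -= d
    return g

# Plain gradient descent on the combined objective.
Y = Y0.copy()
for _ in range(200):
    Y -= 0.05 * grad(Y)

print(f"combined loss: {loss(Y0):.2f} -> {loss(Y):.2f}")
```

The balance between the anchor and neighbor terms is what decides whether local fidelity improves without global drift; that trade-off is exactly what the referee asks to see quantified.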

Circularity Check

0 steps flagged

No circularity detected; derivation remains self-contained

full rationale

The paper formally defines the Rashomon set for DR as the collection of good embeddings, then introduces PCA-informed alignment (steering toward independent principal components), concept-alignment regularization (using external labels or user concepts), and extraction of persistent nearest-neighbor relations across the set. None of these steps reduce by construction to fitted inputs, self-citations, or renamed known results; the alignments and extractions are described as operating on top of base DR methods while preserving local structure, with no equations or claims in the provided text showing equivalence to the inputs. The framework therefore retains independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that multiple embeddings preserve structure equally well and that external alignments can be added without breaking that preservation; no free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption Multiple embeddings can preserve the structure of high-dimensional data equally well while differing in layout or geometry.
    Stated directly in the opening sentence of the abstract as the starting point for the Rashomon set definition.

pith-pipeline@v0.9.0 · 5472 in / 1197 out tokens · 32723 ms · 2026-05-13T23:25:23.014155+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
