UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, James Melville · 2018 · stat.ML · arXiv 1802.03426

110 Pith papers cite this work. Polarity classification is still indexing.

abstract

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
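
The abstract does not describe the construction itself. As a rough illustration of the neighbor-graph step described in the UMAP paper, the sketch below turns one point's k-nearest-neighbor distances into fuzzy membership weights: the nearest neighbor gets weight 1, and a per-point scale is found by bisection so that the weights sum to log2(k). The function names (`smooth_knn_weights`, `fuzzy_union`), the bisection bounds, and the iteration count are assumptions made for this example; this is not the reference implementation.

```python
import math


def smooth_knn_weights(dists, n_iter=64):
    """Turn one point's ascending k-NN distances (self excluded) into fuzzy weights.

    Hypothetical helper: a pure-Python sketch, not the umap-learn code.
    """
    k = len(dists)
    target = math.log2(k)  # weights are calibrated to sum to log2(k)
    rho = dists[0]         # distance to the nearest neighbor

    def weights(sigma):
        # Exponential decay of the distance beyond the nearest neighbor.
        return [math.exp(-max(0.0, d - rho) / sigma) for d in dists]

    # The weight sum grows with sigma, so bracket the target and bisect.
    lo, hi = 0.0, 1.0
    while sum(weights(hi)) < target:
        lo, hi = hi, hi * 2.0
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if sum(weights(mid)) > target:
            hi = mid
        else:
            lo = mid
    return weights((lo + hi) / 2.0)


def fuzzy_union(a, b):
    """Symmetrize two directed weights with the probabilistic t-conorm."""
    return a + b - a * b


# Example distances are made up for illustration.
w = smooth_knn_weights([0.5, 1.0, 1.5, 3.0])
```

Here `w[0]` is 1.0 by construction and the remaining weights decay with distance. `fuzzy_union` combines the two directed weights between a pair of points into a single symmetric edge weight.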

representative citing papers

GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment

cs.CV · 2026-04-28 · unverdicted · novelty 8.0 · 2 refs

The first public dataset of 10,217 GPT-Image-2 generated images sourced from Twitter in the week after release, with CLIP taxonomy, OCR, face detection, clustering analyses, and a finding that C2PA provenance data is stripped on upload.

On the continuum limit of t-SNE for data visualization

stat.ML · 2026-04-13 · unverdicted · novelty 8.0

t-SNE converges in the large-data limit to a non-convex variational energy with attraction and repulsion terms that admits a unique smooth minimizer but infinitely many discontinuous ones in one dimension.

Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation

cs.CV · 2026-04-08 · unverdicted · novelty 8.0

Adversarial smuggling attacks encode harmful content into human-readable visuals that evade MLLM detection, achieving over 90% attack success rates on models like GPT-5 and Qwen3-VL via the new SmuggleBench benchmark.

Much of Geospatial Web Search Is Beyond Traditional GIS

cs.IR · 2026-05-11 · unverdicted · novelty 7.0

Analysis of 1.01 million unfiltered Bing queries identifies 18% as geospatial, dominated by transactional categories like costs (15.3%) that exceed traditional GIS scope.

Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction

astro-ph.IM · 2026-05-11 · unverdicted · novelty 7.0

LLMs prompted with increasing levels of text on TNO spectral reconstruction from photometry reveal an entropy floor where implementation variance persists, showing text alone cannot capture all tacit expert knowledge needed for exact replication.

An Experimental Method to Study Opinion Diffusion in Human-AI Hybrid Societies

cs.SI · 2026-05-09 · unverdicted · novelty 7.0

Hybrid human-AI networks in 5x5 grids reached lower final polarization than human-only networks after eight rounds of opinion revision on polarizing topics.

Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.

eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.

Knowing when to trust machine-learned interatomic potentials

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

PROBE recasts MLIP uncertainty quantification as selective classification by training a compact discriminative classifier on frozen per-atom backbone embeddings, yielding a reliability probability that tracks actual error better than ensemble disagreement.

Sparsity as a Key: Unlocking New Insights from Latent Structures for Out-of-Distribution Detection

cs.CV · 2026-04-29 · unverdicted · novelty 7.0

Sparse autoencoders on ViT class tokens reveal stable Class Activation Profiles for in-distribution data, enabling OOD detection via divergence from core energy profiles.

From Chatbots to Confidants: A Cross-Cultural Study of LLM Adoption for Emotional Support

cs.CL · 2026-04-28 · unverdicted · novelty 7.0

A cross-cultural survey finds LLM emotional support adoption ranges from 20% to 59% by country, with positive perceptions strongest among higher-SES, religious, married adults aged 25-44 and in English-speaking nations.

The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook

cs.CY · 2026-04-23 · unverdicted · novelty 7.0

Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.

Participatory provenance as representational auditing for AI-mediated public consultation

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.

Comparison Drives Preference: Reference-Aware Modeling for AI-Generated Video Quality Assessment

cs.CV · 2026-04-18 · unverdicted · novelty 7.0

RefVQA uses a query-centered reference graph and graph-guided difference aggregation to improve AI-generated video quality assessment by incorporating inter-video comparisons.

Neighbor Embedding for High-Dimensional Sparse Poisson Data

stat.ML · 2026-04-18 · unverdicted · novelty 7.0

p-SNE embeds sparse Poisson count data into low dimensions by using KL divergence between Poisson distributions to measure pairwise dissimilarity and Hellinger distance to optimize the layout.

Physics-informed, Generative Adversarial Design of Funicular Shells

cs.CE · 2026-04-17 · unverdicted · novelty 7.0

A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.

MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

MADE creates a contamination-resistant living benchmark for multi-label classification of medical device adverse events, with evaluations revealing model-specific trade-offs in accuracy and uncertainty quantification.

Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Lesioning a shared core in multilingual LLMs drops whole-brain fMRI encoding correlation by 60.32%, while language-specific lesions selectively weaken predictions only for the matched native language.

L-fuzzy simplicial homology

math.AT · 2026-04-09 · unverdicted · novelty 7.0

L-fuzzy simplicial homology generalizes simplicial homology to L-fuzzy subcomplexes by assigning values from a completely distributive lattice L to simplices and deriving associated homology modules.

Emotion Concepts and their Function in a Large Language Model

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.

Dynamic Context Evolution for Scalable Synthetic Data Generation

cs.CL · 2026-04-08 · conditional · novelty 7.0

Dynamic Context Evolution prevents cross-batch mode collapse in LLMs by combining model self-assessment for idea filtering, embedding-based deduplication, and evolving prompts, yielding zero collapse and consistently richer idea clusters than naive prompting.

Are We Recognizing the Jaguar or Its Background? A Diagnostic Framework for Jaguar Re-Identification

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A new diagnostic framework using inpainted context ratios and laterality checks on a Pantanal jaguar benchmark reveals whether re-ID models depend on coat patterns or spurious background evidence.

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

cs.CR · 2026-03-31 · unverdicted · novelty 7.0

SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.

Scaling and evaluating sparse autoencoders

cs.LG · 2024-06-06 · unverdicted · novelty 7.0

K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.

citing papers explorer

Showing 50 of 110 citing papers.

GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment
cs.CV · 2026-04-28 · unverdicted · none · ref 19 · 2 links · internal anchor
The first public dataset of 10,217 GPT-Image-2 generated images sourced from Twitter in the week after release, with CLIP taxonomy, OCR, face detection, clustering analyses, and a finding that C2PA provenance data is stripped on upload.

On the continuum limit of t-SNE for data visualization
stat.ML · 2026-04-13 · unverdicted · none · ref 37 · internal anchor
t-SNE converges in the large-data limit to a non-convex variational energy with attraction and repulsion terms that admits a unique smooth minimizer but infinitely many discontinuous ones in one dimension.

Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation
cs.CV · 2026-04-08 · unverdicted · none · ref 4 · internal anchor
Adversarial smuggling attacks encode harmful content into human-readable visuals that evade MLLM detection, achieving over 90% attack success rates on models like GPT-5 and Qwen3-VL via the new SmuggleBench benchmark.

Much of Geospatial Web Search Is Beyond Traditional GIS
cs.IR · 2026-05-11 · unverdicted · none · ref 16 · internal anchor
Analysis of 1.01 million unfiltered Bing queries identifies 18% as geospatial, dominated by transactional categories like costs (15.3%) that exceed traditional GIS scope.

Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction
astro-ph.IM · 2026-05-11 · unverdicted · none · ref 12 · internal anchor
LLMs prompted with increasing levels of text on TNO spectral reconstruction from photometry reveal an entropy floor where implementation variance persists, showing text alone cannot capture all tacit expert knowledge needed for exact replication.

An Experimental Method to Study Opinion Diffusion in Human-AI Hybrid Societies
cs.SI · 2026-05-09 · unverdicted · none · ref 13 · internal anchor
Hybrid human-AI networks in 5x5 grids reached lower final polarization than human-only networks after eight rounds of opinion revision on polarizing topics.

Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection
cs.CV · 2026-05-09 · unverdicted · none · ref 59 · internal anchor
A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.

eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts
cs.CV · 2026-05-07 · unverdicted · none · ref 18 · internal anchor
eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.

Knowing when to trust machine-learned interatomic potentials
cs.LG · 2026-05-01 · unverdicted · none · ref 58 · internal anchor
PROBE recasts MLIP uncertainty quantification as selective classification by training a compact discriminative classifier on frozen per-atom backbone embeddings, yielding a reliability probability that tracks actual error better than ensemble disagreement.

Sparsity as a Key: Unlocking New Insights from Latent Structures for Out-of-Distribution Detection
cs.CV · 2026-04-29 · unverdicted · none · ref 27 · internal anchor
Sparse autoencoders on ViT class tokens reveal stable Class Activation Profiles for in-distribution data, enabling OOD detection via divergence from core energy profiles.

From Chatbots to Confidants: A Cross-Cultural Study of LLM Adoption for Emotional Support
cs.CL · 2026-04-28 · unverdicted · none · ref 26 · internal anchor
A cross-cultural survey finds LLM emotional support adoption ranges from 20% to 59% by country, with positive perceptions strongest among higher-SES, religious, married adults aged 25-44 and in English-speaking nations.

The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook
cs.CY · 2026-04-23 · unverdicted · none · ref 9 · internal anchor
Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.

Participatory provenance as representational auditing for AI-mediated public consultation
cs.AI · 2026-04-22 · unverdicted · none · ref 15 · internal anchor
Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.

Comparison Drives Preference: Reference-Aware Modeling for AI-Generated Video Quality Assessment
cs.CV · 2026-04-18 · unverdicted · none · ref 23 · internal anchor
RefVQA uses a query-centered reference graph and graph-guided difference aggregation to improve AI-generated video quality assessment by incorporating inter-video comparisons.

Neighbor Embedding for High-Dimensional Sparse Poisson Data
stat.ML · 2026-04-18 · unverdicted · none · ref 2 · internal anchor
p-SNE embeds sparse Poisson count data into low dimensions by using KL divergence between Poisson distributions to measure pairwise dissimilarity and Hellinger distance to optimize the layout.

Physics-informed, Generative Adversarial Design of Funicular Shells
cs.CE · 2026-04-17 · unverdicted · none · ref 45 · internal anchor
A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.

MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events
cs.CL · 2026-04-16 · unverdicted · none · ref 4 · internal anchor
MADE creates a contamination-resistant living benchmark for multi-label classification of medical device adverse events, with evaluations revealing model-specific trade-offs in accuracy and uncertainty quantification.

Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment
cs.CL · 2026-04-12 · unverdicted · none · ref 92 · internal anchor
Lesioning a shared core in multilingual LLMs drops whole-brain fMRI encoding correlation by 60.32%, while language-specific lesions selectively weaken predictions only for the matched native language.

L-fuzzy simplicial homology
math.AT · 2026-04-09 · unverdicted · none · ref 16 · internal anchor
L-fuzzy simplicial homology generalizes simplicial homology to L-fuzzy subcomplexes by assigning values from a completely distributive lattice L to simplices and deriving associated homology modules.

Emotion Concepts and their Function in a Large Language Model
cs.AI · 2026-04-09 · unverdicted · none · ref 13 · internal anchor
Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.

Dynamic Context Evolution for Scalable Synthetic Data Generation
cs.CL · 2026-04-08 · conditional · none · ref 7 · internal anchor
Dynamic Context Evolution prevents cross-batch mode collapse in LLMs by combining model self-assessment for idea filtering, embedding-based deduplication, and evolving prompts, yielding zero collapse and consistently richer idea clusters than naive prompting.

Are We Recognizing the Jaguar or Its Background? A Diagnostic Framework for Jaguar Re-Identification
cs.CV · 2026-04-06 · unverdicted · none · ref 28 · internal anchor
A new diagnostic framework using inpainted context ratios and laterality checks on a Pantanal jaguar benchmark reveals whether re-ID models depend on coat patterns or spurious background evidence.

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
cs.CR · 2026-03-31 · unverdicted · none · ref 47 · internal anchor
SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.

Scaling and evaluating sparse autoencoders
cs.LG · 2024-06-06 · unverdicted · none · ref 42 · internal anchor
K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
cs.CL · 2026-05-12 · unverdicted · none · ref 161 · internal anchor
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction
q-bio.GN · 2026-05-12 · unverdicted · none · ref 34 · internal anchor
Set-aggregated genome embeddings from genomic language models predict microbiome abundance profiles with improved generalization to novel genomes over classical bioinformatics methods.

Probing Non-Equilibrium Grain Boundary Dynamics with XPCS and Domain-Adaptive Machine Learning
cond-mat.mtrl-sci · 2026-05-12 · unverdicted · none · ref 30 · internal anchor
XPCS fluctuation maps analyzed via domain-adaptive ML trained on continuum simulations yield bulk diffusivity, GB stiffness, and effective GB concentration, demonstrating persistent non-equilibrium GB relaxation in nanocrystalline Si.

BoolXLLM: LLM-Assisted Explainability for Boolean Models
cs.AI · 2026-05-12 · unverdicted · none · ref 22 · internal anchor
BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.

Toward Modeling Player-Specific Chess Behaviors
cs.AI · 2026-05-12 · unverdicted · none · ref 11 · internal anchor
Champion-specific embeddings and limited MCTS in Maia-2 reduce average Jensen-Shannon divergence to 16 historical chess champions' move distributions in a new latent-space metric, even as standard move accuracy falls.

Behavioral Integrity Verification for AI Agent Skills
cs.CR · 2026-05-12 · unverdicted · none · ref 43 · internal anchor
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling
cs.LG · 2026-05-12 · unverdicted · none · ref 2 · internal anchor
FastUMAP speeds up UMAP by 15x on 70k-point datasets via bipartite landmark sampling and Nystrom initialization while retaining 96% of the kNN accuracy of stronger baselines.

SOMA: Efficient Multi-turn LLM Serving via Small Language Model
cs.CL · 2026-05-11 · unverdicted · none · ref 30 · internal anchor
SOMA estimates a local response manifold from early turns and adapts a small surrogate model via divergence-maximizing prompts and localized LoRA fine-tuning for efficient multi-turn serving.

Biosignal Fingerprinting: A Cross-Modal PPG-ECG Foundation Model
cs.LG · 2026-05-10 · unverdicted · none · ref 24 · internal anchor
A cross-modal masked autoencoder creates reusable biosignal fingerprints that match or exceed specialist models on seven cardiovascular tasks using only single-modality input.

In-Context Black-Box Optimization with Unreliable Feedback
cs.LG · 2026-05-07 · unverdicted · none · ref 39 · internal anchor
FICBO pretrains a feedback-aware transformer with a structured prior on feedback distortion to adaptively exploit or ignore unreliable auxiliary signals during in-context black-box optimization.

DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions
cs.RO · 2026-05-07 · unverdicted · none · ref 29 · internal anchor
DexSynRefine synthesizes HOI motions with an extended manifold method, refines them via task-space residual RL, and adapts for sim-to-real transfer, outperforming kinematic retargeting by 50-70 percentage points on five dexterous tasks.

Practical validation of synthetic pre-crash scenarios
cs.RO · 2026-05-06 · unverdicted · none · ref 71 · internal anchor
A binning-based Bayesian ROPE equivalence testing method is introduced to quantitatively assess practical equivalence between synthetic and real pre-crash scenario datasets for driving automation safety impact evaluation.

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
cs.AI · 2026-05-05 · unverdicted · none · ref 40 · internal anchor
MoR lets clients train local reward models on private preferences and uses a learned Mixture-of-Rewards with GRPO on the server to align a shared base VLM without exchanging parameters, architectures, or raw data.

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
cs.LG · 2026-05-04 · unverdicted · none · ref 162 · internal anchor
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.

DR-SNE: Density-Regularized Stochastic Neighbor Embedding
cs.LG · 2026-05-03 · unverdicted · none · ref 8 · internal anchor
DR-SNE augments the SNE objective with a density regularization term from normalized log-density estimates to preserve relative densities while retaining neighborhood structure.

Retrieval with Multiple Query Vectors through Anomalous Pattern Detection
cs.LG · 2026-05-03 · unverdicted · none · ref 37 · internal anchor
A retrieval approach identifies anomalous dimensions in a set of query vectors and retrieves database vectors that are anomalous across those dimensions, with performance improving as query set size grows to around 8.

LLM-Augmented Semantic Steering of Text Embedding Projection Spaces
cs.HC · 2026-05-03 · unverdicted · none · ref 25 · internal anchor
LLM-augmented semantic steering lets analysts reshape text embedding projections by providing semantic groupings that an LLM externalizes and extends to improve alignment with intended structures using minimal interaction.

Robust Conditional Conformal Prediction via Branched Normalizing Flow
cs.LG · 2026-05-03 · unverdicted · none · ref 40 · internal anchor
Branched Normalizing Flow improves conditional coverage robustness of conformal prediction under distribution shift by normalizing test inputs to the calibration distribution and mapping prediction sets back.

Disentangled Anatomy-Disease Diffusion (DADD) for Controllable Ulcerative Colitis Progression Synthesis
cs.CV · 2026-05-03 · unverdicted · none · ref 21 · internal anchor
DADD disentangles anatomy and disease in a latent diffusion model using a Feature Purifier, ordinal disease embeddings, and Delta Steering to synthesize controllable ulcerative colitis progression images.

Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing
cs.CL · 2026-05-01 · unverdicted · none · ref 21 · internal anchor
Nonlinear polynomial models fit local paraphrase embedding clouds more accurately than linear ones and support geometrically consistent synthetic point generation, yet this geometric fidelity does not improve classification performance.

Class Angular Distortion Index for Dimensionality Reduction
cs.LG · 2026-05-01 · unverdicted · none · ref 35 · internal anchor
CADI quantifies the preservation of relative cluster angles in low-dimensional projections using internal angles from point triples.

Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
cs.CL · 2026-05-01 · unverdicted · none · ref 42 · internal anchor
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

Diverse Image Priors for Black-box Data-free Knowledge Distillation
cs.LG · 2026-04-28 · unverdicted · none · ref 36 · internal anchor
DIP-KD achieves state-of-the-art results in black-box data-free knowledge distillation across 12 benchmarks by synthesizing diverse image priors, applying contrastive learning, and using a primer student for soft-probability transfer.

DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale
cs.LG · 2026-04-28 · unverdicted · none · ref 2 · internal anchor
DiRe recovers exact first Betti numbers on noisy manifold stress tests, matches or beats GPU UMAP on classification, and preserves 3-4 times more topological structure than UMAP on 723K arXiv embeddings at similar speed.

Diffusion-Guided Feature Selection via Nishimori Temperature: Noise-Based Spectral Embedding
cs.LG · 2026-04-27 · unverdicted · none · ref 3 · internal anchor
NBSE identifies the Nishimori temperature where the Bethe Hessian singularizes to embed features via degree-corrected diffusion and selects one representative per redundant group, preserving accuracy at 30% retention on EfficientNet-B4.

StarCLR: Contrastive Learning Representation for Astronomical Light Curves
astro-ph.SR · 2026-04-27 · conditional · none · ref 27 · internal anchor
StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.