UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
111 Pith papers cite this work. Polarity classification is still indexing.
abstract
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
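The graph-construction step at the heart of UMAP can be sketched in a few lines of NumPy. This is an illustrative simplification of the fuzzy simplicial set described in the paper, not the authors' optimized implementation: the real algorithm uses approximate nearest-neighbor search and follows this step with a stochastic-gradient layout phase, both omitted here.

```python
import numpy as np

def fuzzy_graph(X, k=5, n_iter=64):
    """Sketch of UMAP's fuzzy simplicial set construction (graph step only).

    For each point i, rho_i is the distance to its nearest neighbor, and
    sigma_i is found by binary search so that the total membership strength
    over the k nearest neighbors equals log2(k). Directed memberships are
    then symmetrized with the probabilistic t-conorm (fuzzy union)
    a + b - a*b.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances; exclude self from neighbor search.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]   # indices of k nearest neighbors
    P = np.zeros((n, n))
    target = np.log2(k)
    for i in range(n):
        dists = d[i, knn[i]]
        rho = dists.min()                # distance to nearest neighbor
        lo, hi = 1e-8, 1e3
        for _ in range(n_iter):          # binary search for sigma_i
            sigma = 0.5 * (lo + hi)
            val = np.exp(-np.maximum(dists - rho, 0.0) / sigma).sum()
            if val > target:
                hi = sigma               # memberships too large: shrink sigma
            else:
                lo = sigma
        P[i, knn[i]] = np.exp(-np.maximum(dists - rho, 0.0) / sigma)
    # Fuzzy union yields a symmetric weighted graph with entries in [0, 1].
    return P + P.T - P * P.T
```

Because membership is measured relative to each point's nearest-neighbor distance rho_i, every point's nearest neighbor receives membership 1, which is how UMAP enforces local connectivity of the manifold approximation.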
hub tools
claims ledger
authors
co-cited works
representative citing papers
t-SNE converges in the large-data limit to a non-convex variational energy with attraction and repulsion terms that admits a unique smooth minimizer but infinitely many discontinuous ones in one dimension.
Adversarial smuggling attacks encode harmful content into human-readable visuals that evade MLLM detection, achieving over 90% attack success rates on models like GPT-5 and Qwen3-VL via the new SmuggleBench benchmark.
Preference fine-tuning outperforms prompting for personalisation but amplifies sycophancy and relationship-seeking, while simulated users recover aggregate rankings yet show far lower self-consistency and different topic and position biases than real humans.
Analysis of 1.01 million unfiltered Bing queries identifies 18% as geospatial, dominated by transactional categories like costs (15.3%) that exceed traditional GIS scope.
LLMs prompted with increasing levels of text on TNO spectral reconstruction from photometry reveal an entropy floor where implementation variance persists, showing text alone cannot capture all tacit expert knowledge needed for exact replication.
Hybrid human-AI networks in 5x5 grids reached lower final polarization than human-only networks after eight rounds of opinion revision on polarizing topics.
A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.
eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.
PROBE recasts MLIP uncertainty quantification as selective classification by training a compact discriminative classifier on frozen per-atom backbone embeddings, yielding a reliability probability that tracks actual error better than ensemble disagreement.
Sparse autoencoders on ViT class tokens reveal stable Class Activation Profiles for in-distribution data, enabling OOD detection via divergence from core energy profiles.
A cross-cultural survey finds LLM emotional support adoption ranges from 20% to 59% by country, with positive perceptions strongest among higher-SES, religious, married adults aged 25-44 and in English-speaking nations.
Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.
Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% more participants than random baselines, with exclusion reaching 33-88% for dissent clusters.
RefVQA uses a query-centered reference graph and graph-guided difference aggregation to improve AI-generated video quality assessment by incorporating inter-video comparisons.
p-SNE embeds sparse Poisson count data into low dimensions by using KL divergence between Poisson distributions to measure pairwise dissimilarity and Hellinger distance to optimize the layout.
A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
MADE creates a contamination-resistant living benchmark for multi-label classification of medical device adverse events, with evaluations revealing model-specific trade-offs in accuracy and uncertainty quantification.
Lesioning a shared core in multilingual LLMs drops whole-brain fMRI encoding correlation by 60.32%, while language-specific lesions selectively weaken predictions only for the matched native language.
L-fuzzy simplicial homology generalizes simplicial homology to L-fuzzy subcomplexes by assigning values from a completely distributive lattice L to simplices and deriving associated homology modules.
Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.
Dynamic Context Evolution prevents cross-batch mode collapse in LLMs by combining model self-assessment for idea filtering, embedding-based deduplication, and evolving prompts, yielding zero collapse and consistently richer idea clusters than naive prompting.
A new diagnostic framework using inpainted context ratios and laterality checks on a Pantanal jaguar benchmark reveals whether re-ID models depend on coat patterns or spurious background evidence.
SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.
citing papers explorer
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
  RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
- Galactica: A Large Language Model for Science
  Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.