UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, James Melville · 2018 · stat.ML · arXiv 1802.03426

Mixed citation behavior. The most common role is background (50% of classified citations).

295 Pith papers cite it.

abstract

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.

citation-role summary

background 18 · method 16 · other 2
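The 50% background figure can be checked against these counts. The sketch below assumes the percentage is taken over the 36 classified citations (18 + 16 + 2), not over all 295 citing papers; the page does not state the denominator, and the function name `role_shares` is hypothetical.

```python
# Counts copied from the citation-role summary.
role_counts = {"background": 18, "method": 16, "other": 2}


def role_shares(counts):
    """Return each role's share of the classified citations, in percent."""
    total = sum(counts.values())  # assumed denominator: classified citations only
    return {role: 100 * n / total for role, n in counts.items()}


shares = role_shares(role_counts)
for role, pct in shares.items():
    print(f"{role}: {pct:.1f}%")
# background: 50.0%
# method: 44.4%
# other: 5.6%
```

Under that assumption, background comes out at exactly half of the classified citations, which matches the headline figure.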

citation-polarity summary

background 18 · use method 16 · unclear 2

authors

Leland McInnes, John Healy, James Melville

representative citing papers

GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment

cs.CV · 2026-04-28 · unverdicted · novelty 8.0 · 2 refs

The first public dataset of 10,217 GPT-Image-2 generated images sourced from Twitter in the week after release, with CLIP taxonomy, OCR, face detection, clustering analyses, and a finding that C2PA provenance data is stripped on upload.

On the continuum limit of t-SNE for data visualization

stat.ML · 2026-04-13 · unverdicted · novelty 8.0

t-SNE converges in the large-data limit to a non-convex variational energy with attraction and repulsion terms that admits a unique smooth minimizer but infinitely many discontinuous ones in one dimension.

Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation

cs.CV · 2026-04-08 · unverdicted · novelty 8.0

Adversarial smuggling attacks encode harmful content into human-readable visuals that evade MLLM detection, achieving over 90% attack success rates on models like GPT-5 and Qwen3-VL via the new SmuggleBench benchmark.

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks

cs.CR · 2026-01-20 · unverdicted · novelty 8.0

The FPR manipulation attack perturbs benign MQTT packets so that a NIDS labels them as attacks, succeeding 80-100% of the time without gradient-based methods and increasing SOC delays.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Discovering Language Model Behaviors with Model-Written Evaluations

cs.CL · 2022-12-19 · unverdicted · novelty 8.0

Language models can automatically generate high-quality evaluation datasets that reveal new cases of inverse scaling, sycophancy, and concerning goal-seeking behaviors, including some worsened by RLHF.

#PraCegoVer: A Large Dataset for Image Captioning in Portuguese

cs.CV · 2021-03-21 · unverdicted · novelty 8.0

The paper introduces #PraCegoVer, the first large-scale image captioning dataset in Portuguese sourced from Instagram posts with single user-generated captions per image.

Zero-Shot Quantization for Object Detectors using Off-the-Shelf Generative Models

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

GoodQ uses generative models with information-dense prompting, distribution-aware selection, and teacher-guided noise reduction to achieve SOTA low-bit (W4A4) and extreme-bit (W3A3) zero-shot quantization for object detectors.

Self-Organized Conformal Prediction: Reducing Regional Coverage Gaps with Unsupervised Group Discovery

stat.ML · 2026-06-28 · unverdicted · novelty 7.0

SOCP uses self-organizing maps for unsupervised group discovery to enable local calibration in conformal prediction, reducing regional coverage gaps on benchmarks with small set-size increases while preserving validity guarantees.

Beyond the Reranker: Do RAG Retrieval Enhancements Help Once a Strong Reranker Is Present?

cs.IR · 2026-06-14 · conditional · novelty 7.0

On heterogeneous document collections, only query expansion and a newly introduced per-source calibrated corrector (SSCC) deliver reliable gains beyond a strong cross-encoder reranker; other common retrieval enhancements do not.

When to Align, When to Predict: A Phase Diagram for Multimodal Learning

cs.LG · 2026-06-09 · accept · novelty 7.0

A spiked signal-plus-noise model yields separation ratios that partition multimodal problems into four regimes where alignment, prediction, both, or neither succeed.

Trajectory Geometry of Transformer Representations Across Layers

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

Transformer representations form trajectories showing semantic convergence in middle-to-late layers, higher curvature on reasoning tasks, bifurcation on ambiguous tokens, and a consistent three-phase cosine similarity pattern across GPT-2, TinyLlama, and Qwen2.5.

X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining

cs.CV · 2026-06-07 · unverdicted · novelty 7.0

X-Tokenizer creates semantic action tokens via asymmetric residual quantization and contrastive pretraining on large trajectory data, outperforming prior methods like FAST on robotic tasks.

Efficient Mean Curvature Computation on High-Dimensional Data Manifolds

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

An exact algebraic identity plus low-rank SVD and Haar-measure null-space approximation reduce per-point mean curvature cost from O(m^4) to O(k^2 m + k m p^2) with 50-300x speedups and negligible accuracy loss.

RedZeD: Computing persistent homology by Reduction to Zero Differentials

cs.CG · 2026-06-04 · unverdicted · novelty 7.0

RedZeD presents a new theoretical framework and algorithm for faster persistent homology computation on Vietoris-Rips filtrations.

FiSeR: Fine-Grained Source Representations for Cross-Domain AI Image Detection

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

FiSeR uses coarse contrastive separation of natural vs synthetic images plus fine contrastive grouping by generator identity to improve cross-domain AUROC by +10.22 over DIRE baseline on multiple test sets.

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

LLM residual streams during addition form an Iso-Raw-Sum Trajectory anchored by digit semantics and modulated by continuous carry signals, with errors arising as geometric slippages across quantization thresholds in a noisy model.

GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

GlucoFM decomposes CGM traces into dual state-event streams, pretrains on 109k hours of unlabeled data, and reports superior subject-disjoint performance on seven clinical tasks across four cohorts.

ScaleMAP: Preserving Local Density and Neighborhood Structure in Low-Dimensional Embeddings

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

ScaleMAP is a dimensionality-reduction method that preserves both neighborhood structure and local density by scaling embedding displacements with original local radii, matching DensMAP on density while retaining UMAP-level neighborhood fidelity.

One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

CoP achieves over 90% of per-instance SAM performance on cell-type benchmarks with one click per type via recursive non-parametric expansion of reliable same-type points.

Robust and Efficient Guardrails with Latent Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

COLAGUARD matches explicit-reasoning guardrail performance on safety benchmarks while delivering 12.9X speedup and 22.4X token reduction by propagating hidden states instead of generating text.

Riemannian-Manifold Steering: Geometry-Aware Generative Autoencoders for Label-Free Steering

cs.LG · 2026-05-24 · unverdicted · novelty 7.0

A Riemannian geodesic framework for label-free manifold steering in language models via a schema-supervised encoder approximating output Hellinger distance on activations.

Word Class Representations Spontaneously Emerge from Successor Representations Trained on Natural Language

cs.CL · 2026-05-23 · unverdicted · novelty 7.0

Successor representation training on natural language causes part-of-speech categories to emerge spontaneously in the learned embeddings, with structure varying by predictive horizon.

Modernizing User Privacy Preference Measurement through GPPI: A GDPR-aligned Privacy Preference Item Bank

cs.HC · 2026-05-23 · unverdicted · novelty 7.0

A 527-item GDPR-aligned privacy preference item bank was developed by extracting 669 statements from 99 GDPR articles and validating them through multi-round expert consensus and semantic clustering.

citing papers explorer

Showing 16 of 16 citing papers after filters.

Physics-informed, Generative Adversarial Design of Funicular Shells

cs.CE · 2026-04-17 · unverdicted · none · ref 45 · internal anchor

A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

cs.CR · 2026-03-31 · unverdicted · none · ref 47 · internal anchor

SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.

A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data

q-bio.GN · 2026-03-25 · unverdicted · none · ref 111 · internal anchor

A large benchmark finds traditional imputation methods for scRNA-seq data generally outperform deep learning ones, but numerical recovery does not reliably improve biological downstream analyses and no method wins across all settings.

Behavioral Integrity Verification for AI Agent Skills

cs.CR · 2026-05-12 · unverdicted · none · ref 43 · internal anchor

BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

LEGO-MOF: Equivariant Latent Manipulation for Editable, Generative, and Optimizable MOF Design

cs.LG · 2026-04-15 · unverdicted · none · ref 40 · internal anchor

LEGO-MOF maps MOF linkers to an equivariant latent space for continuous editing and uses test-time optimization to achieve a 147.5% average boost in pure CO2 uptake while preserving structural validity.

TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection

cs.AI · 2026-04-12 · unverdicted · none · ref 28 · internal anchor

TrajOnco uses a chain-of-agents LLM architecture with memory to perform temporal reasoning on longitudinal EHR, achieving 0.64-0.80 AUROC for 1-year multi-cancer risk prediction in zero-shot mode on matched cohorts while matching supervised ML on lung cancer and outperforming single-agent baselines.

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

cs.LG · 2026-04-10 · unverdicted · none · ref 64 · internal anchor

DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and JailBreakV while preserving general capabilities.

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

cs.LG · 2026-04-02 · unverdicted · none · ref 39 · internal anchor

AMC models memory consolidation via a Liquid-Glass-Crystal process governed by an SDE with proven convergence to a Beta distribution, yielding 34-43% better forward transfer and 67-80% less forgetting on standard continual RL benchmarks.

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

cs.CL · 2024-06-25 · unverdicted · none · ref 70 · internal anchor

FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.

A foundation model for atomistic materials chemistry

physics.chem-ph · 2023-12-29 · unverdicted · none · ref 42 · internal anchor

MACE-MP-0 is a general-purpose atomistic ML force field trained on public data that enables stable simulations of diverse chemical systems with qualitative and sometimes quantitative accuracy, serving as a starting point for fine-tuning.

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

cs.CL · 2022-08-23 · accept · none · ref 38 · internal anchor

RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.

CAST: Collapse-Aware multi-Scale Topology Fusion for Multimodal Coreset Selection

cs.CV · 2026-05-12 · unverdicted · none · ref 29 · internal anchor

CAST selects better multimodal coresets by fusing collapse-aware topologies across modalities and matching distributions at multiple scales in the diffusion wavelet domain.

Collaboration, Integration, and Thematic Exploration in European Framework Programmes: A Longitudinal Network Analysis

physics.soc-ph · 2026-04-13 · unverdicted · none · ref 34 · internal anchor

EU Framework Programmes have increased participation equity and integrated new countries through collaboration, yet research remains concentrated on established trajectories rather than broadly exploratory.

FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling

cs.LG · 2026-05-12 · unverdicted · none · ref 2 · 2 links · internal anchor

FastUMAP approximates UMAP via sparse bipartite point-landmark graphs and Nystrom initialization to deliver lower runtimes than Barnes-Hut t-SNE on most tested datasets while retaining competitive kNN accuracy.

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

cs.CV · 2026-04-29 · unverdicted · none · ref 81 · internal anchor

AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

cs.IR · 2026-04-23 · unverdicted · none · ref 25 · internal anchor

A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.
