Reid, and Silvio Savarese
16 Pith papers cite this work.
citing papers explorer
- Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception
  Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
- Projection-Free Transformers via Gaussian Kernel Attention
  Gaussian Kernel Attention replaces learned QKV projections with a Gaussian RBF kernel on per-head token features, using 0.42x parameters and 0.49x FLOPs while showing competitive language modeling performance at depth 20.
- Differentially Private Contrastive Learning via Bounding Group-level Contribution
  DP-GCL improves differentially private contrastive learning by bounding group-level contributions through batch partitioning and intra-group augmentation, delivering 5.6% higher image classification accuracy and 20.1% higher retrieval accuracy than existing approaches.
- AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
  AttentionBender applies 2D transforms to cross-attention maps in video diffusion transformers, producing distributed distortions and glitch aesthetics that reveal entangled attention mechanisms while serving as both an XAI probe and creative tool.
- Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data
  DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
- Model Merging: Foundations and Algorithms
  New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
- Neighbor2Inverse: Self-Supervised Denoising for Low-Dose Region-of-Interest Phase Contrast CT
  Neighbor2Inverse adapts the Neighbor2Neighbor principle to train a denoising network directly in the image domain for low-dose PBI-CT by using independently noised subsampled projections.
- Remote SAMsing: From Segment Anything to Segment Everything
  The Remote SAMsing pipeline boosts SAM2 coverage on remote sensing scenes from 30-68% to 91-98% via multi-pass masking and boundary-aware merging while preserving mask quality.
- Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms
  A threat-oriented digital twinning methodology and an open-source modular twin are introduced for security evaluation of autonomous platforms, translating threat analysis into controllable tests for spoofing, replay, and adversarial ML attacks.
- Where are they looking in the operating room?
  Gaze-following models on extended 4D-OR and Team-OR datasets reach F1 scores of 0.92 for clinical role prediction and 0.95 for surgical phase recognition while improving team communication detection by over 30%.
- Geometric Correction of Side-Scan Sonar Images with Image-Consistent Attitude Refinement
  A geometric correction technique for side-scan sonar images that refines yaw-pitch attitude by fusing navigation baselines with image-inferred perturbations separated via port-starboard symmetry.
- Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection
  An automated Python simulator, calibrated to one experimental run, generates consistent time-series data for many batch distillation scenarios including anomalies, forming an openly released hybrid dataset for deep anomaly detection.
- Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
  Ensemble-based method of moments on softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks like selective classification over evidential deep learning.
- Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing
  A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.
- RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
  RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.
- Particle Diffusion Matching: Random Walk Correspondence Search for the Alignment of Standard and Ultra-Widefield Fundus Images
  Particle Diffusion Matching uses diffusion-guided random walk searches to align challenging standard and ultra-widefield retinal images, claiming state-of-the-art benchmark performance.