Canonical reference

2016.280

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei · 2009 · DOI 10.1109/cvpr

Canonical reference. 82% of the classified citations from citing Pith papers cite this work as background.

50 Pith papers citing it

Background: 82% of classified citations


citation-role summary: background 8, dataset 2, baseline 1

citation-polarity summary: background 9, baseline 1, use dataset 1

representative citing papers

Rolling Shutter Relative Pose Estimation Made Practical

cs.CV · 2026-06-25 · conditional · novelty 8.0

A linearized solver estimates rolling-shutter relative pose and motion from 7 affine correspondences in 1.2 ms and reports best-in-benchmark accuracy plus usable translational velocity.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

WHU-Infra3D is a new large-scale multi-modal dataset and benchmark for 3D roadside infrastructure inventory, providing over 175k 2D boxes, thousands of 3D instances, and 181k annotations across five core tasks while exposing cross-city gaps and long-tailed defect vulnerabilities.

A Systematic Benchmark of Intraoperative Ultrasound-to-MR Synthesis for Brain Tumour Surgery

cs.CV · 2026-05-30 · conditional · novelty 7.0

On the public ReMIND dataset, a systematic benchmark of six synthesis models across 48 experiments finds LPIPS correlates with downstream segmentation utility while SSIM does not, with SynDiff-2.5D performing best.

Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Urban-ImageNet is a 2-million-image multi-modal dataset with a 10-class HUSIC taxonomy that enables benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.

Projection-Free Transformers via Gaussian Kernel Attention

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

Gaussian Kernel Attention replaces learned QKV projections with a Gaussian RBF kernel on per-head token features, using 0.42x parameters and 0.49x FLOPs while showing competitive language modeling performance at depth 20.

Differentially Private Contrastive Learning via Bounding Group-level Contribution

cs.CR · 2026-04-29 · unverdicted · novelty 7.0

DP-GCL improves differentially private contrastive learning by bounding group-level contributions through batch partitioning and intra-group augmentation, delivering 5.6% higher image classification accuracy and 20.1% higher retrieval accuracy than existing approaches.

AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe

cs.MM · 2026-04-22 · unverdicted · novelty 7.0

AttentionBender applies 2D transforms to cross-attention maps in video diffusion transformers, producing distributed distortions and glitch aesthetics that reveal entangled attention mechanisms while serving as both an XAI probe and creative tool.

Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.

Navig-AI-tion: Navigation by Contextual AI and Spatial Audio

cs.HC · 2026-03-13 · unverdicted · novelty 7.0

A system combining VLM landmark instructions with real-time corrective spatial audio reduces route deviations in a small user study compared to VLM-only and Google Maps audio baselines.

MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection

cs.CV · 2026-03-02 · unverdicted · novelty 7.0

MobileMold provides 4,941 smartphone microscopy images and shows that deep learning models reach 99.5% accuracy on mold detection and food classification tasks.

Accelerating Inference for Multilayer Neural Networks with Quantum Computers

quant-ph · 2025-10-08 · unverdicted · novelty 7.0

Quantum circuits for coherent multilayer neural network inference achieve quadratic to polylogarithmic speedups over classical methods depending on quantum data access models for inputs and weights.

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

cs.CV · 2025-09-26 · unverdicted · novelty 7.0

MultiMat shows that large multimodal models combined with constrained search produce higher-quality procedural material graphs than text-only baselines on a new production dataset.

Flowing With Purpose: Latent Action Guided Flow Matching Policies For Robotic Manipulation

cs.RO · 2026-06-22 · unverdicted · novelty 6.0

LAFM adapts the source distribution in flow matching policies via a latent action model to better match fragmented robotic action spaces, claiming 23.4% higher real-world success and a 10.4% gain on LIBERO-90 while beating larger pre-trained models.

Radial Basis Function Networks as Projection Heads in Self-Supervised Learning

cs.CV · 2026-06-19 · unverdicted · novelty 6.0

RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.

FATE: Pillar Encoding and Frequency-Aware Training for Event-Based Object Detection

cs.CV · 2026-06-15 · unverdicted · novelty 6.0

FATE combines pillar encoding via an orthogonal polynomial basis with frequency-aware training to enable event-based object detection at up to 200 Hz without internal temporal sub-binning.

Jaguar: Fast Private CNN Inference with Power-of-Two Homomorphic Arithmetic

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

Jaguar replaces prime-modulus HE with power-of-two arithmetic to enable coefficient-domain convolution and local-shift truncation, reporting 2-3.7x lower latency than Cheetah and Rhombus on ResNet-18/50 and MobileNetV2.

Model Merging: Foundations and Algorithms

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.

Neighbor2Inverse: Self-Supervised Denoising for Low-Dose Region-of-Interest Phase Contrast CT

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

Neighbor2Inverse adapts the Neighbor2Neighbor principle to train a denoising network directly in the image domain for low-dose PBI-CT by using independently noised subsampled projections.

Remote SAMsing: From Segment Anything to Segment Everything

cs.CV · 2026-04-30 · conditional · novelty 6.0

The Remote SAMsing pipeline boosts SAM2 coverage on remote sensing scenes from 30-68% to 91-98% via multi-pass masking and boundary-aware merging while preserving mask quality.

Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms

cs.CR · 2026-04-28 · unverdicted · novelty 6.0

A threat-oriented digital twinning methodology and an open-source modular twin are introduced for security evaluation of autonomous platforms, translating threat analysis into controllable tests for spoofing, replay, and adversarial ML attacks.

Where are they looking in the operating room?

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Gaze-following models on extended 4D-OR and Team-OR datasets reach F1 scores of 0.92 for clinical role prediction and 0.95 for surgical phase recognition while improving team communication detection by over 30%.

Geometric Correction of Side-Scan Sonar Images with Image-Consistent Attitude Refinement

physics.ao-ph · 2026-04-21 · unverdicted · novelty 6.0

A geometric correction technique for side-scan sonar images refines yaw-pitch attitude by fusing navigation baselines with image-inferred perturbations separated via port-starboard symmetry.

Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

An automated Python simulator, calibrated to one experimental run, generates consistent time-series data for many batch distillation scenarios including anomalies, forming an openly released hybrid dataset for deep anomaly detection.

citing papers explorer

Showing 50 of 50 citing papers.

Rolling Shutter Relative Pose Estimation Made Practical cs.CV · 2026-06-25 · conditional · none · ref 31
A linearized solver estimates rolling-shutter relative pose and motion from 7 affine correspondences in 1.2 ms and reports best-in-benchmark accuracy plus usable translational velocity.
4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking cs.CV · 2026-06-21 · conditional · none · ref 23
The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory cs.CV · 2026-06-03 · unverdicted · none · ref 20
WHU-Infra3D is a new large-scale multi-modal dataset and benchmark for 3D roadside infrastructure inventory, providing over 175k 2D boxes, thousands of 3D instances, and 181k annotations across five core tasks while exposing cross-city gaps and long-tailed defect vulnerabilities.
A Systematic Benchmark of Intraoperative Ultrasound-to-MR Synthesis for Brain Tumour Surgery cs.CV · 2026-05-30 · conditional · none · ref 57
On the public ReMIND dataset, a systematic benchmark of six synthesis models across 48 experiments finds LPIPS correlates with downstream segmentation utility while SSIM does not, with SynDiff-2.5D performing best.
Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception cs.CV · 2026-05-11 · unverdicted · none · ref 32
Urban-ImageNet is a 2-million-image multi-modal dataset with a 10-class HUSIC taxonomy that enables benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
Projection-Free Transformers via Gaussian Kernel Attention cs.LG · 2026-05-04 · unverdicted · none · ref 2
Gaussian Kernel Attention replaces learned QKV projections with a Gaussian RBF kernel on per-head token features, using 0.42x parameters and 0.49x FLOPs while showing competitive language modeling performance at depth 20.
Differentially Private Contrastive Learning via Bounding Group-level Contribution cs.CR · 2026-04-29 · unverdicted · none · ref 8
DP-GCL improves differentially private contrastive learning by bounding group-level contributions through batch partitioning and intra-group augmentation, delivering 5.6% higher image classification accuracy and 20.1% higher retrieval accuracy than existing approaches.
AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe cs.MM · 2026-04-22 · unverdicted · none · ref 36
AttentionBender applies 2D transforms to cross-attention maps in video diffusion transformers, producing distributed distortions and glitch aesthetics that reveal entangled attention mechanisms while serving as both an XAI probe and creative tool.
Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data cs.CV · 2026-04-21 · unverdicted · none · ref 10
DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
Navig-AI-tion: Navigation by Contextual AI and Spatial Audio cs.HC · 2026-03-13 · unverdicted · none · ref 4
A system combining VLM landmark instructions with real-time corrective spatial audio reduces route deviations in a small user study compared to VLM-only and Google Maps audio baselines.
MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection cs.CV · 2026-03-02 · unverdicted · none · ref 17
MobileMold provides 4,941 smartphone microscopy images and shows that deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
Accelerating Inference for Multilayer Neural Networks with Quantum Computers quant-ph · 2025-10-08 · unverdicted · none · ref 3
Quantum circuits for coherent multilayer neural network inference achieve quadratic to polylogarithmic speedups over classical methods depending on quantum data access models for inputs and weights.
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models cs.CV · 2025-09-26 · unverdicted · none · ref 24
MultiMat shows that large multimodal models combined with constrained search produce higher-quality procedural material graphs than text-only baselines on a new production dataset.
Flowing With Purpose: Latent Action Guided Flow Matching Policies For Robotic Manipulation cs.RO · 2026-06-22 · unverdicted · none · ref 37
LAFM adapts the source distribution in flow matching policies via a latent action model to better match fragmented robotic action spaces, claiming 23.4% higher real-world success and a 10.4% gain on LIBERO-90 while beating larger pre-trained models.
Radial Basis Function Networks as Projection Heads in Self-Supervised Learning cs.CV · 2026-06-19 · unverdicted · none · ref 17
RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.
FATE: Pillar Encoding and Frequency-Aware Training for Event-Based Object Detection cs.CV · 2026-06-15 · unverdicted · none · ref 54
FATE combines pillar encoding via an orthogonal polynomial basis with frequency-aware training to enable event-based object detection at up to 200 Hz without internal temporal sub-binning.
Jaguar: Fast Private CNN Inference with Power-of-Two Homomorphic Arithmetic cs.CR · 2026-06-10 · unverdicted · none · ref 4
Jaguar replaces prime-modulus HE with power-of-two arithmetic to enable coefficient-domain convolution and local-shift truncation, reporting 2-3.7x lower latency than Cheetah and Rhombus on ResNet-18/50 and MobileNetV2.
Model Merging: Foundations and Algorithms cs.LG · 2026-05-02 · unverdicted · none · ref 200
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
Neighbor2Inverse: Self-Supervised Denoising for Low-Dose Region-of-Interest Phase Contrast CT cs.CV · 2026-05-01 · unverdicted · none · ref 11
Neighbor2Inverse adapts the Neighbor2Neighbor principle to train a denoising network directly in the image domain for low-dose PBI-CT by using independently noised subsampled projections.
Remote SAMsing: From Segment Anything to Segment Everything cs.CV · 2026-04-30 · conditional · none · ref 17
The Remote SAMsing pipeline boosts SAM2 coverage on remote sensing scenes from 30-68% to 91-98% via multi-pass masking and boundary-aware merging while preserving mask quality.
Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms cs.CR · 2026-04-28 · unverdicted · none · ref 14
A threat-oriented digital twinning methodology and an open-source modular twin are introduced for security evaluation of autonomous platforms, translating threat analysis into controllable tests for spoofing, replay, and adversarial ML attacks.
Where are they looking in the operating room? cs.CV · 2026-04-22 · unverdicted · none · ref 43
Gaze-following models on extended 4D-OR and Team-OR datasets reach F1 scores of 0.92 for clinical role prediction and 0.95 for surgical phase recognition while improving team communication detection by over 30%.
Geometric Correction of Side-Scan Sonar Images with Image-Consistent Attitude Refinement physics.ao-ph · 2026-04-21 · unverdicted · none · ref 27
A geometric correction technique for side-scan sonar images refines yaw-pitch attitude by fusing navigation baselines with image-inferred perturbations separated via port-starboard symmetry.
Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection cs.LG · 2026-04-10 · unverdicted · none · ref 46
An automated Python simulator, calibrated to one experimental run, generates consistent time-series data for many batch distillation scenarios including anomalies, forming an openly released hybrid dataset for deep anomaly detection.
Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification stat.ML · 2026-04-07 · unverdicted · none · ref 65
An ensemble-based method of moments applied to softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks such as selective classification over evidential deep learning.
Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing cs.CV · 2026-04-03 · unverdicted · none · ref 25
A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.
Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information cs.CV · 2025-12-29 · unverdicted · none · ref 41
Holi-DETR improves fashion item detection by integrating co-occurrence probabilities, inter-item spatial arrangements, and body keypoint relationships into the DETR architecture.
Graph Signal Denoising Using Regularization by Denoising and Its Parameter Estimation eess.SP · 2025-12-16 · unverdicted · none · ref 35
RED is adapted to graph signals with deep unrolling for parameter estimation, yielding lower MSE than prior graph denoising methods on synthetic and real data.
LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA cs.CV · 2025-09-12 · unverdicted · none · ref 52
LaV-CoT introduces a multi-stage visual CoT pipeline and GRPO training with language-consistency rewards, delivering up to 9.5% accuracy gains on multilingual VQA benchmarks over similar-sized open models.
Near OOD Detection for Vision-Language Prompt Learning with Contrastive Logit Score cs.CV · 2024-05-25 · unverdicted · none · ref 2
Contrastive Logit Score (CLS) improves near OOD detection AUROC by up to 11.67% for pre-trained vision-language prompt learning methods as a plug-and-play post-hoc function.
SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks cs.CV · 2026-06-23 · unverdicted · none · ref 1
The paper releases SignNet-1M, a 1M-scale augmented dataset for ASL, CSL and DGS with 3DGS and diffusion-based variations, plus benchmarks showing improved cross-shift generalization.
Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging cs.CV · 2026-06-12 · unverdicted · none · ref 21
Neuro-JEPA is a sparse multimodal foundation model pretrained on 1,551,862 brain MRI scans that shows stronger and more consistent performance than existing models and CNN baselines across 47 tasks from clinical and public datasets.
Quality-Diversity Search in Sound Generation: Investigating Innovation Engines for Audio Exploration cs.SD · 2026-06-08 · unverdicted · none · ref 24
MAP-Elites with CPPNs, DSP graphs, and a deep classifier produces diverse synthetic sounds across durations and musical/non-musical contexts.
Trustworthy Visual Predicates for Robust Manipulation Understanding under Degradation cs.CV · 2026-06-06 · unverdicted · none · ref 22
Introduces a structured framework showing that visual predicate failures under degradation are non-uniform, with static predicates more robust than dynamic ones like grasp and release, and quantifies downstream accuracy drops.
Efficient 3D Content Reconstruction and Generation cs.CV · 2026-05-18 · unverdicted · none · ref 277
Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting cs.CV · 2026-04-27 · unverdicted · none · ref 1
RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks cs.AI · 2026-03-12 · unverdicted · none · ref 77
Introduces the Explicit Logic Channel (ELC), which combines an LLM, a VFM, and probabilistic inference to validate, select, and enhance MLLMs on zero-shot tasks using a Consistency Rate and cross-channel integration.
ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient cs.CV · 2025-05-27 · unverdicted · none · ref 44
ProBA replaces rigid point tracks with a probabilistic pose graph and 3D Gaussian landmarks, optimizing via negative log-likelihood with the Bhattacharyya coefficient to expand the basin of attraction in prior-free SfM.
Neuron ranking -- an informed way to condense convolutional neural networks architecture cs.LG · 2019-07-03 · unverdicted · none · ref 9
Shapley value and variational importance switch methods produce consistent rankings of filter importance in CNNs, enabling compression and interpretability.
SEADA: An efficient methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures cs.AR · 2026-06-26 · unverdicted · none · ref 9
SEADA introduces an analytical framework combining cost models, mapping tools, and entropy-based precision selection to optimize mixed-precision DNNs on multi-precision spatial architectures.
A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting cs.AI · 2026-06-02 · unverdicted · none · ref 10
A learned linear activation bridge achieves high alignment (cosine ~0.97) between Pythia-160M and Pythia-410M states but produces no improvement in downstream multi-hop answering when injected into the receiver.
Improving acoustic drone detection generalization through pretraining and data augmentation eess.AS · 2026-05-29 · unverdicted · none · ref 15
Pretraining on broad sound events plus on-the-fly augmentations improves out-of-domain true-positive rates for acoustic drone detection at fixed low false-positive rates.
INAR-VL: Input-Aware Routing for Edge-Cloud Vision-Language Inference cs.LG · 2026-05-13 · unverdicted · none · ref 16
INAR-VL routes 36% of visual question answering requests to the edge using lightweight complexity signals, cutting latency 24% and energy 26% while retaining 97% of cloud accuracy.
Particle Diffusion Matching: Random Walk Correspondence Search for the Alignment of Standard and Ultra-Widefield Fundus Images cs.CV · 2026-04-11 · unverdicted · none · ref 91
Particle Diffusion Matching uses diffusion-guided random walk searches to align challenging standard and ultra-widefield retinal images, claiming state-of-the-art benchmark performance.
Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation cs.CV · 2026-02-08 · unverdicted · none · ref 81
MeCSAFNet reports mIoU gains of 4.8-19.6% over U-Net and SegFormer baselines on FBP and Potsdam datasets by processing spectral channels separately and fusing features with CBAM attention.
TwinLiteNet+: An Enhanced Multi-Task Segmentation Model for Autonomous Driving cs.CV · 2024-03-25 · unverdicted · none · ref 19
TwinLiteNet+ is a hybrid-encoder multi-task segmentation model with new UCB, USB, and PCAA modules that reports 92.9% mIoU on drivable area and 34.2% IoU on lane segmentation on BDD100K while using 11x fewer FLOPs than prior models.
Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation cs.RO · 2026-06-01 · unverdicted · none · ref 23
HSAN integrates hierarchical semantic graphs, optimal transport-based goal selection, and graph-aware RL to claim SOTA results on VLN-CE tasks.
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding cs.CV · 2025-08-28 · unverdicted · none · ref 217
A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.
PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging cs.CV · 2026-02-04 · unreviewed · ref 30 · 2 links
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting cs.CE · 2025-02-26 · unreviewed · ref 14