MMDetection: Open MMLab Detection Toolbox and Benchmark
20 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
DINO reaches 51.3 AP on COCO val2017 with a ResNet-50 backbone after 24 epochs, a +2.7 AP gain over the prior best DETR variant.
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
citing papers explorer
- Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression
  Auto-FlexSwitch achieves efficient dynamic model merging by decomposing task vectors into sparse masks, signs, and scalars, then making the compression learnable via gating and adaptive bit selection with KNN-based retrieval.
- KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition
  KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.
- UHR-DETR: Efficient End-to-End Small Object Detection for Ultra-High-Resolution Remote Sensing Imagery
  UHR-DETR delivers 2.8% higher mAP and 10x faster inference than sliding-window baselines for small object detection in UHR remote sensing imagery on a single 24GB GPU.
- FRTSearch: Unified Detection and Parameter Inference of Fast Radio Transients using Instance Segmentation
  FRTSearch reframes fast radio transient detection as instance segmentation on dynamic spectra and uses the segmented shapes to infer dispersion measure and time of arrival, achieving 98% recall with over 99.9% fewer false positives than traditional methods.
- UniISP: A Unified ISP Framework for Both Human and Machine Vision
  UniISP unifies ISP processing with a Hybrid Attention Module and Feature Adapter to produce images that are both visually pleasing for humans and informative for computer vision models.
- Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy
  HiPR improves 3D occupancy prediction by reparameterizing image-to-voxel projections using LiDAR-derived height priors to adapt sampling ranges to scene sparsity and height variations.
- SignDATA: Data Pipeline for Sign Language Translation
  SignDATA provides a reproducible, config-driven preprocessing toolkit that converts heterogeneous sign language corpora into standardized pose or video outputs using interchangeable backends and privacy-aware options.
- Granularity-Aware Transfer for Tree Instance Segmentation in Synthetic and Real Forests
  Granularity-aware distillation improves tree instance segmentation accuracy on real forest images by merging logits and unifying masks from fine-grained synthetic teachers despite coarse real labels.
- Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating
  DualEngage fuses transformer-encoded student motion dynamics with 3D scene features via softmax-gated fusion to recognize group engagement in classroom videos, reporting 96.21% average accuracy on a university dataset.
- Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection
  Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.
- Portable Active Learning for Object Detection
  PAL is a portable active learning method for object detection that uses class-specific logistic classifiers for uncertainty and image-level diversity to select annotation batches, showing better label efficiency than baselines on COCO, VOC, and BDD100K.
- Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay
  The Colinearity-Decay regularizer trains ViTs that maintain or improve full-precision accuracy while delivering higher accuracy after low-bit quantization on ImageNet and COCO tasks.
- A Real-time Scale-robust Network for Glottis Segmentation in Nasal Transnasal Intubation
  A scale-robust, lightweight CNN for glottis segmentation achieves 92.9% mDice at over 170 FPS with a 19 MB model size across three datasets.
- Bridge: Basis-Driven Causal Inference Marries VFMs for Domain Generalization
  Bridge learns low-rank bases for front-door causal adjustment to remove spurious correlations from domain shifts and integrates the approach with vision foundation models for improved object detection generalization.
- A3-FPN: Asymptotic Content-Aware Pyramid Attention Network for Dense Visual Prediction
  A3-FPN augments multi-scale representations with asymptotic global interaction and content-aware resampling, delivering gains such as 49.6 mask AP on MS COCO when paired with OneFormer and Swin-L.
- Advancing Vision Transformer with Enhanced Spatial Priors
  EVT improves Vision Transformers by using Euclidean distance decay for spatial priors and simpler grouping, achieving 86.6% top-1 accuracy on ImageNet-1k.
- The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results
  The challenge report details the methods and results of 19 teams on cross-domain few-shot object detection in open- and closed-source tracks.
- Seed1.5-VL Technical Report
  Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.
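Several entries above work in weight space; as a concrete illustration of the decomposition named in the Auto-FlexSwitch summary, here is a minimal NumPy sketch that splits a task vector (fine-tuned weights minus base weights) into a sparse mask, per-entry signs, and one shared scalar magnitude. The function names and the top-k/mean-magnitude choices are assumptions for illustration only; the paper's actual compression is learnable (gating, adaptive bit selection, KNN-based retrieval), which this static sketch omits.

```python
import numpy as np

def compress_task_vector(task_vector, keep_ratio=0.1):
    """Sketch: decompose a task vector into (sparse mask, signs, scalar).

    Hypothetical helper, not the Auto-FlexSwitch API. Keeps the top-k
    entries by magnitude, records only their signs, and replaces all
    surviving magnitudes with a single shared scalar (their mean).
    """
    flat = task_vector.ravel()
    k = max(1, int(keep_ratio * flat.size))
    # Sparse mask: True only for the k largest-magnitude entries.
    top_idx = np.argpartition(np.abs(flat), -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[top_idx] = True
    # Signs of the surviving entries, plus one shared magnitude.
    signs = np.sign(flat[mask]).astype(np.int8)
    scalar = float(np.abs(flat[mask]).mean())
    return mask, signs, scalar, task_vector.shape

def reconstruct(mask, signs, scalar, shape):
    """Rebuild an approximate task vector from its compressed parts."""
    flat = np.zeros(mask.size)
    flat[mask] = signs * scalar
    return flat.reshape(shape)
```

For example, compressing `[[0.5, -0.01], [0.02, -0.4]]` at `keep_ratio=0.5` keeps the two large entries, stores signs `[+1, -1]` and scalar `0.45`, and reconstructs `[[0.45, 0], [0, -0.45]]` — the storage cost drops to one bit-mask, one sign bit per kept weight, and a single float.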