Deformable DETR: Deformable Transformers for End-to-End Object Detection

Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai · 2020 · cs.CV · arXiv 2010.04159

Mixed citation behavior. Most common role is background (62%).

84 Pith papers citing it

Background: 62% of classified citations (5 of 8)

abstract

DETR has recently been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules attend only to a small set of key sampling points around a reference point. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code is released at https://github.com/fundamentalvision/Deformable-DETR.
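The abstract's core idea, attending to a small set of sampling points around a reference point instead of every position in the feature map, can be sketched for a single query as follows. This is a minimal NumPy sketch under simplifying assumptions, not the released implementation: it uses one attention head and one feature level (the paper's module is multi-head and multi-scale), takes reference points in pixel coordinates, zero-pads samples that fall outside the map, and the matrices W_offset, W_attn, W_value and W_out are hypothetical names for learned parameters.

```python
import numpy as np


def bilinear_sample(feat, x, y):
    """Bilinearly sample feat (H, W, C) at continuous pixel coords (x, y).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    H, W, C = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < W and 0 <= yy < H:
                # Weight shrinks linearly with distance to the integer neighbour.
                w = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                out += w * feat[yy, xx]
    return out


def deformable_attention(query, ref_point, feat, W_offset, W_attn, W_value, W_out, n_points=4):
    """Single-head, single-scale deformable attention for one query (sketch).

    query:     (C,)       query feature
    ref_point: (2,)       reference point (x, y) in pixel coordinates
    feat:      (H, W, C)  input feature map
    W_offset:  (C, 2*n_points) hypothetical projection -> sampling offsets
    W_attn:    (C, n_points)   hypothetical projection -> attention logits
    W_value, W_out: (C, C)     hypothetical value / output projections
    """
    # Predict one 2-D offset per sampling point from the query itself.
    offsets = (query @ W_offset).reshape(n_points, 2)
    # Attention weights also come from the query (no query-key dot products);
    # a softmax normalises them over the sampling points.
    logits = query @ W_attn
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Value projection of the whole map (shared by all queries in practice).
    value = feat @ W_value
    out = np.zeros(feat.shape[-1])
    for k in range(n_points):
        x, y = ref_point + offsets[k]
        out += weights[k] * bilinear_sample(value, x, y)
    return out @ W_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, K = 16, 16, 8, 4
    feat = rng.standard_normal((H, W, C))
    query = rng.standard_normal(C)
    out = deformable_attention(
        query, np.array([7.5, 3.0]), feat,
        W_offset=rng.standard_normal((C, 2 * K)),
        W_attn=rng.standard_normal((C, K)),
        W_value=rng.standard_normal((C, C)),
        W_out=rng.standard_normal((C, C)),
        n_points=K,
    )
    print(out.shape)  # (8,)
```

In this sketch each query reads only n_points bilinear samples, so the per-query sampling and weighting work does not depend on H×W (only the value projection touches the whole map, and it is shared across queries). Dense attention, by contrast, compares every query against every position.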

citation-role summary

Background: 5 · Method: 2 · Baseline: 1

citation-polarity summary

Background: 5 · Use method: 2 · Baseline: 1

representative citing papers

GaussianFusion: Unified 3D Gaussian Representation for Multi-Modal Fusion Perception

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

GaussianFusion presents a 3D Gaussian-based framework that unifies multi-modal features in continuous space for 3D object detection and semantic occupancy, reporting gains over BEVFusion and GaussFormer on nuScenes.

MVDGC: Joint 3D and 2D Multi-view Pedestrian Detection via Dual Geometric Constraints

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

MVDGC unifies BEV and image-view pedestrian localization into one task via 3D cylindrical queries that enforce dual geometric constraints between views.

Fusing Satellite Imagery and Planimetric Maps for Cross-View Localization

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

A fusion module for satellite imagery and planimetric maps reduces mean localization error by 30.13% over single-modality state-of-the-art methods in cross-view tasks.

FlowOVD: Learning Generative Latent Flows for Zero-shot Open-vocabulary Detection

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

FlowOVD applies rectified flow to generate continuous latent query dynamics for text-conditioned open-vocabulary detection, reporting 49.5 AP on COCO and 31.5 AP on LVIS.

Train the Agent, Not the Expert: Learning to Harness Heterogeneous Experts for Multi-Turn Visual Reasoning

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

VisHarness uses reinforcement learning to train a policy that harnesses specialized visual experts through multi-turn interactions and dynamic visual memory archiving, outperforming general models on four visual reasoning benchmarks.

Towards UAV Detection in the Real World: A New Multispectral Dataset UAVNet-MS and a New Method

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

The paper presents UAVNet-MS, the first multispectral dataset for fine-grained small-UAV detection, and a dual-stream MFDNet baseline that gains 6.2% AP50 over RGB-only detectors by exploiting spectral material cues.

Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

UniTopo unifies lane detection and topology reasoning into a single perception model, outperforming prior methods on OpenLane-V2 benchmarks with TOP_ll scores of 30.1% and 31.8%.

InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery

cs.CV · 2026-05-06 · conditional · novelty 7.0 · 2 refs

InterMesh explicitly incorporates human-object interaction semantics into multi-person mesh recovery via a detector and two lightweight modules, delivering up to 9.9% MPJPE reduction on interaction-heavy datasets.

ReLeaf: Benchmarking Leaf Segmentation across Domains and Species

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

A YOLO26 model trained on four leaf segmentation datasets reaches 83.9% mean mAP50-95 on their test sets but only 40.2% on a new 23-species benchmark, revealing substantial cross-domain generalization gaps.

Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion

cs.CV · 2026-04-28 · unverdicted · novelty 7.0

ConFusion reaches 59.1 mAP and 65.6 NDS on nuScenes validation by combining heterogeneous queries with QMix cross-attention and QSwap feature exchange.

Chatting about Upper-Body Expressive Human Pose and Shape Estimation

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

CoEvoer is a new cross-dependency transformer framework for upper-body expressive human pose and shape estimation that achieves state-of-the-art performance by enabling mutual enhancement between body parts.

Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

HELP uses heatmap-guided positional embeddings and a gradient mask to suppress background noise in queries, enabling efficient small-object detection with fewer decoder layers and parameters.

SynthPID: P&ID digitization from Topology-Preserving Synthetic Data

cs.CV · 2026-04-15 · conditional · novelty 7.0

Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.

Online Reasoning Video Object Segmentation

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.

YUV20K: A Complexity-Driven Benchmark and Trajectory-Aware Alignment Model for Video Camouflaged Object Detection

cs.CV · 2026-04-11 · unverdicted · novelty 7.0

YUV20K is a complexity-driven VCOD benchmark with 24k annotated frames, paired with a model using Motion Feature Stabilization via semantic primitives and Trajectory-Aware Alignment via deformable sampling that outperforms prior methods.

DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset across weather conditions.

Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Bridge-STG decouples spatio-temporal alignment via semantic bridging and query-guided localization modules to achieve state-of-the-art m_vIoU of 34.3 on VidSTG among MLLM methods.

WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.

MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane

cs.CV · 2026-03-20 · unverdicted · novelty 7.0

MoCA3D formulates monocular 3D box prediction as dense pixel-space tasks using corner heatmaps and depth maps, with a new PAG metric for image-plane evaluation.

SAM 3: Segment Anything with Concepts

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SAM 3 introduces promptable concept segmentation, doubling the accuracy of prior systems on images and videos while also improving standard SAM segmentation performance.

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

cs.NE · 2024-03-25 · conditional · novelty 7.0

A hierarchical spiking transformer using Q-K attention achieves 85.65% top-1 accuracy on ImageNet-1K, the first direct-trained SNN to exceed 85%.

Real-Time Source-Free Object Detection

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

RT-SFOD adapts dual-head detectors like YOLOv10 for source-free object detection via DHF pseudo-label fusion and MARD loss, delivering 1.4-3.5% mAP gains with 1.3x higher throughput and ~2x fewer parameters than prior SFOD methods.

Semantic Occupancy Prediction with Dual Range-Voxel Representation

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

DRVR uses range-view and geometry-aware voxel-view encoders plus fusion to deliver 5.4% higher mIoU and 2.1x faster inference than multi-sweep baselines on nuScenes-Occupancy from single sweeps.

Deformba: Vision State Space Model with Adaptive State Fusion

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

Deformba introduces context-adaptive state fusion to vision SSMs for better spatial augmentation and cross-stream interactions, showing strong results on 2D classification/detection/segmentation and 3D BEV perception benchmarks.

citing papers explorer

LiPS: Lightweight Panoptic Segmentation for Resource-Constrained Robotics

cs.RO · 2026-04-01 · unverdicted · none · ref 11

LiPS is a streamlined panoptic segmentation architecture that matches heavier models in accuracy while delivering up to 4.5x higher throughput and 6.8x lower computation on standard benchmarks.