Objects as Points

2019 · cs.CV · arXiv:1904.07850

25 Pith papers cite this work. Polarity classification is still indexing.

abstract

Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point: the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.
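The center-point idea in the abstract can be illustrated with a small decoding sketch. This is not the authors' implementation: it assumes a single-class score heatmap and a per-pixel size map are already available, and it uses a 3x3 local-maximum test with a score threshold as one plausible way to pick center points. The names `decode_centers`, `size_map`, `threshold`, and `max_dets` are hypothetical and chosen only for this example.

```python
import numpy as np


def decode_centers(heatmap, size_map, threshold=0.3, max_dets=100):
    """Turn a center heatmap plus a size map into boxes (illustrative only).

    heatmap:  (H, W) array of center scores in [0, 1].
    size_map: (2, H, W) array; channel 0 is box width, channel 1 is box height.
    Returns a list of (x1, y1, x2, y2, score), highest score first.
    """
    h, w = heatmap.shape
    # Pad with -inf so border pixels can still be local maxima.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each 3x3 neighbourhood.
    neighbours = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    local_max = neighbours.max(axis=0)
    # A pixel is kept if it equals its neighbourhood maximum and clears the threshold.
    is_peak = (heatmap == local_max) & (heatmap >= threshold)
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(-heatmap[ys, xs])[:max_dets]

    boxes = []
    for i in order:
        y, x = int(ys[i]), int(xs[i])
        bw, bh = float(size_map[0, y, x]), float(size_map[1, y, x])
        # The box is read off around the center point from the regressed size.
        boxes.append((x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2,
                      float(heatmap[y, x])))
    return boxes


# Toy input: two true centers and one weaker neighbour that should be suppressed.
heatmap = np.zeros((8, 8))
heatmap[2, 3] = 0.9
heatmap[2, 4] = 0.5   # adjacent to the 0.9 peak, so not a local maximum
heatmap[6, 6] = 0.7
size_map = np.stack([np.full((8, 8), 4.0), np.full((8, 8), 2.0)])

detections = decode_centers(heatmap, size_map)
for det in detections:
    print(det)
```

In this toy input the weaker response next to the strongest peak is dropped by the local-maximum test alone, which hints at why a point-based formulation can avoid a separate box-overlap suppression step. Other properties named in the abstract (3D location, orientation, pose) would be read from additional regression maps at the same peak locations, which this sketch omits.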

citation-role summary

background 1 · baseline 1 · dataset 1 · method 1

citation-polarity summary

background 1 · baseline 1 · use dataset 1 · use method 1

representative citing papers

Towards UAV Detection in the Real World: A New Multispectral Dataset UAVNet-MS and a New Method

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Presents the first multispectral dataset for fine-grained small-UAV detection and a dual-stream MFDNet baseline that gains 6.2% AP50 over RGB-only detectors by using spectral material cues.

MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane

cs.CV · 2026-03-20 · unverdicted · novelty 7.0

MoCA3D formulates monocular 3D box prediction as dense pixel-space tasks using corner heatmaps and depth maps, with a new PAG metric for image-plane evaluation.

Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning

eess.SP · 2026-05-06 · unverdicted · novelty 7.0

Grayscale diffusion model generates two-layer RF passives with sub-pixel resolution from partial S-parameters, achieving low error in surrogate predictions and validated on fabricated filters.

Towards Symmetry-sensitive Pose Estimation: A Rotation Representation for Symmetric Object Classes

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

SARR modifies trigonometric rotation encodings with object symmetry orders to produce unique continuous poses, enabling standard CNNs to outperform existing methods on symmetry-aware 6D pose estimation without custom losses or 3D models.

FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

FishRoPE reparameterizes attention mechanisms in fisheye images to use angular separation in spherical coordinates, enabling frozen vision foundation models to achieve state-of-the-art results on 2D detection and BEV segmentation benchmarks.

DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset across weather conditions.

WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.

From Local Matches to Global Masks: Template-Guided Instance Detection and Segmentation in Open-World Scenes

cs.CV · 2026-03-03 · unverdicted · novelty 6.0

L2G-Det detects and segments novel object instances in open scenes by using local template patch matches to generate points that prompt an augmented SAM for global masks.

PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

cs.CV · 2026-02-04 · unverdicted · novelty 6.0

PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.

Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles

cs.CV · 2025-12-03 · unverdicted · novelty 6.0

ThinkDeeper introduces a world-model-based reasoning step that predicts future spatial states to improve multimodal visual grounding for autonomous vehicles, achieving top results on Talk2Car and other benchmarks.

Multi-needle Localization for Pelvic Seed Implant Brachytherapy based on Tip-handle Detection and Matching

cs.CV · 2025-09-22 · unverdicted · novelty 6.0

A tip-handle detection network based on HRNet combined with greedy matching outperforms nnUNet segmentation for multi-needle localization in pelvic brachytherapy CT images on a 100-patient dataset.

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

cs.CV · 2021-12-22 · conditional · novelty 6.0

BEVDet achieves 39.3% mAP and 47.2% NDS on nuScenes val set with a fast BEV-based multi-camera 3D detector that outperforms FCOS3D while using far less compute in its tiny variant.

Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.

Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

RefCD enables unsupervised category-aware object detection by using feature similarity between predicted objects and unlabeled reference images to guide category learning.

Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.

SFFNet: Synergistic Feature Fusion Network With Dual-Domain Edge Enhancement for UAV Image Object Detection

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

SFFNet uses multi-scale dynamic dual-domain coupling and a synergistic feature pyramid network to reach 36.8 AP on VisDrone and 20.6 AP on UAVDT for UAV object detection.

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

cs.CV · 2023-03-09 · accept · novelty 6.0

Grounding DINO fuses language and vision via feature enhancer, language-guided query selection, and cross-modality decoder in a DINO backbone, achieving 52.5 AP zero-shot on COCO and a new record of 26.1 AP mean on ODinW.

YOLOX: Exceeding YOLO Series in 2021

cs.CV · 2021-07-18 · accept · novelty 6.0

YOLOX exceeds prior YOLO models by adopting anchor-free detection, decoupled heads, and SimOTA assignment to reach 50.0% AP on COCO for the large variant.

Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation

cs.RO · 2026-05-19 · unverdicted · novelty 5.0

A vision-language model outputs dual heatmaps for navigation affordance and facing to ground semantic instructions into executable free space, achieving higher affordance rates than waypoint regression across simulated robot embodiments.

Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

TCMP achieves SOTA MOT metrics (HOTA 63.4%, IDF1 65.0%, AssA 49.1%) with 0.014x parameters and 0.05x FLOPs of the previous best method by using a simple dilated TCN regressor.

Caries DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR Based Caries Detection

cs.CV · 2026-04-26 · unverdicted · novelty 5.0

Caries-DETR adds tooth-structure query initialization and lesion-aware loss reweighting to DETR, reaching state-of-the-art caries detection on AlphaDent and DentalAI datasets.

Class-Adaptive Cooperative Perception for Multi-Class LiDAR-based 3D Object Detection in V2X Systems

cs.CV · 2026-04-11 · unverdicted · novelty 5.0

A new class-adaptive fusion architecture improves multi-class LiDAR 3D object detection in V2X cooperative perception by routing small and large objects through attentive pathways and balancing training objectives.

Scene Reconstruction as Mapping Priors for 3D Detection

cs.CV · 2026-05-21 · unverdicted · novelty 4.0

Automatically constructed mapping priors from sensor aggregation are integrated via the MPA3D framework to achieve state-of-the-art 3D detection results on the Waymo Open Dataset.

Adapted Center and Scale Prediction: More Stable and More Accurate

cs.CV · 2020-02-20 · unverdicted · novelty 4.0

Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.

citing papers explorer

Showing 25 of 25 citing papers.

Towards UAV Detection in the Real World: A New Multispectral Dataset UAVNet-MS and a New Method cs.CV · 2026-05-20 · unverdicted · none · ref 31 · internal anchor
Presents the first multispectral dataset for fine-grained small-UAV detection and a dual-stream MFDNet baseline that gains 6.2% AP50 over RGB-only detectors by using spectral material cues.
MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane cs.CV · 2026-03-20 · unverdicted · none · ref 58 · internal anchor
MoCA3D formulates monocular 3D box prediction as dense pixel-space tasks using corner heatmaps and depth maps, with a new PAG metric for image-plane evaluation.
Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning eess.SP · 2026-05-06 · unverdicted · none · ref 15
Grayscale diffusion model generates two-layer RF passives with sub-pixel resolution from partial S-parameters, achieving low error in surrogate predictions and validated on fabricated filters.
Towards Symmetry-sensitive Pose Estimation: A Rotation Representation for Symmetric Object Classes cs.CV · 2026-04-20 · unverdicted · none · ref 65
SARR modifies trigonometric rotation encodings with object symmetry orders to produce unique continuous poses, enabling standard CNNs to outperform existing methods on symmetry-aware 6D pose estimation without custom losses or 3D models.
FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception cs.CV · 2026-04-12 · unverdicted · none · ref 30
FishRoPE reparameterizes attention mechanisms in fisheye images to use angular separation in spherical coordinates, enabling frozen vision foundation models to achieve state-of-the-art results on 2D detection and BEV segmentation benchmarks.
DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather cs.CV · 2026-04-09 · unverdicted · none · ref 53
DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset across weather conditions.
WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects cs.CV · 2026-04-09 · unverdicted · none · ref 51
WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.
From Local Matches to Global Masks: Template-Guided Instance Detection and Segmentation in Open-World Scenes cs.CV · 2026-03-03 · unverdicted · none · ref 51 · internal anchor
L2G-Det detects and segments novel object instances in open scenes by using local template patch matches to generate points that prompt an augmented SAM for global masks.
PEPR: Privileged Event-based Predictive Regularization for Domain Generalization cs.CV · 2026-02-04 · unverdicted · none · ref 64 · internal anchor
PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles cs.CV · 2025-12-03 · unverdicted · none · ref 70 · internal anchor
ThinkDeeper introduces a world-model-based reasoning step that predicts future spatial states to improve multimodal visual grounding for autonomous vehicles, achieving top results on Talk2Car and other benchmarks.
Multi-needle Localization for Pelvic Seed Implant Brachytherapy based on Tip-handle Detection and Matching cs.CV · 2025-09-22 · unverdicted · none · ref 44 · internal anchor
A tip-handle detection network based on HRNet combined with greedy matching outperforms nnUNet segmentation for multi-needle localization in pelvic brachytherapy CT images on a 100-patient dataset.
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View cs.CV · 2021-12-22 · conditional · none · ref 59 · internal anchor
BEVDet achieves 39.3% mAP and 47.2% NDS on nuScenes val set with a fast BEV-based multi-camera 3D detector that outperforms FCOS3D while using far less compute in its tiny variant.
Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours cs.CV · 2026-05-09 · unverdicted · none · ref 32
FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.
Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness cs.CV · 2026-05-06 · unverdicted · none · ref 12
RefCD enables unsupervised category-aware object detection by using feature similarity between predicted objects and unlabeled reference images to guide category learning.
Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection cs.CV · 2026-04-07 · unverdicted · none · ref 59
Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.
SFFNet: Synergistic Feature Fusion Network With Dual-Domain Edge Enhancement for UAV Image Object Detection cs.CV · 2026-04-03 · unverdicted · none · ref 64
SFFNet uses multi-scale dynamic dual-domain coupling and a synergistic feature pyramid network to reach 36.8 AP on VisDrone and 20.6 AP on UAVDT for UAV object detection.
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection cs.CV · 2023-03-09 · accept · none · ref 63
Grounding DINO fuses language and vision via feature enhancer, language-guided query selection, and cross-modality decoder in a DINO backbone, achieving 52.5 AP zero-shot on COCO and a new record of 26.1 AP mean on ODinW.
YOLOX: Exceeding YOLO Series in 2021 cs.CV · 2021-07-18 · accept · none · ref 40
YOLOX exceeds prior YOLO models by adopting anchor-free detection, decoupled heads, and SimOTA assignment to reach 50.0% AP on COCO for the large variant.
Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation cs.RO · 2026-05-19 · unverdicted · none · ref 32 · internal anchor
A vision-language model outputs dual heatmaps for navigation affordance and facing to ground semantic instructions into executable free space, achieving higher affordance rates than waypoint regression across simulated robot embodiments.
Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking cs.CV · 2026-05-01 · unverdicted · none · ref 17
TCMP achieves SOTA MOT metrics (HOTA 63.4%, IDF1 65.0%, AssA 49.1%) with 0.014x parameters and 0.05x FLOPs of the previous best method by using a simple dilated TCN regressor.
Caries DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR Based Caries Detection cs.CV · 2026-04-26 · unverdicted · none · ref 43
Caries-DETR adds tooth-structure query initialization and lesion-aware loss reweighting to DETR, reaching state-of-the-art caries detection on AlphaDent and DentalAI datasets.
Class-Adaptive Cooperative Perception for Multi-Class LiDAR-based 3D Object Detection in V2X Systems cs.CV · 2026-04-11 · unverdicted · none · ref 31
A new class-adaptive fusion architecture improves multi-class LiDAR 3D object detection in V2X cooperative perception by routing small and large objects through attentive pathways and balancing training objectives.
Scene Reconstruction as Mapping Priors for 3D Detection cs.CV · 2026-05-21 · unverdicted · none · ref 66 · internal anchor
Automatically constructed mapping priors from sensor aggregation are integrated via the MPA3D framework to achieve state-of-the-art 3D detection results on the Waymo Open Dataset.
Adapted Center and Scale Prediction: More Stable and More Accurate cs.CV · 2020-02-20 · unverdicted · none · ref 55 · internal anchor
Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.
Frozen Vision Transformers for Dense Prediction on Small Datasets: A Case Study in Arrow Localization cs.CV · 2026-04-18 · conditional · none · ref 1
A frozen DINOv3 ViT-L/16 with AnyUp upsampling and lightweight CenterNet heads achieves 0.893 F1 and 1.41 mm localization error on arrow punctures using 48 training images.
