Example-Based Object Detection
Pith reviewed 2026-05-08 17:57 UTC · model grok-4.3
The pith
EBOD suppresses repeated false positives and negatives in open-vocabulary object detection by matching prior error examples, without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EBOD pairs a prompt-based detector such as SAM3 with DINOv3 and LightGlue feature matching. Previous false-positive and false-negative examples are stored and matched against new images, so that the same errors are suppressed when they reappear, without any model retraining.
What carries the argument
The EBOD pipeline that matches stored error examples against new-image features via DINOv3 and LightGlue to filter SAM3 detections.
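As a rough illustration of the filtering step, the sketch below gates detections on their similarity to stored false-positive examples. It is a minimal sketch under assumptions, not the paper's implementation: plain Python lists stand in for DINOv3 descriptors, cosine similarity stands in for DINOv3 plus LightGlue matching, and the names `Detection` and `suppress_known_false_positives` and the 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Detection:
    label: str
    score: float
    embedding: Sequence[float]  # stand-in for a DINOv3 descriptor (assumption)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suppress_known_false_positives(
    detections: List[Detection],
    fp_bank: List[Sequence[float]],
    threshold: float = 0.9,  # hypothetical value; the abstract states none
) -> List[Detection]:
    """Keep only detections that do not resemble any stored false-positive example."""
    kept = []
    for det in detections:
        best = max((cosine(det.embedding, fp) for fp in fp_bank), default=0.0)
        if best < threshold:
            kept.append(det)
    return kept


# Toy data for illustration only.
fp_bank = [[1.0, 0.0, 0.0]]  # one stored false-positive example
detections = [
    Detection("valve", 0.91, [0.99, 0.05, 0.0]),  # resembles the stored error
    Detection("valve", 0.88, [0.0, 1.0, 0.0]),    # unrelated appearance
]
kept = suppress_known_false_positives(detections, fp_bank)
print([d.score for d in kept])
```

The false-negative case would presumably run the other way, with a match against a stored missed object adding a detection instead of removing one; the abstract does not say how this is done.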
Load-bearing premise
Feature matching between stored error examples and new images can reliably identify and suppress the exact same false positives or negatives.
What would settle it
A test image containing a previously recorded false positive that the system still outputs as a detection after matching the error example.
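One way to make this check concrete is a recurrence test on a hand-annotated image. The sketch below assumes the location of the previously recorded false positive has been marked as a box in the test image, and it flags the error as recurring if any output box overlaps that location. The function names and the 0.5 IoU threshold are hypothetical choices for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recurring_false_positives(
    outputs: List[Box],
    known_fp_locations: List[Box],
    iou_thr: float = 0.5,  # hypothetical overlap threshold
) -> List[Box]:
    """Return annotated known-error locations that some output box still covers."""
    return [
        fp for fp in known_fp_locations
        if any(iou(fp, out) >= iou_thr for out in outputs)
    ]


# Toy boxes for illustration only.
known_fp = [(10.0, 10.0, 50.0, 50.0)]
outputs = [(12.0, 12.0, 50.0, 50.0), (100.0, 100.0, 140.0, 140.0)]
recurring = recurring_false_positives(outputs, known_fp)
print(recurring)
```

A non-empty result after the error example has been stored would be the kind of counterexample described above.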
Original abstract
In recent years, object detection has achieved significant progress, especially in the field of open-vocabulary object detection. Unlike traditional methods that rely on predefined categories, open-vocabulary approaches can detect arbitrary objects based on human-provided prompts. With the advancement of prompt-based detection techniques, models such as SAM3 can even outperform some category-specific detectors trained on particular datasets without requiring additional training on those datasets. However, despite these advancements, false positives and false negatives still occur. In practical engineering applications, persistent misdetections or missed detections of the same object are unacceptable. Yet retraining the model every time such errors occur incurs substantial costs in terms of human effort, computational resources, and time. Therefore, how to leverage existing false positive and false negative samples to prevent such errors from recurring remains a highly challenging and urgent problem. To address this issue, we propose EBOD (Example-Based Object Detection), which integrates a prompt-based detector (SAM3) with robust feature matching modules (DINOv3 and LightGlue). The proposed framework effectively suppresses the repeated occurrence of false positives and false negatives by leveraging previous error examples, without requiring additional model retraining. Code is available at https://github.com/sunzx97/examples_based_object_detection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes EBOD (Example-Based Object Detection), a framework that integrates the prompt-based open-vocabulary detector SAM3 with robust feature-matching modules DINOv3 and LightGlue. It claims to suppress repeated false positives and false negatives by leveraging prior error examples as references, without any model retraining or fine-tuning.
Significance. If the matching-based suppression mechanism proves reliable, the approach would offer a low-cost, training-free way to improve detection consistency in deployed systems where repeated errors on the same objects are costly. The availability of code is a positive factor for reproducibility.
major comments (2)
- [Abstract] The central claim that the integration 'effectively suppresses the repeated occurrence of false positives and false negatives' is presented without any supporting experiments, quantitative metrics (e.g., reduction in FP/FN rate, matching precision/recall on error instances), ablation studies, or failure-mode analysis. No validation data or comparison against baselines appears.
- [Abstract] The effectiveness hinges on the unstated details of how DINOv3+LightGlue matching identifies prior FP/FN instances and applies suppression (negative prompts, mask exclusion, or score adjustment). No similarity thresholds, handling of appearance variation (viewpoint, illumination, occlusion), or bounds on matching reliability are provided, leaving the load-bearing assumption unverified.
minor comments (1)
- [Abstract] The GitHub link is provided but no description of the repository contents, example usage, or datasets used for any internal testing is given in the manuscript.
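The metrics named in the first major comment could be computed along the following lines. This is a minimal sketch with hypothetical names (`MatchCounts`, `relative_reduction`), and the counts are made up for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class MatchCounts:
    true_matches: int    # recurring known errors correctly matched
    false_matches: int   # correct detections wrongly matched to a known error
    missed_matches: int  # recurring known errors that went unmatched


def matching_precision(c: MatchCounts) -> float:
    """Fraction of matches that were real recurrences of a known error."""
    total = c.true_matches + c.false_matches
    return c.true_matches / total if total else 0.0


def matching_recall(c: MatchCounts) -> float:
    """Fraction of recurring known errors that the matcher caught."""
    total = c.true_matches + c.missed_matches
    return c.true_matches / total if total else 0.0


def relative_reduction(before: int, after: int) -> float:
    """Relative drop in an error count (FP or FN) after applying suppression."""
    return (before - after) / before if before else 0.0


# Made-up counts for illustration only; not taken from the paper.
counts = MatchCounts(true_matches=18, false_matches=2, missed_matches=6)
print(matching_precision(counts), matching_recall(counts))
print(relative_reduction(before=24, after=6))
```

Reporting precision alongside recall matters here because a false match removes a correct detection, which a plain FP-reduction figure would hide.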
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate.
-
Referee: [Abstract] The central claim that the integration 'effectively suppresses the repeated occurrence of false positives and false negatives' is presented without any supporting experiments, quantitative metrics (e.g., reduction in FP/FN rate, matching precision/recall on error instances), ablation studies, or failure-mode analysis. No validation data or comparison against baselines appears.
Authors: We agree that the abstract, in its current form, presents the central claim at a high level without quantitative support or references to validation. The manuscript body outlines the EBOD framework but does not yet contain the requested experiments, metrics, ablations, or baseline comparisons. We will revise the abstract to remove the unsubstantiated claim of effectiveness and instead describe the intended mechanism, and we will add a new experimental section with quantitative results, failure-mode analysis, and comparisons to the revised manuscript.
revision: yes
-
Referee: [Abstract] The effectiveness hinges on the unstated details of how DINOv3+LightGlue matching identifies prior FP/FN instances and applies suppression (negative prompts, mask exclusion, or score adjustment). No similarity thresholds, handling of appearance variation (viewpoint, illumination, occlusion), or bounds on matching reliability are provided, leaving the load-bearing assumption unverified.
Authors: We agree that the abstract omits these implementation details. The current manuscript text does not specify similarity thresholds, robustness to appearance changes, or reliability bounds. In the revision we will expand the abstract with a concise description of the matching pipeline (DINOv3 feature extraction, then LightGlue matching to prior error examples, then score adjustment or mask exclusion) and add the missing parameters and analysis to the method section.
revision: yes
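The two suppression modes mentioned in this response, score adjustment and mask exclusion, differ in how hard they act on a match. The sketch below contrasts them under stated assumptions: the match ratio is taken to be the share of an error example's keypoints that a matcher such as LightGlue pairs with the new detection, and the function names, the 0.5 threshold, and the 0.5 penalty are all hypothetical.

```python
from typing import List, Optional

Mask = List[List[int]]


def match_ratio(num_matches: int, num_keypoints: int) -> float:
    """Share of the error example's keypoints that found a match (assumed measure)."""
    return num_matches / num_keypoints if num_keypoints else 0.0


def adjust_score(
    score: float,
    ratio: float,
    threshold: float = 0.5,  # hypothetical
    penalty: float = 0.5,    # hypothetical
) -> float:
    """Score adjustment: scale the confidence down when the example matches."""
    return score * penalty if ratio >= threshold else score


def exclude_mask(mask: Mask, ratio: float, threshold: float = 0.5) -> Optional[Mask]:
    """Mask exclusion: drop the detection entirely when the example matches."""
    return None if ratio >= threshold else mask


# Toy numbers for illustration only.
strong = match_ratio(42, 60)  # most keypoints matched
weak = match_ratio(6, 60)     # few keypoints matched
mask = [[1, 1], [0, 1]]
print(adjust_score(0.8, strong), exclude_mask(mask, strong))
print(adjust_score(0.8, weak), exclude_mask(mask, weak))
```

Score adjustment leaves room for a later confidence cutoff to keep a borderline detection, while mask exclusion is irreversible once the threshold is crossed, so the threshold choice matters more in the second mode.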
Circularity Check
No circularity: engineering integration of existing components with no derivations or fitted predictions
The paper presents EBOD as a practical framework that combines the off-the-shelf prompt-based detector SAM3 with feature-matching modules DINOv3 and LightGlue. It claims this integration suppresses repeated false positives and negatives by using prior error examples, without retraining. No equations, mathematical derivations, parameter fitting, or self-citations appear in the abstract or described approach. The central claim is an empirical engineering assertion about effectiveness, not a derived result that reduces to its inputs by construction. No load-bearing steps match any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: DINOv3 and LightGlue feature matching can accurately identify and match previous false-positive and false-negative instances in new images
Reference graph
Works this paper leans on
-
[1]
Detect anything via next point prediction
Qing Jiang, Junan Huo, Xingyu Chen, Yuda Xiong, Zhaoyang Zeng, Yihao Chen, Tianhe Ren, Junzhi Yu, and Lei Zhang. Detect anything via next point prediction, 2025. URL https://arxiv.org/abs/2510.12798
-
[2]
T-rex2: Towards generic object detection via text-visual prompt synergy, 2024
Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, and Lei Zhang. T-rex2: Towards generic object detection via text-visual prompt synergy, 2024
-
[3]
SAM 3: Segment Anything with Concepts
Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...
-
[4]
Grounded sam: Assembling open-world models for diverse visual tasks, 2024
Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, and Lei Zhang. Grounded sam: Assembling open-world models for diverse visual tasks, 2024
-
[5]
Few-shot semantic segmentation meets sam3
Yi-Jen Tsai, Yen-Yu Lin, and Chien-Yao Wang. Few-shot semantic segmentation meets sam3,
-
[6]
URL https://arxiv.org/abs/2604.05433
-
[7]
INSID3: Training-free in-context segmentation with DINOv3
Claudia Cuttano, Gabriele Trivigno, Christoph Reich, Daniel Cremers, Carlo Masone, and Stefan Roth. INSID3: Training-free in-context segmentation with DINOv3. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
-
[8]
Matcher: Segment anything with one shot using all-purpose feature matching
Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, and Chunhua Shen. Matcher: Segment anything with one shot using all-purpose feature matching. arXiv preprint arXiv:2305.13310, 2023
-
[9]
LightGlue: Local Feature Matching at Light Speed
Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. LightGlue: Local Feature Matching at Light Speed. In ICCV, 2023
-
[10]
Omniglue: Generalizable feature matching with foundation model guidance
Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, and Andre Araujo. Omniglue: Generalizable feature matching with foundation model guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
-
[11]
Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...
-
[12]
A density-based algorithm for discovering clusters in large spatial databases with noise
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, page 226–231. AAAI Press, 1996
-
[13]
SuperPoint: Self-Supervised Interest Point Detection and Description
Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description, 2018. URL https://arxiv.org/abs/1712.07629
-
[14]
You only look once: Unified, real-time object detection
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.