pith.

arxiv: 2605.04501 · v1 · submitted 2026-05-06 · 💻 cs.CV · cs.AI

Example-Based Object Detection

Pith reviewed 2026-05-08 17:57 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords object detection · false positive suppression · open-vocabulary detection · feature matching · error examples · SAM · DINOv3 · LightGlue

The pith

EBOD suppresses repeated false positives and negatives in open-vocabulary object detection by matching prior error examples, without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets a practical problem: object detectors such as SAM3 keep producing the same false positives and false negatives on the same objects, while retraining the model for every new error is expensive in time and compute. It proposes storing examples of those errors and using feature matching to recognize and filter matching instances in future images. The approach combines the prompt-based SAM3 detector with DINOv3 and LightGlue for robust matching of error instances. A reader would care because this offers an incremental way to make deployed detectors more reliable over time in real applications where the same mistakes keep recurring.

Core claim

The EBOD framework integrates a prompt-based detector such as SAM3 with DINOv3 and LightGlue feature matching so that previous false-positive and false-negative examples can be stored and used to suppress identical errors when they reappear in new images, achieving this without any model retraining.

What carries the argument

The EBOD pipeline that matches stored error examples against new-image features via DINOv3 and LightGlue to filter SAM3 detections.
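The filtering idea can be pictured with a minimal sketch. It assumes each detection and each stored error example is reduced to a single embedding vector, standing in for pooled DINOv3 features, and it uses cosine similarity with a fixed threshold in place of the LightGlue matching step. The names `ErrorMemory`, `Detection`, and `refine`, the 0.9 threshold, and the toy vectors are all hypothetical. `candidates` stands in for extra proposals that could recover missed objects (Figure 1 indicates EBOD uses INSID3 to generate candidates, but the details are not given here).

```python
import math
from dataclasses import dataclass, field

# Hypothetical sketch: cosine similarity on one pooled embedding per region stands in
# for EBOD's DINOv3 + LightGlue matching; names and the 0.9 threshold are assumptions.
Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Detection:
    box: tuple  # (x1, y1, x2, y2)
    score: float
    embedding: Vector  # assumed region descriptor, e.g. pooled backbone features


@dataclass
class ErrorMemory:
    """Hypothetical store of past errors, not the data structure used by EBOD."""
    fp_examples: list[Vector] = field(default_factory=list)  # recorded false positives
    fn_examples: list[Vector] = field(default_factory=list)  # recorded missed objects
    threshold: float = 0.9  # assumed similarity cut-off

    def _matches(self, emb: Vector, bank: list[Vector]) -> bool:
        return any(cosine(emb, ref) >= self.threshold for ref in bank)

    def refine(self, detections: list[Detection], candidates: list[Detection]) -> list[Detection]:
        # Drop detector outputs that resemble a recorded false positive.
        kept = [d for d in detections if not self._matches(d.embedding, self.fp_examples)]
        # Add extra candidate regions that resemble a recorded false negative.
        recovered = [c for c in candidates if self._matches(c.embedding, self.fn_examples)]
        return kept + recovered


memory = ErrorMemory(fp_examples=[[1.0, 0.0, 0.0]], fn_examples=[[0.0, 0.0, 1.0]])
detections = [
    Detection((0, 0, 10, 10), 0.8, [0.99, 0.05, 0.0]),  # resembles the recorded false positive
    Detection((20, 20, 30, 30), 0.7, [0.0, 1.0, 0.0]),  # unrelated, kept
]
candidates = [Detection((40, 40, 50, 50), 0.3, [0.0, 0.1, 0.99])]  # resembles the recorded miss
print([d.box for d in memory.refine(detections, candidates)])
# [(20, 20, 30, 30), (40, 40, 50, 50)]
```

A single global threshold is the crudest possible matching rule. It ignores viewpoint and occlusion, which are exactly the conditions a keypoint matcher is meant to handle.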

Load-bearing premise

Feature matching between stored error examples and new images can reliably identify and suppress the exact same false positives or negatives.

What would settle it

A test image containing a previously recorded false positive that the system still outputs as a detection after matching the error example.
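This falsification test can be written as a simple regression check. The following is a minimal sketch. It assumes boxes are given as (x1, y1, x2, y2) pixel coordinates and that "still outputs" means IoU of at least 0.5 with the recorded false-positive region; both are assumptions, not criteria taken from the paper. The function names are hypothetical.

```python
# Hypothetical regression check; the box format and 0.5 IoU criterion are assumptions.
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fp_recurs(outputs: list[Box], recorded_fp: Box, iou_thr: float = 0.5) -> bool:
    """True if any output box still overlaps the recorded false positive."""
    return any(iou(box, recorded_fp) >= iou_thr for box in outputs)


recorded = (100, 100, 200, 200)
print(fp_recurs([(105, 98, 198, 205)], recorded))  # True: the error survived matching
print(fp_recurs([(300, 300, 350, 350)], recorded))  # False: the error was suppressed
```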

Figures

Figures reproduced from arXiv: 2605.04501 by ZhiXin Sun.

Figure 1: Overview of the proposed EBOD framework. Step 1: Use INSID3 to generate candidate …
Figure 2: Images of the same object from two different viewpoints.
Figure 3: Given a missed detection case, we visualize the detection results produced by the proposed …
Original abstract

In recent years, object detection has achieved significant progress, especially in the field of open-vocabulary object detection. Unlike traditional methods that rely on predefined categories, open-vocabulary approaches can detect arbitrary objects based on human-provided prompts. With the advancement of prompt-based detection techniques, models such as SAM3 can even outperform some category-specific detectors trained on particular datasets without requiring additional training on those datasets. However, despite these advancements, false positives and false negatives still occur. In practical engineering applications, persistent misdetections or missed detections of the same object are unacceptable. Yet retraining the model every time such errors occur incurs substantial costs in terms of human effort, computational resources, and time. Therefore, how to leverage existing false positive and false negative samples to prevent such errors from recurring remains a highly challenging and urgent problem. To address this issue, we propose EBOD (Example-Based Object Detection), which integrates a prompt-based detector (SAM3) with robust feature matching modules (DINOv3 and LightGlue). The proposed framework effectively suppresses the repeated occurrence of false positives and false negatives by leveraging previous error examples, without requiring additional model retraining. Code is available at https://github.com/sunzx97/examples_based_object_detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes EBOD (Example-Based Object Detection), a framework that integrates the prompt-based open-vocabulary detector SAM3 with robust feature-matching modules DINOv3 and LightGlue. It claims to suppress repeated false positives and false negatives by leveraging prior error examples as references, without any model retraining or fine-tuning.

Significance. If the matching-based suppression mechanism proves reliable, the approach would offer a low-cost, training-free way to improve detection consistency in deployed systems where repeated errors on the same objects are costly. The availability of code is a positive factor for reproducibility.

major comments (2)
  1. [Abstract] The central claim that the integration 'effectively suppresses the repeated occurrence of false positives and false negatives' is presented without any supporting experiments, quantitative metrics (e.g., reduction in FP/FN rate, matching precision/recall on error instances), ablation studies, or failure-mode analysis. No validation data or comparison against baselines appears.
  2. [Abstract] The effectiveness hinges on the unstated details of how DINOv3+LightGlue matching identifies prior FP/FN instances and applies suppression (negative prompts, mask exclusion, or score adjustment). No similarity thresholds, handling of appearance variation (viewpoint, illumination, occlusion), or bounds on matching reliability are provided, leaving the load-bearing assumption unverified.
minor comments (1)
  1. [Abstract] The GitHub link is provided but no description of the repository contents, example usage, or datasets used for any internal testing is given in the manuscript.
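The FP/FN reduction that the first major comment asks for is straightforward to compute once detections are matched to ground truth. The following is a minimal sketch. It assumes greedy one-to-one matching by IoU at 0.5 (a common convention, not something the paper specifies), and it uses toy boxes invented purely for illustration. `count_errors` and `relative_reduction` are hypothetical helpers.

```python
# Hypothetical evaluation helper: greedy one-to-one IoU matching at an assumed 0.5 threshold.
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_errors(preds: list[Box], gts: list[Box], thr: float = 0.5) -> tuple[int, int, int]:
    """Return (TP, FP, FN); preds are assumed sorted by descending confidence."""
    unmatched = list(range(len(gts)))
    tp = 0
    for p in preds:
        best, best_iou = None, thr
        for g in unmatched:
            overlap = iou(p, gts[g])
            if overlap >= best_iou:
                best, best_iou = g, overlap
        if best is not None:
            unmatched.remove(best)  # each ground-truth box can be matched once
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def relative_reduction(before: int, after: int) -> float:
    """Share of errors removed, e.g. 0.5 means half as many errors."""
    return (before - after) / before if before else 0.0


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
baseline = [(0, 0, 10, 10), (50, 50, 60, 60), (70, 70, 80, 80)]  # toy detector output
refined = [(0, 0, 10, 10), (20, 20, 30, 30), (70, 70, 80, 80)]   # toy output after refinement
_, fp0, fn0 = count_errors(baseline, gts)  # (1, 2, 1)
_, fp1, fn1 = count_errors(refined, gts)   # (2, 1, 0)
print(relative_reduction(fp0, fp1), relative_reduction(fn0, fn1))  # 0.5 1.0
```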

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the integration 'effectively suppresses the repeated occurrence of false positives and false negatives' is presented without any supporting experiments, quantitative metrics (e.g., reduction in FP/FN rate, matching precision/recall on error instances), ablation studies, or failure-mode analysis. No validation data or comparison against baselines appears.

    Authors: We agree that the abstract, in its current form, presents the central claim at a high level without quantitative support or references to validation. The manuscript body outlines the EBOD framework but does not yet contain the requested experiments, metrics, ablations, or baseline comparisons. We will revise the abstract to remove the unsubstantiated claim of effectiveness and instead describe the intended mechanism, while adding a new experimental section with quantitative results, failure-mode analysis, and comparisons in the revised manuscript. revision: yes

  2. Referee: [Abstract] The effectiveness hinges on the unstated details of how DINOv3+LightGlue matching identifies prior FP/FN instances and applies suppression (negative prompts, mask exclusion, or score adjustment). No similarity thresholds, handling of appearance variation (viewpoint, illumination, occlusion), or bounds on matching reliability are provided, leaving the load-bearing assumption unverified.

    Authors: We agree that the abstract omits these implementation details. The current manuscript text does not specify similarity thresholds, robustness to appearance changes, or reliability bounds. In the revision we will expand the abstract with a concise description of the matching pipeline (DINOv3 feature extraction followed by LightGlue matching to prior error examples, followed by score adjustment or mask exclusion) and add the missing parameters and analysis to the method section. revision: yes
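The two suppression options named in this response, score adjustment and mask exclusion, behave quite differently. The following is a minimal sketch of each. It assumes the matching stage yields a similarity in [0, 1] and that binary masks are stored as sets of (row, col) pixels. The penalty form `score * (1 - strength * similarity)`, the 50% keep rule, and the function names are hypothetical choices, not the paper's.

```python
# Hypothetical suppression helpers; the penalty form and the 50% keep rule are assumptions.
Pixel = tuple[int, int]  # (row, col)


def adjust_score(score: float, similarity: float, strength: float = 1.0) -> float:
    """Soft suppression: shrink the score as the match to a stored error gets stronger."""
    similarity = min(max(similarity, 0.0), 1.0)
    return max(0.0, score * (1.0 - strength * similarity))


def exclude_mask(mask: set[Pixel], error_region: set[Pixel], min_keep: float = 0.5) -> set[Pixel]:
    """Hard suppression: remove pixels inside a matched error region, and drop the
    whole detection if less than `min_keep` of its mask survives."""
    if not mask:
        return set()
    kept = mask - error_region
    return kept if len(kept) / len(mask) >= min_keep else set()


mask = {(r, c) for r in range(4) for c in range(4)}       # 16-pixel detection
fp_region = {(r, c) for r in range(4) for c in range(3)}  # covers 12 of its pixels
small_overlap = {(0, 0), (0, 1)}                          # covers 2 of its pixels
print(round(adjust_score(0.8, 0.9), 3))        # 0.08
print(len(exclude_mask(mask, fp_region)))      # 0 (only 25% survives, so dropped)
print(len(exclude_mask(mask, small_overlap)))  # 14
```

In this sketch, score adjustment keeps a suppressed object recoverable by lowering the output threshold. Mask exclusion, by contrast, is a hard cut that can still keep the non-overlapping part of a detection.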

Circularity Check

0 steps flagged

No circularity: engineering integration of existing components with no derivations or fitted predictions

Full rationale

The paper presents EBOD as a practical framework that combines the off-the-shelf prompt-based detector SAM3 with feature-matching modules DINOv3 and LightGlue. It claims this integration suppresses repeated false positives and negatives by using prior error examples, without retraining. No equations, mathematical derivations, parameter fitting, or self-citations appear in the abstract or described approach. The central claim is an empirical engineering assertion about effectiveness, not a derived result that reduces to its inputs by construction. No load-bearing steps match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested assumption that feature matching will correctly map new images to prior error cases and that suppression logic will then improve detection.
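One way to test this assumption is to measure matching precision and recall over a labelled set of (new instance, stored example) pairs at several similarity thresholds. The following is a minimal sketch. The similarity scores and labels are invented toy values, not results from the paper, and `precision_recall` is a hypothetical helper.

```python
# Hypothetical threshold sweep on toy data; scores and labels are invented for illustration.
def precision_recall(scores: list[float], labels: list[bool], thr: float) -> tuple[float, float]:
    """Treat score >= thr as 'same instance'; labels mark pairs that truly are the same.
    By convention here, an empty denominator yields 1.0."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < thr and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.40]
labels = [True, True, False, True, False, False]
for thr in (0.5, 0.7, 0.85):
    p, r = precision_recall(scores, labels, thr)
    print(f"thr={thr:.2f} precision={p:.2f} recall={r:.2f}")
```

On these toy values, raising the threshold from 0.5 to 0.85 lifts precision from 0.60 to 1.00 but drops recall to 0.67. In the EBOD setting, that is the trade-off between wrongly suppressing a correct detection and letting a recorded error slip through.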

axioms (1)
  • Domain assumption: DINOv3 and LightGlue feature matching can accurately identify and match previous false-positive and false-negative instances in new images
    Invoked as the mechanism that enables error suppression without retraining.
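LightGlue itself is a learned matcher that operates on local keypoint descriptors, such as those produced by SuperPoint. To make the assumption concrete without reproducing LightGlue, the following sketch swaps in a classical rule instead: mutual nearest-neighbour matching with Lowe's ratio test on toy descriptors. Two regions count as the same instance when enough matches survive. The `ratio` and `min_matches` values and all function names are assumptions, and this is not how EBOD or LightGlue actually matches features.

```python
import math

# Classical stand-in for a learned matcher; ratio=0.8 and min_matches=3 are assumptions.
Desc = list[float]


def l2(a: Desc, b: Desc) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_two(q: Desc, bank: list[Desc]) -> tuple[int, float, float]:
    """Index of the nearest descriptor, its distance, and the second-nearest distance."""
    dists = sorted((l2(q, d), i) for i, d in enumerate(bank))
    best_d, best_i = dists[0]
    second_d = dists[1][0] if len(dists) > 1 else math.inf
    return best_i, best_d, second_d


def mutual_matches(a: list[Desc], b: list[Desc], ratio: float = 0.8) -> list[tuple[int, int]]:
    """Pairs (i, j) that are each other's nearest neighbour and pass the ratio test."""
    if not a or not b:
        return []
    matches = []
    for i, q in enumerate(a):
        j, d1, d2 = nearest_two(q, b)
        back, _, _ = nearest_two(b[j], a)
        if back == i and d1 < ratio * d2:  # reject ambiguous, non-distinctive matches
            matches.append((i, j))
    return matches


def same_instance(a: list[Desc], b: list[Desc], min_matches: int = 3) -> bool:
    return len(mutual_matches(a, b)) >= min_matches


stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
new_view = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.95], [0.95, 1.0, 0.05]]  # perturbed copy
unrelated = [[5.0, 5.0, 5.0], [-3.0, 4.0, 0.0]]
print(same_instance(new_view, stored), same_instance(unrelated, stored))  # True False
```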

pith-pipeline@v0.9.0 · 5500 in / 1142 out tokens · 54508 ms · 2026-05-08T17:57:47.922579+00:00 · methodology


Reference graph

Works this paper leans on

14 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    Detect anything via next point prediction

    Qing Jiang, Junan Huo, Xingyu Chen, Yuda Xiong, Zhaoyang Zeng, Yihao Chen, Tianhe Ren, Junzhi Yu, and Lei Zhang. Detect anything via next point prediction, 2025. URL https://arxiv.org/abs/2510.12798

  2. [2]

    T-rex2: Towards generic object detection via text-visual prompt synergy, 2024

    Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, and Lei Zhang. T-rex2: Towards generic object detection via text-visual prompt synergy, 2024

  3. [3]

    SAM 3: Segment Anything with Concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

  4. [4]

    Grounded sam: Assembling open-world models for diverse visual tasks, 2024

    Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, and Lei Zhang. Grounded sam: Assembling open-world models for diverse visual tasks, 2024

  5. [5]

    Few-shot semantic segmentation meets sam3,

    Yi-Jen Tsai, Yen-Yu Lin, and Chien-Yao Wang. Few-shot semantic segmentation meets sam3,

  6. [6]

    URL https://arxiv.org/abs/2604.05433

  7. [7]

    INSID3: Training-free in-context segmentation with DINOv3

    Claudia Cuttano, Gabriele Trivigno, Christoph Reich, Daniel Cremers, Carlo Masone, and Stefan Roth. INSID3: Training-free in-context segmentation with DINOv3. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

  8. [8]

    Matcher: Segment anything with one shot using all-purpose feature matching

    Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, and Chunhua Shen. Matcher: Segment anything with one shot using all-purpose feature matching. arXiv preprint arXiv:2305.13310, 2023

  9. [9]

    LightGlue: Local Feature Matching at Light Speed

    Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. LightGlue: Local Feature Matching at Light Speed. In ICCV, 2023

  10. [10]

    Omniglue: Generalizable feature matching with foundation model guidance

    Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, and Andre Araujo. Omniglue: Generalizable feature matching with foundation model guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  11. [11]

    Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...

  12. [12]

    A density-based algorithm for discovering clusters in large spatial databases with noise

    Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pages 226–231. AAAI Press, 1996

  13. [13]

    SuperPoint: Self-Supervised Interest Point Detection and Description

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description, 2018. URL https://arxiv.org/abs/1712.07629

  14. [14]

    You only look once: Unified, real-time object detection

    Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016