pith. machine review for the scientific record.

arxiv: 2604.23718 · v1 · submitted 2026-04-26 · 💻 cs.CV

Recognition: unknown

Caries DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR Based Caries Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 06:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords caries detection · DETR · intraoral images · dental imaging · transformer-based detection · lesion-aware loss · medical image analysis

The pith

Caries-DETR improves detection of subtle low-contrast dental lesions by injecting tooth structure priors into DETR queries and dynamically reweighting losses for hard examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Caries-DETR, a Transformer detector specialized for caries in intraoral photographs. It introduces a Tooth Structure-aware Query Initialization step that draws on large-scale pre-training and a structure perception branch to steer attention toward anatomically plausible lesion sites, and a Lesion-aware Dynamic Loss Refinement step that scales the loss according to lesion size, anatomical context, and prediction quality. These changes target a domain-specific difficulty: standard detectors overlook small, faint caries. If the approach holds, routine dental images could support earlier automated screening without requiring higher-resolution hardware.

Core claim

Caries-DETR reaches state-of-the-art results on the AlphaDent and DentalAI datasets by guiding query initialization with high-frequency tooth structure priors and by adaptively reweighting the loss for subtle lesions according to their size, anatomical relevance, and current prediction quality.

What carries the argument

Tooth Structure-aware Query Initialization (TSQI), which combines pre-trained intraoral features with a structure perception branch to embed anatomical priors into the DETR queries, together with Lesion-aware Dynamic Loss Refinement (LDLR), which performs quality-driven hard mining by reweighting losses on the fly.
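The TSQI mechanism is only described at a high level here. As a minimal sketch of the idea (all names, shapes, and the top-k selection rule below are assumptions for illustration, not the authors' code), structure-guided query initialization can be read as: score every spatial position with the structure branch, then seed the object queries from the highest-scoring positions.

```python
import numpy as np

def structure_aware_queries(features, structure_scores, num_queries):
    """Hypothetical sketch of TSQI-style query initialization.

    features:         (H*W, D) flattened backbone feature map
    structure_scores: (H*W,)   per-position structural-prior score
    num_queries:      number of object queries to initialize
    """
    # Take the top-k anatomically plausible positions...
    top = np.argsort(structure_scores)[::-1][:num_queries]
    # ...and seed each query with the feature at that position,
    # instead of starting from content-free learned embeddings.
    return features[top], top

rng = np.random.default_rng(0)
H, W, D = 8, 8, 16
feats = rng.normal(size=(H * W, D))
scores = rng.random(H * W)
queries, positions = structure_aware_queries(feats, scores, num_queries=4)
print(queries.shape)  # (4, 16)
```

The contrast with vanilla DETR is the seeding step: plain DETR queries carry no image-specific prior, whereas here each query starts at a position the structure branch already considers anatomically relevant.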

If this is right

  • Standard DETR-style detectors can be specialized for low-contrast medical lesions without redesigning the entire backbone.
  • Pre-training on large intraoral photo collections supplies reusable structural priors that improve localization of early-stage pathology.
  • Adaptive per-instance loss reweighting based on lesion size and anatomical score offers a general way to handle class imbalance and difficulty variation in detection tasks.
  • The resulting model shows improved generalization across two independent public dental datasets, suggesting robustness to variations in imaging conditions.
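The abstract names three signals for LDLR (lesion size, anatomical relevance, prediction quality) but not the formula that combines them. The sketch below is one hedged illustration of such a per-instance weight; the size clipping, the focal-style exponent, and the anatomy blend are all invented here, not taken from the paper.

```python
import numpy as np

def ldlr_weight(lesion_area, anatomy_score, pred_quality,
                alpha=2.0, area_ref=32.0 ** 2):
    """Hypothetical per-instance loss weight in the spirit of LDLR.

    lesion_area:   ground-truth box area in pixels
    anatomy_score: in [0, 1], relevance from a structure branch
    pred_quality:  in [0, 1], e.g. IoU of the current prediction
    """
    # Upweight small lesions, capped so tiny boxes don't dominate.
    size_term = np.clip(area_ref / (lesion_area + 1e-6), 1.0, 4.0)
    # Focal-style hard mining: low-quality predictions weigh more.
    hard_term = (1.0 - pred_quality) ** alpha
    # Blend in anatomical relevance without ever zeroing the loss.
    return size_term * (0.5 + 0.5 * anatomy_score) * hard_term

# A small, poorly predicted lesion in a relevant region gets a larger
# weight than a large, well-predicted one.
w_hard = ldlr_weight(lesion_area=10 * 10, anatomy_score=0.9, pred_quality=0.2)
w_easy = ldlr_weight(lesion_area=100 * 100, anatomy_score=0.9, pred_quality=0.8)
print(w_hard > w_easy)  # True
```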

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar query-initialization priors could be derived for other anatomical sites where lesions are subtle and location-specific, such as early retinal or skin lesions.
  • The dynamic loss scheme might transfer to non-medical domains that also feature small, low-contrast targets, such as defect detection in manufacturing imagery.
  • If the pre-training corpus size proves critical, smaller medical domains could benefit from synthetic structure priors generated from anatomical models rather than real photos.

Load-bearing premise

The reported gains on the two public datasets arise from the TSQI and LDLR modules rather than from unstated differences in training length, hyperparameters, or dataset-specific tuning.

What would settle it

An ablation experiment that removes TSQI and LDLR while keeping all other training details fixed and still matches or exceeds the full model's average precision on both test sets.
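Such an ablation amounts to a 2×2 grid over the two modules with every other training detail pinned. A minimal sketch of the run enumeration (config keys are hypothetical):

```python
from itertools import product

# Everything except the two modules stays fixed across all runs.
BASE_CONFIG = {"epochs": 50, "lr": 1e-4, "backbone": "resnet50",
               "augment": "default", "seed": 0}

def ablation_grid():
    """Yield the four configurations needed to isolate TSQI and LDLR."""
    for use_tsqi, use_ldlr in product([False, True], repeat=2):
        yield dict(BASE_CONFIG, tsqi=use_tsqi, ldlr=use_ldlr)

runs = list(ablation_grid())
print(len(runs))  # 4
```

Reporting AP for all four runs on both test sets would directly attribute (or deny) the headline gains to the proposed modules.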

Figures

Figures reproduced from arXiv: 2604.23718 by He Meng, Kun Tang, Linlin Shen, Mianjie Zheng, Xiaoqi Guo, Xinquan Yang, Xuefen Liu, Xuguang Li.

Figure 1: Overall framework of the proposed Caries-DETR.
Figure 2: Visual comparison of different methods on the AlphaDent dataset. (a) raw intraoral image, (b) ground truth, (c) YOLOv12, (d)
Original abstract

As dental caries appear as subtle, low-contrast lesions in intraoral imaging, existing deep learning models face significant challenges in the early detection of caries. While recent Transformer-based detectors have shown promising results in natural images, they often fail to capture the domain-specific anatomical priors crucial for dental caries detection. In this paper, we propose Caries-DETR, a specialized Transformer framework for caries detection in intraoral images. A Tooth Structure-aware Query Initialization (TSQI) is designed, leveraging large-scale intraoral photograph pre-training and a structure perception branch (SPB) to integrate high-frequency structural priors, guiding the model to focus on anatomically significant lesion areas. Furthermore, we design a Lesion-aware Dynamic Loss Refinement (LDLR) to implement quality-driven hard mining through adaptive loss reweighting based on lesion size, anatomical relevance, and prediction quality, optimizing detection for subtle lesions. Extensive experiments on two public datasets (i.e., AlphaDent and DentalAI) demonstrate that Caries-DETR achieves a state-of-the-art performance compared to existing methods and exhibits good generalization and robustness. Code and data at https://github.com/XuefenLiu-SZU/Caries-DETR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Caries-DETR, a DETR variant for caries detection in intraoral images. It proposes Tooth Structure-aware Query Initialization (TSQI) that leverages large-scale intraoral pre-training and a structure perception branch to inject anatomical priors, plus Lesion-aware Dynamic Loss Refinement (LDLR) that performs adaptive loss reweighting based on lesion size, anatomical relevance, and prediction quality. Experiments on AlphaDent and DentalAI are reported to achieve state-of-the-art detection performance with improved generalization and robustness.

Significance. If the reported gains can be isolated to TSQI and LDLR, the work would offer a concrete example of embedding domain-specific anatomical priors and quality-aware loss modulation into Transformer detectors for low-contrast medical lesions. This could be useful for other subtle-lesion tasks where standard DETR variants underperform due to lack of structural guidance.

major comments (2)
  1. [§4 (Experiments)] The manuscript reports SOTA results on AlphaDent and DentalAI but supplies no ablation tables that hold training schedule, optimizer, data augmentation, pre-training data, and backbone fixed while toggling only TSQI and LDLR. Without these controls, the headline improvements cannot be attributed to the proposed modules rather than to confounding factors such as longer training or dataset-specific hyperparameter search.
  2. [§3.2 (LDLR)] The dynamic loss formulation is described at a high level but lacks a precise mathematical definition of the reweighting function (e.g., how lesion size, anatomical relevance, and quality scores are normalized and combined into per-sample weights). This prevents verification that LDLR is more than a repackaging of standard hard-mining or focal-loss variants.
minor comments (2)
  1. [Abstract] The abstract claims 'extensive experiments' yet contains no numerical results, confidence intervals, or even a single mAP value; readers must reach the experimental section to evaluate the strength of the SOTA claim.
  2. [Figures 2-3] Figure captions and the structure-perception-branch diagram would benefit from explicit annotation of which feature maps are passed to the query initialization versus the detection head.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's potential. We address the two major comments below and will revise the manuscript to strengthen the experimental controls and mathematical clarity.

Point-by-point responses
  1. Referee: [§4 (Experiments)] The manuscript reports SOTA results on AlphaDent and DentalAI but supplies no ablation tables that hold training schedule, optimizer, data augmentation, pre-training data, and backbone fixed while toggling only TSQI and LDLR. Without these controls, the headline improvements cannot be attributed to the proposed modules rather than to confounding factors such as longer training or dataset-specific hyperparameter search.

    Authors: We agree that isolating the contributions of TSQI and LDLR requires strictly controlled ablations. The current experiments include component-wise ablations, but they do not hold every hyperparameter and training detail fixed across all variants. In the revised manuscript we will add a dedicated ablation table in Section 4 that fixes training schedule, optimizer, data augmentation, pre-training corpus, and backbone, toggling only the presence of TSQI and LDLR. This will allow direct attribution of gains to the proposed modules. revision: yes

  2. Referee: [§3.2 (LDLR)] The dynamic loss formulation is described at a high level but lacks a precise mathematical definition of the reweighting function (e.g., how lesion size, anatomical relevance, and quality scores are normalized and combined into per-sample weights). This prevents verification that LDLR is more than a repackaging of standard hard-mining or focal-loss variants.

    Authors: We acknowledge that the current description of LDLR is at a conceptual level. In the revision we will insert the exact mathematical formulation of the reweighting function in Section 3.2, including the normalization steps for lesion size, anatomical relevance score, and prediction quality, together with the formula that combines them into the per-sample loss weight. The added equations will make explicit how domain-specific dental priors are incorporated, distinguishing LDLR from generic hard-mining or focal-loss approaches. revision: yes

Circularity Check

0 steps flagged

No circularity: architectural modules and empirical results are independent of inputs

full rationale

The paper introduces TSQI (using pre-training and SPB for structural priors) and LDLR (adaptive loss reweighting) as design choices motivated by domain characteristics of dental images. These are not derived from equations that reduce to fitted parameters or self-citations. Performance claims are measured on held-out public datasets (AlphaDent, DentalAI) against baselines; no self-definitional loops, no 'prediction' that is the fit itself, and no load-bearing uniqueness theorems from the same authors. The derivation chain consists of standard DETR extensions plus empirical validation, which remains falsifiable and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on standard supervised object-detection assumptions plus two new architectural components whose contribution is asserted but not isolated in the provided abstract.

axioms (2)
  • domain assumption Intraoral images contain detectable high-frequency structural priors that can be transferred from large-scale pre-training to guide lesion localization.
    Invoked in the description of TSQI and SPB.
  • domain assumption Lesion size, anatomical relevance, and prediction quality are reliable signals for adaptive loss reweighting that improves detection of subtle caries.
    Invoked in the description of LDLR.
invented entities (2)
  • Tooth Structure-aware Query Initialization (TSQI) no independent evidence
    purpose: To inject anatomical priors into DETR queries using pre-trained structure perception.
    New module introduced to address domain-specific challenges.
  • Lesion-aware Dynamic Loss Refinement (LDLR) no independent evidence
    purpose: To perform quality-driven hard mining via adaptive reweighting.
    New loss component introduced for subtle lesions.

pith-pipeline@v0.9.0 · 5554 in / 1407 out tokens · 60112 ms · 2026-05-08T06:40:12.400387+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

46 extracted references · 4 canonical work pages · 2 internal anchors
