Recognition: unknown
VFM⁴SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection
Pith reviewed 2026-05-09 21:34 UTC · model grok-4.3
The pith
A frozen vision foundation model supplies stability priors that cut missed detections in single-domain object detectors facing unseen conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Performance degradation under domain shift is driven by rising missed detections that stem from unstable object-background and inter-instance relations in the encoding stage, together with harder-to-maintain semantic-spatial alignment of queries in the decoding stage. The authors therefore propose VFM⁴SDG, a dual-prior learning method that inserts a frozen vision foundation model as a transferable source of stability: Cross-domain Stable Relational Prior Distillation strengthens relational modeling in encoding, while Semantic-Contextual Prior-based Query Enhancement injects category-level semantic prototypes and global visual context into queries during decoding.
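To make the mechanics concrete, here is a minimal PyTorch-style sketch of how the two priors could be wired around a frozen VFM, assuming cosine relation maps in the encoder and prototype mixing in the decoder; all function names, tensor shapes, and the specific losses are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of dual-prior injection around a frozen VFM.
# All names, shapes, and losses are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def pairwise_relations(feats):
    """Token-to-token relation map (cosine affinity), shape (B, N, N)."""
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.transpose(1, 2)

def relational_prior_distillation(det_feats, vfm_feats):
    """Encoder-side prior: align the detector's relation structure with the
    frozen VFM's relation structure (a stand-in for the paper's
    Cross-domain Stable Relational Prior Distillation)."""
    with torch.no_grad():                      # the VFM stays frozen
        target = pairwise_relations(vfm_feats)
    pred = pairwise_relations(det_feats)
    return F.mse_loss(pred, target)

def semantic_contextual_query_enhancement(queries, prototypes, global_ctx):
    """Decoder-side prior: mix category-level prototypes and a global context
    vector into the object queries (a stand-in for the paper's
    Semantic-Contextual Prior-based Query Enhancement)."""
    attn = torch.softmax(queries @ prototypes.T, dim=-1)   # (B, Q, C) soft class assignment
    sem = attn @ prototypes                                 # (B, Q, D) semantic prior
    return queries + sem + global_ctx.unsqueeze(1)          # broadcast global context

# Toy shapes: batch 2, 100 encoder tokens, 300 queries, 8 classes, dim 256.
det_feats  = torch.randn(2, 100, 256)
vfm_feats  = torch.randn(2, 100, 256)         # features from the frozen VFM
queries    = torch.randn(2, 300, 256)
prototypes = torch.randn(8, 256)              # category-level semantic prototypes
global_ctx = vfm_feats.mean(dim=1)            # pooled global visual context

loss_rel    = relational_prior_distillation(det_feats, vfm_feats)
queries_enh = semantic_contextual_query_enhancement(queries, prototypes, global_ctx)
```

In this sketch the VFM contributes only distillation targets and context vectors; its weights receive no gradients, which matches the frozen-prior premise of the paper.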
What carries the argument
Dual-prior learning framework that freezes a vision foundation model and injects its stability into relational distillation in the encoder and semantic-contextual enhancement of queries in the decoder.
If this is right
- Object-background and inter-instance relations remain more stable when the same detector is tested on images from unseen domains.
- Query representations gain improved semantic recognition and spatial localization, lowering the rate of missed detections.
- The same dual-prior additions raise accuracy on two mainstream DETR-based detectors across existing SDGOD evaluation sets.
- No additional source-domain images or target-domain labels are required beyond the single training domain.
Where Pith is reading between the lines
- The same prior-injection pattern could be tested on detection heads other than DETR to check whether the stability benefit is architecture-specific.
- Applying the relational and query priors to video object detection might reduce frame-to-frame missed detections when scene conditions drift over time.
- Measuring whether the frozen VFM priors also improve localization precision on small or occluded objects would reveal additional side benefits.
Load-bearing premise
The stability priors taken from the frozen vision foundation model transfer to the detector without creating new failure modes or domain-specific biases that would increase errors on unseen data.
What would settle it
If running the proposed method on a standard single-domain generalization benchmark showed no reduction in missed detections relative to the unmodified baseline detector, the claim that the VFM priors improve cross-domain stability would be falsified.
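A minimal sketch of how that check could be scored, assuming missed detections are counted as ground-truth boxes left unmatched at IoU ≥ 0.5 by any prediction above a score threshold; the thresholds and matching rule are assumed here, not taken from the paper's evaluation protocol.

```python
# Rough sketch of the falsification test: count missed detections (unmatched
# ground-truth boxes) for the baseline vs. the VFM-prior model on an unseen domain.
# The matching rule (IoU >= 0.5, score >= 0.5) is an assumed protocol.
import torch
from torchvision.ops import box_iou

def count_missed(gt_boxes, pred_boxes, pred_scores, iou_thr=0.5, score_thr=0.5):
    keep = pred_scores >= score_thr
    preds = pred_boxes[keep]
    if len(preds) == 0:
        return len(gt_boxes)
    ious = box_iou(gt_boxes, preds)                 # (num_gt, num_pred)
    matched = ious.max(dim=1).values >= iou_thr
    return int((~matched).sum())

# Toy example with xyxy boxes: the second object is missed by the baseline.
gt = torch.tensor([[10., 10., 50., 50.], [60., 60., 120., 120.]])
baseline_boxes  = torch.tensor([[12., 11., 48., 52.]])
baseline_scores = torch.tensor([0.9])
prior_boxes     = torch.tensor([[12., 11., 48., 52.], [58., 62., 118., 119.]])
prior_scores    = torch.tensor([0.9, 0.8])

print(count_missed(gt, baseline_boxes, baseline_scores))  # 1 missed
print(count_missed(gt, prior_boxes, prior_scores))        # 0 missed
```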
Original abstract
In real-world scenarios, continual changes in weather, illumination, and imaging conditions cause significant domain shifts, leading detectors trained on a single source domain to degrade severely in unseen environments. Existing single-domain generalized object detection (SDGOD) methods mainly rely on data augmentation or domain-invariant representation learning, but pay limited attention to detector mechanisms, leaving clear limitations under complex domain shifts. Through analytical experiments, we find that performance degradation is dominated by increasing missed detections, which fundamentally arises from reduced cross-domain stability of the detector: object-background and inter-instance relations become less stable in the encoding stage, while semantic-spatial alignment of query representations also becomes harder to maintain in the decoding stage. To this end, we propose VFM$^{4}$SDG, a dual-prior learning framework for SDGOD, which introduces a frozen vision foundation model (VFM) as a transferable cross-domain stability prior into detector representation learning and query modeling. In the encoding stage, we propose Cross-domain Stable Relational Prior Distillation to enhance the robustness of object-background and inter-instance relational modeling. In the decoding stage, we propose Semantic-Contextual Prior-based Query Enhancement, which injects category-level semantic prototypes and global visual context into queries to improve their semantic recognition and spatial localization stability in unseen domains. Extensive experiments show that the proposed method consistently outperforms existing SOTA methods on standard SDGOD benchmarks and two mainstream DETR-based detectors, demonstrating its effectiveness, robustness, and generality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents VFM⁴SDG, a dual-prior learning framework for single-domain generalized object detection (SDGOD). Through analytical experiments, it identifies that performance degradation under domain shifts is dominated by missed detections arising from reduced cross-domain stability in object-background and inter-instance relations (encoding stage) and semantic-spatial query alignment (decoding stage). The method injects transferable stability priors from a frozen vision foundation model (VFM) via two modules: Cross-domain Stable Relational Prior Distillation and Semantic-Contextual Prior-based Query Enhancement. Extensive experiments claim consistent outperformance over existing SOTA methods on standard SDGOD benchmarks and two mainstream DETR-based detectors.
Significance. If the central mechanism holds, the work offers a promising direction for SDGOD by moving beyond data augmentation and domain-invariant representations to leverage frozen VFMs for explicit stability priors. This could improve robustness to real-world shifts (weather, illumination) while remaining computationally efficient. The reported generality across DETR detectors and benchmark gains, if mechanistically validated, would be a substantive contribution to the field.
major comments (2)
- [Analytical Experiments and Results sections] The analytical experiments diagnose missed-detection dominance and reduced relational/query stability as the core failure mode, yet the results provide no direct quantitative measurements (e.g., pre/post stability scores for object-background relations or query alignment) demonstrating that the proposed distillation and enhancement modules measurably restore those specific quantities. Without this link, benchmark gains could stem from generic feature enrichment rather than the claimed stability mechanism. (A sketch of one possible stability measurement follows this list.)
- [Experiments] The claim that the framework demonstrates 'generality' for arbitrary DETR-based detectors rests on experiments with only two mainstream detectors. The manuscript should either expand the detector testbed or provide a concrete argument (e.g., via ablation on query/relation components) showing why the VFM priors transfer independently of specific DETR architecture choices.
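One possible form such a pre/post measurement could take is a relation-consistency score between detector features of the same scene under clear and shifted conditions; the sketch below, including the cosine-similarity choice, is an assumed metric rather than anything defined in the manuscript.

```python
# Illustrative relation-stability score: how similar are the detector's
# token-to-token relation maps for the same scene under clear vs. shifted
# (e.g., foggy) conditions. Higher = more stable. Assumed metric, not the paper's.
import torch
import torch.nn.functional as F

def relation_map(feats):
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.transpose(1, 2)        # (B, N, N)

def relation_stability(feats_clear, feats_shifted):
    r1 = relation_map(feats_clear).flatten(1)
    r2 = relation_map(feats_shifted).flatten(1)
    return F.cosine_similarity(r1, r2, dim=1).mean()

# Reporting this score before and after adding each module would tie the
# benchmark gains to the stability claim rather than generic feature enrichment.
feats_clear   = torch.randn(4, 100, 256)
feats_shifted = feats_clear + 0.3 * torch.randn(4, 100, 256)  # simulated domain shift
print(float(relation_stability(feats_clear, feats_shifted)))
```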
minor comments (2)
- [Method] Clarify the exact formulation of the relational prior distillation loss (e.g., which layers of the VFM are used and how the stability objective is defined) to allow reproducibility. (An assumed example formulation is sketched after this list.)
- [Introduction and Method] Ensure that all stability-related terms (e.g., 'cross-domain relational stability') are formally defined with equations or metrics early in the paper rather than only described qualitatively.
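As an example of the level of specificity requested, one plausible, assumed formulation (not necessarily the authors') of a relational distillation objective between detector encoder features and frozen-VFM features at a chosen layer ℓ with N tokens is:

```latex
% Assumed formulation for illustration only.
R(F) = \tilde{F}\,\tilde{F}^{\top},
\qquad \tilde{F}_i = \frac{F_i}{\lVert F_i \rVert_2},
\qquad
\mathcal{L}_{\text{rel}}
= \frac{1}{N^{2}}
\Bigl\lVert
R\bigl(F_{\text{det}}^{(\ell)}\bigr) - R\bigl(F_{\text{vfm}}^{(\ell)}\bigr)
\Bigr\rVert_{F}^{2}.
```

Stating which layer ℓ is tapped, how the rows are normalized, and how this term is weighted against the detection losses would make the stability objective reproducible.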
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments. We address each major comment below with our responses and indicate where revisions will be made to strengthen the manuscript.
Point-by-point responses
Referee: [Analytical Experiments and Results sections] The analytical experiments diagnose missed-detection dominance and reduced relational/query stability as the core failure mode, yet the results provide no direct quantitative measurements (e.g., pre/post stability scores for object-background relations or query alignment) demonstrating that the proposed distillation and enhancement modules measurably restore those specific quantities. Without this link, benchmark gains could stem from generic feature enrichment rather than the claimed stability mechanism.
Authors: We appreciate the referee's emphasis on establishing a direct causal link. Section 4.1 presents quantitative diagnostics of missed-detection increases and stability degradation via relation consistency and query alignment metrics across domains. However, we agree that explicit pre/post measurements tied specifically to the distillation and enhancement modules would more rigorously rule out generic feature enrichment. We will add these direct quantitative comparisons (e.g., stability score deltas before and after each module) in the revised manuscript. revision: yes
Referee: [Experiments] The claim that the framework demonstrates 'generality' for arbitrary DETR-based detectors rests on experiments with only two mainstream detectors. The manuscript should either expand the detector testbed or provide a concrete argument (e.g., via ablation on query/relation components) showing why the VFM priors transfer independently of specific DETR architecture choices.
Authors: We acknowledge the experiments use two mainstream DETR-based detectors. The manuscript already includes component-wise ablations isolating the relational prior distillation (encoder) and semantic-contextual query enhancement (decoder). These ablations show consistent gains from targeting core DETR elements—relational modeling and query semantics—that are shared across DETR variants, providing evidence that the VFM priors operate independently of specific architectural choices. This constitutes our concrete argument for generality. We can expand the testbed with a third DETR variant if the referee considers it necessary. revision: partial
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper's chain proceeds from analytical experiments diagnosing missed-detection dominance and relational instability, to an independent proposal of two new modules (Cross-domain Stable Relational Prior Distillation in encoding and Semantic-Contextual Prior-based Query Enhancement in decoding) that inject frozen VFM priors. Central performance claims rest on external benchmark comparisons with SOTA methods and two DETR detectors, not on any fitted parameter renamed as prediction, self-definitional loop, or load-bearing self-citation. The derivation introduces new architectural components whose effectiveness is measured separately from the diagnostic observations.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Vision foundation models supply transferable cross-domain stability priors for object-background and inter-instance relations.