WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval
Pith reviewed 2026-05-10 19:28 UTC · model grok-4.3
The pith
Adversarial perturbations applied to model weights in the direction opposite the gradient-descent step during fine-tuning reduce overfitting in composed image retrieval.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
During fine-tuning of vision-language pre-trained models for composed image retrieval, generating adversarial perturbations to the model weights in the exact opposite direction of the gradient descent step acts as an effective regularizer. This increases the difficulty of fitting the limited training triplets and thereby narrows the previously overlooked generalization gap between training and test performance.
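The claim fixes only the direction of the perturbation, but the mechanism reads as a sharpness-aware, perturb-then-restore inner loop. Below is a minimal PyTorch sketch of one such step, assuming a normalized offset of size epsilon applied to all parameters; the function name, the epsilon value, and the all-parameter scope are illustrative assumptions, not the paper's implementation.

```python
import torch

def perturbed_step(model, loss_fn, batch, optimizer, epsilon=0.05):
    """One fine-tuning step with the weights perturbed opposite the descent
    direction (i.e., along +grad). A hedged sketch, not the paper's code."""
    # 1. Gradients at the current weights.
    loss = loss_fn(model, batch)
    loss.backward()

    # 2. Offset every weight along +grad (opposite the usual -grad descent
    #    step), scaled to total norm epsilon; keep the offsets to undo them.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    offsets = {}
    for p in model.parameters():
        if p.grad is None:
            continue
        offsets[p] = epsilon * p.grad / (grad_norm + 1e-12)
        p.data.add_(offsets[p])
    optimizer.zero_grad()

    # 3. The gradient at the perturbed, harder-to-fit weights drives the update.
    loss_fn(model, batch).backward()

    # 4. Restore the original weights, then step the optimizer from there.
    for p, offset in offsets.items():
        p.data.sub_(offset)
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```

Written this way the step coincides with the SAM inner loop; whether WRF4CIR differs in normalization, layer scope, or epsilon schedule is exactly what the referee report below asks the authors to specify.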
What carries the argument
Weight-regularized fine-tuning via adversarial perturbations generated opposite the gradient direction, which serves as the sole added regularizer inside the standard fine-tuning loop.
If this is right
- The method narrows the generalization gap across multiple vision-language backbones and multiple CIR datasets.
- It delivers measurable gains in retrieval metrics while using the same limited triplet supervision as prior approaches.
- No additional data collection or model architecture changes are needed beyond the perturbation step.
- The regularization can be inserted into any existing fine-tuning pipeline for CIR.
Where Pith is reading between the lines
- The same opposite-gradient perturbation idea might be tested as a drop-in regularizer for other vision-language tasks that suffer from small labeled sets.
- It is worth checking whether the perturbation size can stay fixed across datasets or must be scaled with model size or learning rate.
- The approach may interact with other common regularizers such as dropout or weight decay in ways the paper does not explore.
Load-bearing premise
Generating perturbations to the weights in the opposite direction of the gradient will reduce overfitting without causing training instability or requiring dataset-specific hyper-parameter changes.
What would settle it
On a new held-out CIR benchmark, compare standard fine-tuning against WRF4CIR and measure whether the train-test recall gap shrinks and whether test recall at rank 10 or 50 rises; if neither improvement appears, the central claim is falsified.
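As a concrete reading of that test, here is a minimal sketch of the bookkeeping, assuming score matrices between composed queries and a fixed gallery; names and shapes are illustrative.

```python
import numpy as np

def recall_at_k(scores, target_idx, k):
    """Fraction of queries whose target image ranks in the top-k.
    scores: (num_queries, num_gallery); target_idx: (num_queries,)."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    return (top_k == target_idx[:, None]).any(axis=1).mean()

# Hypothetical usage: the test compares the train-test gap and the absolute
# test recall for standard fine-tuning vs. WRF4CIR at K = 10 and K = 50.
# gap = recall_at_k(train_scores, train_targets, 10) \
#     - recall_at_k(test_scores, test_targets, 10)
```

If WRF4CIR neither shrinks the gap nor raises the test-side recall relative to the standard fine-tuning baseline, the central claim fails the test.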
original abstract
Composed Image Retrieval (CIR) task aims to retrieve target images based on reference images and modification texts. Current CIR methods primarily rely on fine-tuning vision-language pre-trained models. However, we find that these approaches commonly suffer from severe overfitting, posing challenges for CIR with limited triplet data. To better understand this issue, we present a systematic study of overfitting in VLP-based CIR, revealing a significant and previously overlooked generalization gap across different models and datasets. Motivated by these findings, we introduce WRF4CIR, a Weight-Regularized Fine-tuning network for CIR. Specifically, during the fine-tuning process, we apply adversarial perturbations to the model weights for regularization, where these perturbations are generated in the opposite direction of gradient descent. Intuitively, WRF4CIR increases the difficulty of fitting the training data, which helps mitigate overfitting in CIR under limited triplet supervision. Extensive experiments on benchmark datasets demonstrate that WRF4CIR significantly narrows the generalization gap and achieves substantial improvements over existing methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies severe overfitting in fine-tuning vision-language pre-trained (VLP) models for Composed Image Retrieval (CIR) under limited triplet supervision, documenting a significant generalization gap across models and datasets via a systematic study. It proposes WRF4CIR, which applies adversarial weight perturbations generated in the opposite direction of gradient descent during fine-tuning to increase fitting difficulty and act as regularization. Extensive experiments on benchmark datasets are claimed to show that WRF4CIR narrows the generalization gap and yields substantial improvements over existing CIR methods.
Significance. If the empirical results are robust, the work contributes a targeted regularization approach for mitigating overfitting in data-scarce VLP fine-tuning for multimodal retrieval tasks. The systematic study of the generalization gap provides useful diagnostic insight into current limitations. The method is defined independently of the evaluation benchmarks, which is a positive aspect for assessing its validity.
major comments (3)
- [§3] Method description: The adversarial perturbation is specified only at a high level as 'generated in the opposite direction of gradient descent.' No details are provided on the perturbation magnitude (epsilon), its schedule during training, which parameters/layers are perturbed, or safeguards against divergence. This directly impacts the weakest assumption that the technique is stable without dataset-specific tuning.
- [§4] Experiments: The central claim of 'substantial improvements' and narrowing the generalization gap lacks reported details on baseline implementations, exact metrics (e.g., Recall@K values), number of runs, error bars, or statistical significance tests. Without these, it is difficult to verify whether gains are reliable or sensitive to hyperparameter choices as flagged in the stress-test note.
- [§4.3] Ablation studies (likely in §4.3 or supplementary): No experiments isolate the effect of the anti-gradient direction versus random perturbations, same-direction perturbations, or standard regularizers (e.g., weight decay). This is load-bearing for attributing gains specifically to the proposed mechanism rather than generic regularization; a sketch of the direction-controlled arms follows this list.
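To make that ablation concrete, here is a minimal sketch of the three direction-controlled arms, holding the offset norm fixed so only direction varies; the mode names are illustrative assumptions, not the paper's terminology.

```python
import torch

def make_offset(grad, epsilon, mode):
    """Weight offset for one parameter tensor under three ablation arms.
    All arms share norm epsilon, isolating the effect of direction alone."""
    unit = grad / (grad.norm() + 1e-12)
    if mode == "anti_descent":    # proposed: opposite the descent step (+grad)
        return epsilon * unit
    if mode == "along_descent":   # control: along the descent step (-grad)
        return -epsilon * unit
    if mode == "random":          # control: random direction, matched norm
        noise = torch.randn_like(grad)
        return epsilon * noise / (noise.norm() + 1e-12)
    raise ValueError(f"unknown mode: {mode}")
```

The weight-decay control needs no offset at all: it is the standard fine-tuning loop with a nonzero decay coefficient, run under the same budget.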
minor comments (2)
- [Abstract] The phrase 'significant and previously overlooked generalization gap' is stated without any quantitative values or references to specific tables/figures from the study section.
- [Throughout] Notation and terminology: Ensure all acronyms (VLP, CIR) are defined at first use and used consistently; clarify the exact loss function and optimizer setup used in the fine-tuning baseline.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which highlight important areas for improving clarity and rigor in our manuscript. We address each major comment point-by-point below. We agree that additional details and experiments are needed to strengthen the presentation and will revise the manuscript accordingly.
point-by-point responses
- Referee: [§3] Method description: The adversarial perturbation is specified only at a high level as 'generated in the opposite direction of gradient descent.' No details are provided on the perturbation magnitude (epsilon), its schedule during training, which parameters/layers are perturbed, or safeguards against divergence. This directly impacts the weakest assumption that the technique is stable without dataset-specific tuning.
Authors: We acknowledge that Section 3 presents the perturbation mechanism at a conceptual level. In the revised manuscript, we will expand the method description to include the specific perturbation magnitude (epsilon value and how it is chosen), the training schedule (e.g., whether it is applied every epoch or with a ramp-up), the parameters or layers affected (e.g., all model weights or selected subsets), and safeguards such as gradient clipping, loss monitoring, or early stopping to prevent divergence. These additions will ensure the technique is fully reproducible and address concerns about stability across datasets. revision: yes
- Referee: [§4] Experiments: The central claim of 'substantial improvements' and narrowing the generalization gap lacks reported details on baseline implementations, exact metrics (e.g., Recall@K values), number of runs, error bars, or statistical significance tests. Without these, it is difficult to verify whether gains are reliable or sensitive to hyperparameter choices as flagged in the stress-test note.
Authors: We agree that the experimental section requires more transparency to support the claims of improvements and gap narrowing. In the revision, we will report exact Recall@K (and other metrics) for all methods and datasets, detail how baselines were implemented or sourced (including any re-implementations), present results averaged over multiple independent runs with standard deviations or error bars, and include statistical significance tests (e.g., paired t-tests) where relevant. This will allow readers to assess reliability and sensitivity to hyperparameters; a sketch of such a seed-paired comparison follows these responses. revision: yes
- Referee: [§4.3] Ablation studies (likely in §4.3 or supplementary): No experiments isolate the effect of the anti-gradient direction versus random perturbations, same-direction perturbations, or standard regularizers (e.g., weight decay). This is load-bearing for attributing gains specifically to the proposed mechanism rather than generic regularization.
Authors: We recognize that isolating the contribution of the anti-gradient direction is essential to validate the proposed mechanism. We will add dedicated ablation studies in the revised Section 4.3 (or supplementary material) that compare WRF4CIR against variants using random perturbations, same-direction perturbations, and standard weight decay regularization, while keeping other factors controlled. These experiments will quantify the specific benefit of the opposite-direction adversarial perturbation over generic regularization approaches. revision: yes
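For the multi-run reporting promised above, a minimal sketch of a seed-paired comparison; the Recall@10 values are invented placeholders for illustration, not results from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical Recall@10 over five shared seeds (placeholder values only).
baseline = np.array([41.2, 40.8, 41.5, 40.9, 41.1])
wrf_like = np.array([43.0, 42.6, 43.4, 42.8, 43.1])

print(f"baseline: {baseline.mean():.2f} +/- {baseline.std(ddof=1):.2f}")
print(f"variant:  {wrf_like.mean():.2f} +/- {wrf_like.std(ddof=1):.2f}")

# Paired t-test across seeds, as the rebuttal proposes.
t_stat, p_value = stats.ttest_rel(wrf_like, baseline)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```

Pairing by seed, rather than running an unpaired comparison, removes seed-level variance from the test and matches how both arms would actually be trained.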
Circularity Check
No significant circularity: method is an independent regularization proposal evaluated empirically
full rationale
The paper identifies overfitting in VLP-based CIR via systematic study, then proposes WRF4CIR as a weight-regularized fine-tuning approach that applies adversarial perturbations to weights in the direction opposite gradient descent. This is presented as a training-time regularization heuristic motivated by the observed generalization gap, not as a mathematical derivation or prediction that reduces to fitted parameters or self-referential definitions. No equations or claims in the provided text equate the method's output to its inputs by construction, nor do they rely on load-bearing self-citations for uniqueness or ansatz smuggling. The central improvements are demonstrated through benchmark experiments rather than tautological renaming or forced statistical outcomes. This is a standard empirical ML contribution with independent content.
Axiom & Free-Parameter Ledger
free parameters (1)
- perturbation magnitude or regularization strength
axioms (1)
- domain assumption: Adversarial perturbations opposite to gradient descent increase the difficulty of fitting training data and thereby reduce overfitting in fine-tuning