MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning
Pith reviewed 2026-05-10 17:16 UTC · model grok-4.3
The pith
Quantization savings let interactive side experts scale up for higher accuracy in memory-efficient transfer learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By pairing lower-bit weight quantization with an interactive mixture-of-experts side network that routes experts using frozen-backbone features, the method increases side-network capacity without increasing memory cost during fine-tuning.
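The memory trade in the core claim can be made concrete with back-of-the-envelope arithmetic. The sketch below is only illustrative: the parameter count and bit widths are assumptions, not values from the paper, and it assumes the quantized weights are those of the frozen backbone, which this summary does not state explicitly. The helper names `weight_bytes` and `extra_side_params` are hypothetical.

```python
def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits // 8


def extra_side_params(backbone_params: int, baseline_bits: int,
                      quant_bits: int, side_bits: int) -> int:
    """Side-network weights that fit in the storage freed by quantization.

    Counts weight storage only; gradients, optimizer state, and activations
    of the trainable side network are ignored, so this is an upper bound.
    """
    saved = (weight_bytes(backbone_params, baseline_bits)
             - weight_bytes(backbone_params, quant_bits))
    return saved * 8 // side_bits


# All numbers below are illustrative assumptions, not values from the paper.
BACKBONE_PARAMS = 1_000_000_000
saved_bytes = weight_bytes(BACKBONE_PARAMS, 16) - weight_bytes(BACKBONE_PARAMS, 4)
budget = extra_side_params(BACKBONE_PARAMS, baseline_bits=16, quant_bits=4, side_bits=16)
print(saved_bytes, budget)
```

Because a trainable side network also needs gradients and optimizer state, the usable budget in practice would be smaller than this weight-only figure.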
What carries the argument
The Interactive Side Mixture-of-Experts (ISMoE) that selects experts by direct interaction with salient features extracted from the frozen backbone, made feasible by memory recovered through Gaussian Noise Perturbed Iterative Quantization (GNP-IQ).
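One way to picture routing that "interacts" with backbone features is a gate whose input mixes the side-network hidden state with a frozen-backbone feature. The sketch below is a hypothetical reading, not the paper's ISMoE: the element-wise product as the interaction, the top-k softmax gate, and the names `route` and `moe_forward` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route(side_hidden, backbone_feat, gate_weights, top_k=2):
    """Return {expert_index: gate_weight} for the top_k experts.

    Assumption: the router sees the element-wise product of the side-network
    hidden state and the frozen-backbone feature, so routing depends on both.
    """
    interaction = [s * b for s, b in zip(side_hidden, backbone_feat)]
    logits = [dot(w, interaction) for w in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] for i in top])
    return dict(zip(top, probs))


def moe_forward(side_hidden, backbone_feat, gate_weights, experts, top_k=2):
    """Gate-weighted sum of the selected experts' outputs."""
    gates = route(side_hidden, backbone_feat, gate_weights, top_k)
    out = [0.0] * len(side_hidden)
    for idx, g in gates.items():
        y = experts[idx](side_hidden)
        out = [o + g * v for o, v in zip(out, y)]
    return out


# Toy setup: three experts, two-dimensional hidden state (all values made up).
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
experts = [
    lambda h: [2.0 * x for x in h],
    lambda h: [x + 1.0 for x in h],
    lambda h: [-x for x in h],
]
side_hidden = [1.0, 2.0]
gates_a = route(side_hidden, [1.0, 0.0], gate_weights)
gates_b = route(side_hidden, [0.0, 1.0], gate_weights)
output = moe_forward(side_hidden, [1.0, 0.0], gate_weights, experts)
```

In this toy setup the same side-network state is routed with different expert weights when only the backbone feature changes, which is the property the interactive design is meant to provide.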
If this is right
- Accuracy rises on vision-language and language-only downstream tasks relative to existing memory-efficient transfer learning methods.
- Trainable parameter count and peak memory usage stay comparable to prior METL approaches.
- Interactive expert selection reduces knowledge forgetting by conditioning routing on backbone features.
Where Pith is reading between the lines
- The same quantization-plus-interactive-routing pattern could be tested on other frozen foundation models where feature extraction remains stable under low-bit weights.
- If quantization error can be driven still lower, the side network could be enlarged further while remaining within the same memory budget.
- The method suggests a route to larger adapters in settings where full back-propagation through the backbone is infeasible.
Load-bearing premise
The noise-perturbed iterative quantization lowers errors enough that the enlarged interactive expert network can still extract useful signals from the frozen backbone without instability or capacity loss.
What would settle it
An experiment in which increasing side-network size under MP-ISMoE produces no accuracy gain over standard METL baselines, or in which expert routing becomes unstable when backbone features are passed through the quantized side network.
Original abstract
Parameter-efficient transfer learning (PETL) has emerged as a pivotal paradigm for adapting pre-trained foundation models to downstream tasks, significantly reducing trainable parameters yet suffering from substantial memory overhead caused by gradient backpropagation during fine-tuning. While memory-efficient transfer learning (METL) circumvents this challenge by bypassing backbone gradient computation via lightweight side networks, its stringent memory constraint severely limits the learning capacity of side networks, thereby significantly compromising performance. To address these limitations, we propose a novel Mixed-Precision Interactive Side Mixture-of-Experts framework (MP-ISMoE). Specifically, we first propose a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme to quantize weights into lower bits while effectively decreasing quantization errors. By leveraging memory conserved from GNP-IQ, we subsequently employ Interactive Side Mixture-of-Experts (ISMoE) to scale up side networks without sacrificing overall memory efficiency. Different from conventional mixture-of-experts, ISMoE learns to select optimal experts by interacting with salient features from frozen backbones, thus suppressing knowledge forgetting and boosting performance. Extensive experiments across diverse vision-language and language-only tasks demonstrate that MP-ISMoE remarkably improves accuracy compared to state-of-the-art METL approaches, while maintaining comparable parameter and memory efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MP-ISMoE, a mixed-precision interactive side mixture-of-experts framework for parameter- and memory-efficient transfer learning. It introduces Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) to quantize weights to lower bits while reducing quantization errors, using the saved memory to scale up side networks via ISMoE, where expert selection interacts with salient features from the frozen backbone to reduce forgetting and improve performance. Experiments on vision-language and language tasks claim superior accuracy over SOTA METL methods with comparable efficiency.
Significance. If the claims hold, this work could advance METL by enabling larger, more expressive side networks without memory overhead, addressing the capacity limitations of current side-network approaches in transfer learning. The interactive MoE mechanism and quantization scheme represent potential innovations in efficient adaptation of foundation models.
major comments (2)
- [Method (GNP-IQ and ISMoE subsections)] § on GNP-IQ (likely §3.1 or equivalent): The central claim that GNP-IQ frees sufficient memory for ISMoE scaling without capacity loss or routing instability requires explicit evidence. The manuscript must quantify the error reduction (e.g., via per-layer MSE or gradient norm comparisons) of the Gaussian perturbation step versus plain iterative quantization, and include an ablation isolating whether gains derive from the interaction mechanism rather than simply more quantized parameters.
- [Experiments] Experiments section (likely §4): The abstract asserts 'remarkably promotes accuracy' across diverse tasks, but the provided summary supplies no numerical results, baseline tables, or statistical significance tests. Full results must report exact accuracy deltas, parameter/memory footprints, and ablations for each component (GNP-IQ, ISMoE, interaction) against multiple METL baselines to substantiate the cross-task claim.
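The per-layer MSE comparison requested in the first major comment could be measured along the following lines. This is a minimal sketch under stated assumptions: `quantize_uniform` is a plain symmetric round-to-nearest quantizer standing in for whatever scheme is being evaluated (it is not GNP-IQ, whose details are not given here), and the layers are random Gaussian weights rather than real model tensors.

```python
import random


def quantize_uniform(weights, bits):
    """Symmetric uniform quantize-then-dequantize (round to nearest).

    Stand-in quantizer for illustration; requires bits >= 2.
    """
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def per_layer_mse(layers, quantizer, bits):
    """Map each layer name to the MSE between its weights and their quantized copy."""
    return {name: mse(w, quantizer(w, bits)) for name, w in layers.items()}


# Synthetic layers (assumed Gaussian weights), seeded for repeatability.
rng = random.Random(0)
layers = {f"layer{i}": [rng.gauss(0.0, 0.02) for _ in range(1024)] for i in range(3)}
report8 = per_layer_mse(layers, quantize_uniform, 8)
report4 = per_layer_mse(layers, quantize_uniform, 4)
for name in layers:
    print(name, report8[name], report4[name])
```

Passing two different quantizers to `per_layer_mse` at the same bit width would give the side-by-side table the comment asks for.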
minor comments (2)
- [Abstract] Abstract: Lacks any quantitative highlights or task names despite claiming 'extensive experiments'; this reduces immediate clarity and is atypical for the venue.
- [Method] Notation: Define 'salient features' and the precise interaction mechanism in ISMoE more formally (e.g., via an equation for expert gating) to avoid ambiguity with standard MoE routing.
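For orientation, the kind of formal definition the second minor comment asks for might look like the following. The first line is the standard sparse top-k gate; the second is only a hypothetical interactive variant in which the gate input combines the side-network hidden state $h$ with a frozen-backbone feature $f$. The symbols $W_g$, $E_i$, $h$, $f$ and the element-wise product are assumptions, not the paper's notation.

```latex
% Standard sparse MoE gating over N experts E_1, ..., E_N
g(h) = \operatorname{softmax}\bigl(\operatorname{TopK}(W_g h)\bigr), \qquad
y = \sum_{i=1}^{N} g_i(h)\, E_i(h)

% Hypothetical interactive variant: the gate also sees the backbone feature f
g(h, f) = \operatorname{softmax}\bigl(\operatorname{TopK}(W_g (h \odot f))\bigr), \qquad
y = \sum_{i=1}^{N} g_i(h, f)\, E_i(h)
```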
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We have revised the manuscript to provide the requested quantitative evidence and expanded experimental results, which we believe strengthen the presentation of MP-ISMoE.
- Referee: [Method (GNP-IQ and ISMoE subsections)] § on GNP-IQ (likely §3.1 or equivalent): The central claim that GNP-IQ frees sufficient memory for ISMoE scaling without capacity loss or routing instability requires explicit evidence. The manuscript must quantify the error reduction (e.g., via per-layer MSE or gradient norm comparisons) of the Gaussian perturbation step versus plain iterative quantization, and include an ablation isolating whether gains derive from the interaction mechanism rather than simply more quantized parameters.
  Authors: We thank the referee for this observation. The original manuscript describes the motivation and mechanism of GNP-IQ but does not include the specific quantitative comparisons requested. In the revised version we have added per-layer MSE and gradient-norm comparisons between GNP-IQ and plain iterative quantization to document the error reduction. We have also inserted an ablation that holds the total parameter budget fixed (by using the memory saved from quantization) and compares the full interactive ISMoE against a non-interactive MoE variant; the results indicate that the performance lift is attributable to the interaction with salient backbone features rather than parameter count alone. These additions directly address the concerns about capacity loss and routing stability.
  Revision: yes
- Referee: [Experiments] Experiments section (likely §4): The abstract asserts 'remarkably promotes accuracy' across diverse tasks, but the provided summary supplies no numerical results, baseline tables, or statistical significance tests. Full results must report exact accuracy deltas, parameter/memory footprints, and ablations for each component (GNP-IQ, ISMoE, interaction) against multiple METL baselines to substantiate the cross-task claim.
  Authors: We agree that the experimental claims require fuller numerical support. Although the original manuscript contains result tables, we have substantially expanded the Experiments section in the revision. The updated version now includes complete tables with exact accuracy values, per-task deltas relative to all listed METL baselines, and measured parameter and memory footprints. Statistical significance tests have been added for the main comparisons. Separate ablations for GNP-IQ, ISMoE scaling, and the interaction mechanism are presented, each evaluated against multiple baselines on both vision-language and language-only tasks. These revisions provide the concrete evidence needed to substantiate the cross-task accuracy improvements.
  Revision: yes
Circularity Check
No circularity: empirical method proposal with external priors
The paper introduces GNP-IQ quantization and ISMoE routing as algorithmic innovations on top of existing PETL/METL literature, then validates via experiments. No equations, derivations, or self-citations appear in the abstract or description that reduce any claimed result to a fitted input or prior self-result by construction. Performance claims are empirical rather than predictive reductions, and the framework is presented as building on independent prior work without load-bearing uniqueness theorems or ansatzes imported from the same authors.