Parameter-Efficient Multi-Task Learning via Progressive Task-Specific Adaptation
Pith reviewed 2026-05-18 13:37 UTC · model grok-4.3
The pith
Progressive task-specific adaptation shares adapter modules early and specializes them later to enable efficient multi-task learning with reduced interference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By making adapter modules progressively more task-specific from early to late layers and using gradient-based similarity to allocate shared modules to similar tasks, the approach mitigates task interference and negative transfer in multi-task parameter-efficient fine-tuning, leading to better results with reduced parameter counts on semantic segmentation and depth estimation tasks.
What carries the argument
Progressive task-specific adaptation, where shared adapters in early layers transition to task-specific ones in deeper layers, combined with gradient-based task similarity for module allocation.
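The sharing pattern described above can be pictured as a tree of task groups that only ever splits as depth increases. The following is a minimal sketch of that idea, not the authors' implementation: the task names, the three-stage layout, and the helper names `build_routing` and `is_progressive` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stage-wise groupings (not from the paper): one list of task
# groups per backbone stage, ordered from early (shared) to late (specific).
STAGE_GROUPS: List[List[List[str]]] = [
    [["task_a", "task_b", "task_c", "task_d"]],        # stage 1: one shared adapter
    [["task_a", "task_b"], ["task_c", "task_d"]],      # stage 2: two group adapters
    [["task_a"], ["task_b"], ["task_c"], ["task_d"]],  # stage 3: one adapter per task
]


def build_routing(stage_groups: List[List[List[str]]]) -> List[Dict[str, int]]:
    """For each stage, map every task to the index of the adapter it uses."""
    routing = []
    for groups in stage_groups:
        table = {}
        for adapter_idx, group in enumerate(groups):
            for task in group:
                table[task] = adapter_idx
        routing.append(table)
    return routing


def is_progressive(stage_groups: List[List[List[str]]]) -> bool:
    """True if every group at a later stage sits inside one group of the previous stage."""
    for earlier, later in zip(stage_groups, stage_groups[1:]):
        for group in later:
            if not any(set(group) <= set(parent) for parent in earlier):
                return False
    return True


routing = build_routing(STAGE_GROUPS)
adapters_per_stage = [len(groups) for groups in STAGE_GROUPS]
```

Here `adapters_per_stage` grows with depth while `is_progressive` confirms that groups only split and never merge, which is the structural property the summary attributes to the method.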
If this is right
- Outperforms prior parameter-efficient multi-task methods on PASCAL and NYUD-v2 datasets.
- Requires fewer trainable parameters than competing approaches.
- Reduces task interference and negative transfer through similarity-based sharing of adapters.
- Works effectively when applied to Swin and Pyramid Vision Transformers.
Where Pith is reading between the lines
- The gradient-based task similarity idea might transfer to other parameter-efficient methods such as LoRA or prefix tuning.
- Early-layer sharing could help scale multi-task training to larger numbers of tasks without linear growth in parameters.
- The progressive design might generalize to other backbone architectures beyond the vision transformers tested here.
Load-bearing premise
The assumption that gradient-based task similarity computation can reliably allocate similar tasks to shared adapter modules to reduce task interference and negative transfer.
What would settle it
The premise would be undermined if replacing the gradient-based task allocation with random grouping led to similar or better performance on the tested datasets, or if the proposed method failed to show gains over baselines while using fewer parameters.
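A random-grouping control of the kind described above could be generated as follows. This is a hedged sketch: the function name `random_grouping`, the task names, and the choice of two groups and five seeds are assumptions for illustration, and the paper may construct its control differently.

```python
import random


def random_grouping(tasks, num_groups, seed):
    """Randomly partition tasks into num_groups non-empty groups (control condition)."""
    if not 1 <= num_groups <= len(tasks):
        raise ValueError("num_groups must be between 1 and len(tasks)")
    rng = random.Random(seed)  # local generator so each seed is reproducible
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    # Seed every group with one task so that no group ends up empty.
    groups = [[task] for task in shuffled[:num_groups]]
    for task in shuffled[num_groups:]:
        rng.choice(groups).append(task)
    return [sorted(group) for group in groups]


# Hypothetical task names; several seeds give several random controls.
tasks = ["task_a", "task_b", "task_c", "task_d"]
controls = [random_grouping(tasks, num_groups=2, seed=s) for s in range(5)]
```

Comparing the similarity-based grouping against several such controls, rather than a single one, would keep one lucky or unlucky draw from deciding the outcome.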
Original abstract
Parameter-efficient fine-tuning methods have emerged as a promising solution for adapting pre-trained models to various downstream tasks. While these methods perform well in single-task learning, extending them to multi-task learning exacerbates common issues, such as task interference and negative transfer, due to the limited number of trainable parameters. To address these challenges, we introduce progressive task-specific multi-task adaptation, a novel parameter-efficient approach for multi-task learning. Our approach introduces adapter modules that are shared in early layers and become increasingly task-specific in later layers. Additionally, we propose a gradient-based approach for computing task similarity and use this measure to allocate similar tasks to the shared adapter modules. To evaluate our approach, we adapt Swin and Pyramid Vision Transformers on PASCAL and NYUD-v2. On both datasets, our approach outperforms prior parameter-efficient multi-task methods while using fewer trainable parameters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces progressive task-specific multi-task adaptation for parameter-efficient fine-tuning of vision transformers. Adapter modules are shared across tasks in early layers and become progressively more task-specific in deeper layers; a gradient-based task similarity metric is used to allocate similar tasks to the same shared adapters. The method is evaluated by adapting Swin and Pyramid Vision Transformers to PASCAL and NYUD-v2, with the central claim that it outperforms prior parameter-efficient multi-task baselines while using fewer trainable parameters.
Significance. If the empirical claims hold after proper validation, the work would offer a practical advance in parameter-efficient multi-task learning by explicitly targeting task interference through a progressive sharing schedule and similarity-driven allocation. The design is novel relative to standard adapter or LoRA baselines and could influence efficient multi-task adaptation pipelines in computer vision.
major comments (3)
- [§3.3] §3.3 (Gradient-based Task Similarity): The central claim that gradient-based similarity reliably groups tasks to reduce negative transfer lacks isolating ablations. The reported gains could arise from the increased task-specific capacity in later layers rather than the similarity mechanism; without an ablation that disables the similarity allocation while keeping the progressive structure, the load-bearing assumption remains untested.
- [§4.1, Table 2] §4.1 and Table 2: The outperformance statement on PASCAL and NYUD-v2 is presented without error bars, multiple random seeds, or statistical significance tests. Given that the abstract already omits all quantitative numbers, the tables must demonstrate that the reported margins are robust and not sensitive to initialization or training order.
- [§3.2] §3.2 (Progressive Adapter Allocation): The description of how task similarity is computed from gradients during adaptation does not address potential dominance by high-magnitude tasks or sensitivity to the order in which tasks are presented; a concrete test (e.g., permuting task order or using different initializations) is needed to support the reliability claim.
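The seed-level robustness check requested in the second major comment can be sketched with the standard library alone. The scores below are made up for illustration and are not results from the paper; `summarize` and `paired_t_statistic` are hypothetical helper names.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(method, baseline):
    """Paired t statistic over per-seed differences (degrees of freedom = n - 1)."""
    if len(method) != len(baseline) or len(method) < 2:
        raise ValueError("need two equally long lists with at least two seeds")
    diffs = [m - b for m, b in zip(method, baseline)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Made-up per-seed scores, for illustration only.
method_scores = [70.0, 71.0, 72.0]
baseline_scores = [69.0, 69.5, 70.0]
mean, std = summarize(method_scores)
t_stat = paired_t_statistic(method_scores, baseline_scores)
```

With only a handful of seeds the test has few degrees of freedom, so the statistic is best read next to the reported mean and standard deviation rather than as a verdict on its own.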
minor comments (2)
- [§3] The notation for the task similarity score (presumably Eq. (3) or (4)) should be defined before its first use in the method section to avoid forward references.
- [Figure 2] Figure 2 (architecture diagram) would benefit from explicit labels indicating which layers are shared versus task-specific and how the gradient similarity is injected.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and detailed comments on our manuscript. We address each major point below and describe the revisions we will make to strengthen the empirical validation and clarity of our method.
- Referee: [§3.3] §3.3 (Gradient-based Task Similarity): The central claim that gradient-based similarity reliably groups tasks to reduce negative transfer lacks isolating ablations. The reported gains could arise from the increased task-specific capacity in later layers rather than the similarity mechanism; without an ablation that disables the similarity allocation while keeping the progressive structure, the load-bearing assumption remains untested.
Authors: We agree that an isolating ablation is necessary to separate the contribution of the similarity-based allocation from the progressive capacity increase. In the revised manuscript we will add a controlled ablation that retains the progressive sharing schedule but replaces the gradient-based allocation with random assignment of tasks to shared adapters. We will report the resulting performance drop on both PASCAL and NYUD-v2 to quantify the benefit attributable to the similarity mechanism. revision: yes
- Referee: [§4.1, Table 2] §4.1 and Table 2: The outperformance statement on PASCAL and NYUD-v2 is presented without error bars, multiple random seeds, or statistical significance tests. Given that the abstract already omits all quantitative numbers, the tables must demonstrate that the reported margins are robust and not sensitive to initialization or training order.
Authors: We acknowledge the importance of statistical robustness. In the revision we will rerun all main experiments with three independent random seeds, report mean and standard deviation in Table 2, and include paired t-tests or Wilcoxon tests to establish statistical significance of the observed improvements over the strongest baselines. revision: yes
- Referee: [§3.2] §3.2 (Progressive Adapter Allocation): The description of how task similarity is computed from gradients during adaptation does not address potential dominance by high-magnitude tasks or sensitivity to the order in which tasks are presented; a concrete test (e.g., permuting task order or using different initializations) is needed to support the reliability claim.
Authors: We will expand §3.2 to clarify that gradient similarities are computed after per-task gradient normalization (dividing by the L2 norm of each task’s gradient) to reduce dominance by high-magnitude tasks. To demonstrate stability, we will add a new experiment that permutes task presentation order and repeats the similarity computation under two different adapter initializations, reporting both the resulting task groupings and downstream multi-task performance. revision: yes
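The normalization step in the last response can be illustrated with a toy computation. This sketch assumes flattened per-task gradients and plain cosine similarity; the gradients, task names, and function names are hypothetical, and the paper's actual similarity measure may aggregate over layers or training steps in ways not shown here.

```python
import math


def l2_normalize(vec):
    """Scale a gradient vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero gradient")
    return [x / norm for x in vec]


def similarity_matrix(grads):
    """grads: {task: flattened gradient}. Returns {(task_i, task_j): cosine similarity}."""
    unit = {task: l2_normalize(g) for task, g in grads.items()}
    return {
        (a, b): sum(x * y for x, y in zip(unit[a], unit[b]))
        for a in unit
        for b in unit
    }


# Toy gradients: task_b is task_a scaled by 100, task_c is orthogonal to both.
grads = {
    "task_a": [1.0, 2.0, 0.0],
    "task_b": [100.0, 200.0, 0.0],
    "task_c": [0.0, 0.0, 3.0],
}
sim = similarity_matrix(grads)
# Recompute with the tasks presented in a different order.
permuted = similarity_matrix({t: grads[t] for t in ["task_c", "task_b", "task_a"]})
```

After normalization the large-magnitude `task_b` is treated as identical in direction to `task_a`, and the pairwise values do not depend on presentation order. That only covers the similarity computation on fixed gradients; whether the gradients themselves shift with task order during training is what the proposed permutation experiment would have to show.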
Circularity Check
No circularity: novel progressive adaptation and gradient-based allocation evaluated empirically
The paper proposes a new parameter-efficient multi-task method using progressively task-specific adapters (shared early, specific later) plus a gradient-based task similarity measure to group tasks. These design choices are presented as original contributions and validated through experiments on PASCAL and NYUD-v2 with Swin/PVT backbones, claiming fewer parameters and better performance than prior methods. No equations, self-citations, or fitted inputs are shown reducing the central claims to definitions or prior results by construction. The derivation chain consists of architectural decisions and empirical testing rather than tautological renaming or self-referential fitting.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Task interference and negative transfer are exacerbated in multi-task learning due to the limited number of trainable parameters.
invented entities (1)
- progressive task-specific adapter modules: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "adapter modules are shared among all tasks in the early layers, and they become increasingly specific to a subset of tasks as we move toward task-specific decoders... gradient-based approach for computing task similarity"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We use the notion of gradient conflicts from the MTL literature to compute the similarity between a pair of tasks... S(g,g') = S_cos(...)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Parameter-efficient Quantum Multi-task Learning. QMTL uses shared VQC encoding plus task-specific quantum ansatz heads to achieve linear parameter scaling with the number of tasks while matching or exceeding classical multi-task baselines on three benchmarks.
Reference graph
Works this paper leans on
- [1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- [2] Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless C Fowlkes, Stefano Soatto, and Pietro Perona. Task2vec: Task embedding for meta-learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6430–6439.
- [3] Ahmed Agiza, Marina Neseem, and Sherief Reda. MTLoRA: Low-rank adaptation approach for efficient multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16196–16205, 2024.
- [4] Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan L Yuille, Trevor Darrell, Jitendra Malik, and Alexei A Efros. Sequential modeling enables scalable learning for large vision models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22861–22872, 2024.
- [5] Hao Ban and Kaiyi Ji. Fair resource allocation in multi-task learning. In Forty-first International Conference on Machine Learning, 2024.
- [6] David Brüggemann, Menelaos Kanakis, Stamatios Georgoulis, and Luc Van Gool. Automated search for resource-efficient branched multi-task networks. In 31st British Machine Vision Conference 2020, BMVC 2020, page 359. BMVA Press, 2020.
- [7] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems, 35:16664–16678, 2022.
- [8] Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, and Alan Yuille. Detect what you can: Detecting and representing objects using holistic models and body parts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1971–1978.
- [9] Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International Conference on Machine Learning, pages 794–803. PMLR, 2018.
- [10] Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter for dense predictions. In The Eleventh International Conference on Learning Representations, 2023.
- [11] Michael Crawshaw. Multi-task learning with deep neural networks: A survey. arXiv preprint arXiv:2009.09796, 2020.
- [12] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88:303–338, 2010.
- [13] Chris Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, and Chelsea Finn. Efficiently identifying task groupings for multi-task learning. Advances in Neural Information Processing Systems, 34:27503–27516, 2021.
- [14] Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010.
- [15] Pengsheng Guo, Chen-Yu Lee, and Daniel Ulbricht. Learning to branch for multi-task learning. In International Conference on Machine Learning, pages 3854–3863. PMLR, 2020.
- [16] Soufiane Hayou, Nikhil Ghosh, and Bin Yu. LoRA+: Efficient low rank adaptation of large models. In Forty-first International Conference on Machine Learning, 2024.
- [17] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pages 2790–2799. PMLR, 2019.
- [18] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
- [19] Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, and Roy Lee. LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5254–5276, 2023.
- [20] Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European Conference on Computer Vision, pages 709–727. Springer, 2022.
- [21] Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7482–7491.
- [22] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, 2021.
- [23] Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, 2021.
- [24] Yang Lin, Xinyu Ma, Xu Chu, Yujie Jin, Zhibang Yang, Yasha Wang, and Hong Mei. LoRA dropout as a sparsity regularizer for overfitting control. arXiv preprint arXiv:2404.09610, 2024.
- [25] Shih-yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. DoRA: Weight-decomposed low-rank adaptation. In Forty-first International Conference on Machine Learning, 2024.
- [26] Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, and Zsolt Kira. Polyhistor: Parameter-efficient multi-task adaptation for dense vision tasks. Advances in Neural Information Processing Systems, 35:36889–36901, 2022.
- [27] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
- [28] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2017.
- [29] Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, and Rogério Schmidt Feris. Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1131–1140, 2016.
- [30] Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, and James Henderson. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)...
- [31] Kevis-Kokitsi Maninis, Ilija Radosavovic, and Iasonas Kokkinos. Attentive single-tasking of multiple tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1851–1860, 2019.
- [32] Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulic, Sebastian Ruder, Kyunghyun Cho, and Iryna Gurevych. Adapterhub: A framework for adapting transformers. EMNLP 2020, page 46, 2020.
- [33] Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, and Iryna Gurevych. Adapterfusion: Non-destructive task composition for transfer learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 487–503, 2021.
- [34] Dmitry Senushkin, Nikolay Patakin, Arseny Kuznetsov, and Anton Konushin. Independent component alignment for multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20083–20093, 2023.
- [35] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. In Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12, pages 746–760. Springer, 2012.
- [36] Yi-Lin Sung, Varun Nair, and Colin A Raffel. Training neural networks with fixed sparse masks. Advances in Neural Information Processing Systems, 34:24193–24205, 2021.
- [37] Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
- [38] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- [39] Simon Vandenhende, Stamatios Georgoulis, Bert De Brabandere, and Luc Van Gool. Branched multi-task networks: Deciding what layers to share. Proceedings BMVC 2020.
- [40] Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool. MTI-Net: Multi-scale task interaction networks for multi-task learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16, pages 527–543. Springer, 2020.
- [41] Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, and Luc Van Gool. Multi-task learning for dense prediction tasks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3614–3633, 2021.
- [42] Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3349–3364, 2020.
- [43] Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan, and Jianfeng Gao. Adamix: Mixture-of-adaptations for parameter-efficient model tuning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5744–5760, 2022.
- [44] Yi Xin, Siqi Luo, Haodi Zhou, Junlong Du, Xiaohong Liu, Yue Fan, Qing Li, and Yuntao Du. Parameter-efficient fine-tuning for pre-trained vision models: A survey. arXiv preprint arXiv:2402.02242, 2024.
- [45] Dan Xu, Wanli Ouyang, Xiaogang Wang, and Nicu Sebe. PAD-Net: Multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 675–684, 2018.
- [46] Hanrong Ye and Dan Xu. Inverted pyramid multi-task transformer for dense scene understanding. In European Conference on Computer Vision, pages 514–530. Springer, 2022.
- [47] Jun Yu, Yutong Dai, Xiaokang Liu, Jin Huang, Yishan Shen, Ke Zhang, Rong Zhou, Eashan Adhikarla, Wenxuan Ye, Yixin Liu, et al. Unleashing the power of multi-task learning: A comprehensive survey spanning traditional, deep, and pretrained foundation model eras. arXiv preprint arXiv:2404.18961, 2024.
- [48] Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems, 33:5824–5836, 2020.
- [49] Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1–9, 2022.
- [50] Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adaptive budget allocation for parameter-efficient fine-tuning. In The Eleventh International Conference on Learning Representations, 2023.
Supplementary Material
Training Hyperparameters
PASCAL. We replicate the hyperparameters from Agiza et al. [3] for fine-tuning the models on the PASCAL dataset. Specifically, we use the AdamW optimizer [28] with a batch size of 32, a learning rate of 3.125×10^−5, and a weight decay of 0.05. The models are fine-tuned for 300 epochs, with evaluations every 20 epochs. We use a li...
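The PASCAL settings quoted above can be collected into a small configuration object. This is only a sketch: the class and field names are hypothetical, and the evaluation schedule assumes that the first evaluation happens at epoch 20, which the excerpt does not state. The truncated sentence about the remaining settings is left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PascalFinetuneConfig:
    """Values as quoted in the supplementary excerpt; names are illustrative."""
    optimizer: str = "AdamW"
    batch_size: int = 32
    learning_rate: float = 3.125e-5
    weight_decay: float = 0.05
    epochs: int = 300
    eval_every: int = 20

    def eval_epochs(self):
        """Epochs at which evaluation runs, assuming it starts at epoch eval_every."""
        return list(range(self.eval_every, self.epochs + 1, self.eval_every))


config = PascalFinetuneConfig()
```

Under that assumption the schedule yields 15 evaluation points across the 300 epochs.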
Additional Results
In this section, we present additional experiments on the PASCAL and NYUD-v2 datasets. Continuing from Section 4.5, Table 5 illustrates the performance of TGLoRA for varying trainable parameters. The rank of the low-rank modules in TGLoRA layers controls this number. The table also shows the performance of “Single Task – LoRA” and “...
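To see why rank acts as the knob for trainable parameters, the count for a standard LoRA pair is enough: a rank-r update to a d_out × d_in weight adds r·(d_in + d_out) parameters. The sketch below uses that standard count with made-up layer shapes; how TGLoRA layers are actually laid out is not given in this excerpt, so the totals are illustrative only.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical (d_in, d_out) layer shapes; not taken from the paper's backbones.
layers = [(96, 96), (96, 384), (384, 96)]


def total_params(rank):
    """Sum the LoRA parameter count over all adapted layers for a given rank."""
    return sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in layers)


counts = {rank: total_params(rank) for rank in (4, 8, 16)}
```

The count is linear in the rank, so doubling the rank doubles the trainable parameters for a fixed set of adapted layers.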
Computational Budget vs Performance
The tree structure offers a trade-off between model performance and inference cost. For instance, using 6.89M parameters for PASCAL-Context, and assigning one, one, two, and four task groups to the first through fourth stages, respectively, results in ∆m = +3.93% with 38.37 GMacs. Similarly, configuring the stages with o...
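The excerpt does not define ∆m. A definition that is common in the dense-prediction multi-task literature averages each task's relative change against its single-task baseline, flipping the sign for metrics where lower is better. Assuming the paper follows that convention, it can be sketched as below; the task names and metric values are invented for illustration.

```python
def delta_m(multi_task, single_task, lower_is_better):
    """Average relative change (in percent) of a multi-task model vs single-task baselines."""
    total = 0.0
    for task, baseline in single_task.items():
        # Flip the sign for error-style metrics so that improvement is always positive.
        sign = -1.0 if lower_is_better[task] else 1.0
        total += sign * (multi_task[task] - baseline) / baseline
    return 100.0 * total / len(single_task)


# Made-up metric values for two hypothetical tasks.
single = {"seg_miou": 50.0, "depth_rmse": 0.50}
multi = {"seg_miou": 52.0, "depth_rmse": 0.45}
lower = {"seg_miou": False, "depth_rmse": True}
score = delta_m(multi, single, lower)
```

In this toy case a 4% gain on the first task and a 10% error reduction on the second average to a ∆m of about +7%, which is how a single figure such as +3.93% can summarize several tasks at once.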