pith. machine review for the scientific record.

arxiv: 2604.13560 · v1 · submitted 2026-04-15 · 💻 cs.LG · cs.ET · quant-ph

Recognition: unknown

Parameter-efficient Quantum Multi-task Learning

Chandra Thapa, Hevish Cowlessur, Seyit Camtepe, Tansu Alpcan

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 14:09 UTC · model grok-4.3

classification 💻 cs.LG · cs.ET · quant-ph
keywords quantum machine learning · multi-task learning · parameter efficiency · variational quantum circuits · hybrid quantum-classical models · task-specific ansatz · scaling analysis · hard parameter sharing

The pith

A hybrid quantum architecture for multi-task learning replaces classical task heads with compact quantum circuits to achieve linear rather than quadratic parameter growth as tasks increase.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes replacing the task-specific linear layers in standard hard-parameter-sharing multi-task models with a shared variational quantum circuit for data encoding plus small task-specific quantum ansatz blocks. Under a controlled setup that grows the shared representation dimension with the number of tasks to keep capacity matched, the total head parameter count grows only linearly rather than quadratically. Experiments across natural language processing, medical imaging, and multimodal sarcasm detection show that this quantum head design reaches performance comparable to or better than classical baselines while using substantially fewer parameters. The model is also shown to execute on both noisy simulators and real quantum hardware.
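
To make the scaling contrast concrete, here is a minimal counting sketch. The constants (shared dimension added per task, classes per task, parameters per ansatz block) are illustrative assumptions, not values from the paper; only the growth rates are the point.

```python
# Illustrative parameter counting under the paper's capacity-matched rule,
# where the shared representation dimension d grows with the task count T.
# All constants here are assumed for illustration, not taken from the paper.

def classical_head_params(num_tasks, dim_per_task=4, classes_per_task=2):
    """One d x C linear head per task, with d = dim_per_task * num_tasks."""
    d = dim_per_task * num_tasks
    per_task = d * classes_per_task + classes_per_task  # weights + biases
    return num_tasks * per_task                          # grows ~ T^2

def quantum_head_params(num_tasks, params_per_ansatz=12):
    """One lightweight ansatz block per task with a fixed parameter budget."""
    return num_tasks * params_per_ansatz                 # grows ~ T

for T in (2, 4, 8, 16, 32):
    print(f"T={T:2d}  classical={classical_head_params(T):5d}  quantum={quantum_head_params(T):4d}")
```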

Core claim

The central claim is that a hybrid quantum multi-task model consisting of a shared, task-independent variational quantum encoding stage followed by lightweight task-specific ansatz blocks achieves linear scaling of prediction-head parameters with the number of tasks under capacity-matched conditions, in contrast to the quadratic scaling of conventional classical linear heads, while delivering comparable or superior task performance on three multi-task benchmarks.

What carries the argument

The hybrid QMTL architecture: a shared variational quantum circuit encoding stage followed by lightweight task-specific quantum ansatz blocks that serve as the prediction heads.
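
A rough sketch of what such a head could look like in PennyLane (which the paper cites) is given below. The qubit count, layer templates, the fixed two-qubit task sub-register, and the measured observable are assumptions made here for illustration; the abstract does not specify the actual circuits.

```python
# Hypothetical sketch: shared quantum encoding stage + one small task-specific
# ansatz block acting on a fixed sub-register, so its parameter count does not
# grow when the shared register is widened. Shapes and templates are assumed.
import pennylane as qml
from pennylane import numpy as np

n_shared = 4                 # shared register width (assumed)
task_wires = [0, 1]          # fixed small task sub-register (assumed)
dev = qml.device("default.qubit", wires=n_shared)

@qml.qnode(dev)
def qmtl_head(features, shared_weights, task_weights):
    # Shared, task-independent encoding of the classical features.
    qml.AngleEmbedding(features, wires=range(n_shared), rotation="Y")
    qml.StronglyEntanglingLayers(shared_weights, wires=range(n_shared))
    # Lightweight task-specific block on the fixed sub-register.
    qml.BasicEntanglerLayers(task_weights, wires=task_wires)
    return qml.expval(qml.PauliZ(task_wires[0]))

shared_shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_shared)
task_shape = qml.BasicEntanglerLayers.shape(n_layers=1, n_wires=len(task_wires))
features = np.random.uniform(0, np.pi, n_shared)
print(qmtl_head(features,
                np.random.uniform(0, np.pi, shared_shape),
                np.random.uniform(0, np.pi, task_shape)))
```

Read this way, widening the shared register changes only the encoding stage, while each task head keeps the same handful of trainable angles; whether the paper's circuits actually have this property is the crux of the referee exchange below.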

If this is right

  • Adding new tasks incurs only a small, fixed increase in parameters rather than a rapidly growing cost.
  • The overall model size stays manageable even when the number of related tasks becomes large.
  • The architecture remains executable on current noisy intermediate-scale quantum devices without requiring deep circuits.
  • Performance remains competitive with classical multi-task baselines across language, vision, and multimodal domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the linear scaling persists at larger task counts, the approach could make joint training on hundreds of related tasks practical where classical heads would require impractically large parameter budgets.
  • The quantum representation space might allow the shared backbone to capture cross-task structure more compactly than classical shared layers of equivalent dimension.
  • Further reductions in ansatz depth or parameter count could be tested by replacing the current task blocks with even shallower circuits while monitoring whether specialization is preserved.

Load-bearing premise

The small task-specific quantum ansatz blocks can deliver enough task specialization and performance parity with classical heads without any hidden growth in effective parameters or circuit expressivity.

What would settle it

Measuring both the actual number of trainable parameters and per-task accuracy while systematically increasing the number of tasks from a few to dozens under the same capacity-matched shared-dimension rule; a clear quadratic rise in quantum-head parameters or a sharp drop in task performance would falsify the claim.
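
A minimal protocol sketch for that test follows. The function name and per-task constant are hypothetical stand-ins; the accuracy measurement is left as a comment because it requires training the actual models.

```python
# Sketch of the falsification test: sweep the task count T under the same
# capacity-matched rule, record head parameter counts, and fit the growth
# exponent on a log-log scale. count_head_params is a hypothetical stand-in
# for counting the trainable parameters of the real task-specific circuits.
import numpy as np

def count_head_params(num_tasks, params_per_ansatz=12):
    # Placeholder: replace with a count over the actual quantum heads.
    return num_tasks * params_per_ansatz

task_counts = np.array([2, 4, 8, 16, 32, 64])
param_counts = np.array([count_head_params(T) for T in task_counts])

# A slope near 1 on log-log axes means linear scaling; a slope near 2 would
# indicate the quadratic growth that would falsify the claim.
slope = np.polyfit(np.log(task_counts), np.log(param_counts), 1)[0]
print(f"fitted growth exponent: {slope:.2f}")
# Per-task accuracy at each T would be tracked alongside; a sharp drop as T
# grows would also count against the claim, even with linear parameter growth.
```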

Original abstract

Multi-task learning (MTL) improves generalization and data efficiency by jointly learning related tasks through shared representations. In the widely used hard-parameter-sharing setting, a shared backbone is combined with task-specific prediction heads. However, task-specific parameters can grow rapidly with the number of tasks. Therefore, designing multi-task heads that preserve task specialization while improving parameter efficiency remains a key challenge. In Quantum Machine Learning (QML), variational quantum circuits (VQCs) provide a compact mechanism for mapping classical data to quantum states residing in high-dimensional Hilbert spaces, enabling expressive representations within constrained parameter budgets. We propose a parameter-efficient quantum multi-task learning (QMTL) framework that replaces conventional task-specific linear heads with a fully quantum prediction head in a hybrid architecture. The model consists of a VQC with a shared, task-independent quantum encoding stage, followed by lightweight task-specific ansatz blocks enabling localized task adaptation while maintaining compact parameterization. Under a controlled and capacity-matched formulation where the shared representation dimension grows with the number of tasks, our parameter-scaling analysis demonstrates that a standard classical head exhibits quadratic growth, whereas the proposed quantum head parameter cost scales linearly. We evaluate QMTL on three multi-task benchmarks spanning natural language processing, medical imaging, and multimodal sarcasm detection, where we achieve performance comparable to, and in some cases exceeding, classical hard-parameter-sharing baselines while consistently outperforming existing hybrid quantum MTL models with substantially fewer head parameters. We further demonstrate QMTL's executability on noisy simulators and real quantum hardware, illustrating its feasibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a parameter-efficient quantum multi-task learning (QMTL) framework that employs a shared variational quantum circuit (VQC) encoding stage combined with lightweight task-specific quantum ansatz blocks for the prediction heads in a hard-parameter-sharing MTL setup. Under a capacity-matched formulation where the shared representation dimension increases with the number of tasks, the authors claim that the quantum head exhibits linear parameter scaling with the number of tasks, unlike the quadratic scaling of standard classical heads. Empirical evaluations on NLP, medical imaging, and multimodal sarcasm detection benchmarks show performance comparable to or better than classical MTL baselines with fewer head parameters, and demonstrate feasibility on quantum hardware.

Significance. If the linear scaling holds without hidden parameter or expressivity costs and performance parity is maintained, the work could advance parameter-efficient hybrid quantum-classical MTL by exploiting VQC compactness for task adaptation. This addresses a practical bottleneck in hard-parameter sharing as task count grows. The hardware execution demonstration adds practical value, though overall significance is limited by the absence of explicit scaling derivations or quantitative empirical details.

major comments (2)
  1. [Abstract] Abstract (scaling analysis paragraph): The claim of linear parameter scaling for the quantum head under capacity-matching (shared dimension d ∝ T) requires that each task-specific ansatz block has a parameter count independent of d. Standard VQC ansatz constructions (rotations plus entangling gates per qubit) applied to a shared register of width d would typically yield per-block parameters scaling linearly with d, making the total head cost quadratic in T and undermining the claimed advantage over classical heads. No circuit diagram, explicit parameterization, or proof of independence is provided to support the linearity. (A counting note formalizing this objection follows this list.)
  2. [Evaluation] Evaluation section: Performance is described only qualitatively as 'comparable to, and in some cases exceeding' classical hard-parameter-sharing baselines without reporting specific metrics (e.g., accuracy/F1 scores), error bars, baseline implementations, or statistical significance tests. This leaves the empirical support for both the scaling benefit and task specialization unverified.
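
One way to state the counting at issue, using notation introduced here rather than taken from the paper: let T be the number of tasks, C the outputs per task, d = cT the capacity-matched shared dimension, and suppose each task-specific ansatz block uses w qubits and L layers of single-qubit rotations.

```latex
% Classical heads: one d x C linear map (plus bias) per task, with d = cT
P_{\mathrm{classical}}(T) = T\,(dC + C) = cC\,T^{2} + C\,T = \Theta(T^{2})

% Quantum heads: T ansatz blocks of roughly 3wL rotation angles each
P_{\mathrm{quantum}}(T) \approx 3\,w\,L\,T =
\begin{cases}
\Theta(T)     & \text{if } w \text{ and } L \text{ are fixed constants,}\\
\Theta(T^{2}) & \text{if the block spans the shared register, } w \propto d \propto T.
\end{cases}
```

The rebuttal below adopts the first branch (a fixed small register per task); the referee's objection is that the abstract does not rule out the second.
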
minor comments (2)
  1. The abstract lacks any equations, parameter-count formulas, or qubit/layer counts to illustrate the claimed scaling or architecture.
  2. Reproducibility would be improved by specifying the exact VQC ansatz forms, optimization procedure, and quantum simulator/hardware backend details.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. These have helped us identify areas where additional clarity and explicit details would strengthen the presentation. We address each major comment point-by-point below and have revised the manuscript to incorporate the necessary changes.

Point-by-point responses
  1. Referee: [Abstract] Abstract (scaling analysis paragraph): The claim of linear parameter scaling for the quantum head under capacity-matching (shared dimension d ∝ T) requires that each task-specific ansatz block has a parameter count independent of d. Standard VQC ansatz constructions (rotations plus entangling gates per qubit) applied to a shared register of width d would typically yield per-block parameters scaling linearly with d, making the total head cost quadratic in T and undermining the claimed advantage over classical heads. No circuit diagram, explicit parameterization, or proof of independence is provided to support the linearity.

    Authors: We appreciate the referee's careful analysis of the scaling claim. In the QMTL architecture, the shared VQC encoding stage maps inputs to a d-qubit representation whose dimension scales with T for capacity matching, while each task-specific ansatz block is a lightweight circuit whose structure and parameter count are deliberately independent of d (using a fixed small qubit register and a standard rotation-entangling ansatz whose gate count does not grow with the shared width). This design ensures the per-task head cost remains constant, yielding overall linear scaling in T. To eliminate any ambiguity, we have added an explicit circuit diagram, the precise parameterization of the task-specific blocks, and a short derivation of the linear versus quadratic scaling in the revised manuscript. revision: yes

  2. Referee: [Evaluation] Evaluation section: Performance is described only qualitatively as 'comparable to, and in some cases exceeding' classical hard-parameter-sharing baselines without reporting specific metrics (e.g., accuracy/F1 scores), error bars, baseline implementations, or statistical significance tests. This leaves the empirical support for both the scaling benefit and task specialization unverified.

    Authors: We acknowledge that the main-text description in the evaluation section was primarily qualitative. The original manuscript already contains the supporting quantitative results in tables (including accuracy/F1 scores, standard deviations across runs, baseline implementation details, and significance tests). To improve readability and verifiability, we have expanded the evaluation section to explicitly quote the key numerical results, error bars, and statistical outcomes directly in the prose, while retaining the tables for full detail. revision: yes

Circularity Check

0 steps flagged

No circularity: scaling follows from direct architectural parameter counting

Full rationale

The paper's parameter-scaling analysis is a direct count of parameters under an explicitly capacity-matched formulation (shared representation dimension grows with number of tasks T). The classical head's quadratic growth and the quantum head's claimed linear growth are consequences of the stated architecture definitions (shared VQC stage plus per-task lightweight ansatz blocks whose parameter count is described as compact and independent of the growing shared dimension). No equations, predictions, or results reduce to fitted inputs or self-referential definitions by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are present in the provided text. The derivation is self-contained against the paper's own architectural assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions from variational quantum computing and multi-task learning literature; no new free parameters, axioms, or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Variational quantum circuits can map classical data to expressive high-dimensional representations using limited parameters.
    Invoked to justify the use of VQCs for both the shared encoding and the task heads.

pith-pipeline@v0.9.0 · 5585 in / 1291 out tokens · 31565 ms · 2026-05-10T14:09:52.289159+00:00 · methodology

discussion (0)

