arxiv: 1810.04650 · v2 · pith:ZLTGQZ2Pnew · submitted 2018-10-10 · 💻 cs.LG · stat.ML

Multi-Task Learning as Multi-Objective Optimization

classification 💻 cs.LG stat.ML
keywords learning · multi-task · multi-objective · tasks · optimization · algorithms · bound · classification

In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.
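The contrast the abstract draws between a fixed weighted sum of losses and a gradient-based multi-objective step can be illustrated for two tasks. The sketch below is not the paper's implementation; it assumes the standard min-norm formulation from the gradient-based multi-objective literature, in which the task weights are chosen per step to minimize the norm of the combined gradient. The function names `min_norm_weight` and `combined_direction` are hypothetical, and the gradients are plain Python lists standing in for shared-parameter gradients.

```python
def min_norm_weight(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha*g1 + (1 - alpha)*g2||^2.

    Setting the derivative to zero gives the closed form
    alpha = (g2 - g1) . g2 / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        # Identical gradients: every weight yields the same direction.
        return 0.5
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    return min(1.0, max(0.0, alpha))


def combined_direction(g1, g2):
    """Combine two task gradients with the min-norm weight."""
    alpha = min_norm_weight(g1, g2)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


# Orthogonal gradients: both tasks get equal weight.
print(combined_direction([1.0, 0.0], [0.0, 1.0]))
# Directly opposed gradients: the combined direction vanishes,
# signalling that no step improves both tasks at once.
print(combined_direction([1.0, 0.0], [-1.0, 0.0]))
```

When the combined direction is nonzero, stepping against it decreases both task losses to first order, which is the property a fixed weighting cannot guarantee when tasks compete. For more than two tasks the weights no longer have this simple closed form and must be found by solving a small constrained quadratic problem.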

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery

    cs.CV 2026-05 unverdicted novelty 7.0

    SkyPart uses learnable prototypes for patch grouping, altitude modulation only in training, graph-attention readout, and Kendall-weighted loss to set new state-of-the-art single-pass performance on SUES-200, Universit...

  2. Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery

    cs.CV 2026-05 unverdicted novelty 6.0

    SkyPart achieves state-of-the-art single-pass cross-view geo-localization on SUES-200, University-1652, and DenseUAV by using prototype-based part discovery, altitude-conditioned modulation, and Kendall-weighted loss,...

  3. Toward a unified data-driven turbulence model through multi-objective learning

    physics.flu-dyn 2025-09 unverdicted novelty 6.0

    A data-driven framework learns a unified, frame-invariant turbulence model from sparse observations across regimes via multi-objective ensemble learning and similarity-based case selection.