LDC-MTL: Balancing Multi-Task Learning through Scalable Loss Discrepancy Control. arXiv preprint arXiv:2502.08585.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG. Years: 2026. Verdicts: 2 (unverdicted). Representative citing papers: 2.
Citing papers
-
When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining
A bilevel method learns composite pretraining loss weights online via gradient alignment with a downstream objective, matching tuned baselines at roughly 30% extra cost over one training run.
-
Delve into the Applicability of Advanced Optimizers for Multi-Task Learning
APT augments multi-task learning by adapting advanced optimizers via momentum balancing and light direction preservation, delivering performance gains on four standard MTL datasets.
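The first citing paper's idea, weighting composite pretraining losses by how well each loss's gradient aligns with a downstream objective's gradient, can be sketched in a few lines. This is a hedged illustration on a toy linear model, not the paper's actual bilevel algorithm: the model, data, learning rates, and the multiplicative-weights update rule are all assumptions made for the sketch.

```python
# Illustrative sketch of gradient-alignment-based loss weighting.
# Toy setup: two quadratic "pretraining" losses, one "downstream" loss;
# all names and hyperparameters here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=5)   # shared parameters of a toy linear model
w = np.ones(2) / 2           # composite-loss weights, updated online

def grad_loss(theta, A, b):
    # Gradient of 0.5 * ||A @ theta - b||^2 with respect to theta.
    return A.T @ (A @ theta - b)

# Random toy data for two pretraining objectives and one downstream objective.
A1, b1 = rng.normal(size=(8, 5)), rng.normal(size=8)
A2, b2 = rng.normal(size=(8, 5)), rng.normal(size=8)
Ad, bd = rng.normal(size=(8, 5)), rng.normal(size=8)

lr, lr_w = 0.01, 0.1
for step in range(200):
    g1 = grad_loss(theta, A1, b1)
    g2 = grad_loss(theta, A2, b2)
    gd = grad_loss(theta, Ad, bd)
    # Alignment signal: cosine similarity between each pretraining
    # gradient and the downstream gradient.
    align = np.array([
        g1 @ gd / (np.linalg.norm(g1) * np.linalg.norm(gd) + 1e-12),
        g2 @ gd / (np.linalg.norm(g2) * np.linalg.norm(gd) + 1e-12),
    ])
    # Multiplicative-weights-style update, renormalized onto the simplex.
    w = w * np.exp(lr_w * align)
    w /= w.sum()
    # Descend on the weighted composite pretraining gradient.
    theta -= lr * (w[0] * g1 + w[1] * g2)

print(w)  # final weights lie on the probability simplex
```

The alignment score only steers the relative weights; the extra cost over plain training is the per-step downstream gradient, which is the source of the "roughly 30% extra cost" trade-off the summary mentions.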