Learning to Recover Task Experts from a Multi-Task Merged Model

Jinwook Jung; Kumju Jo; Sungyong Baik; Taegyu Kim

arxiv: 2606.26902 · v1 · pith:5RA3KR23new · submitted 2026-06-25 · 💻 cs.AI

Pith reviewed 2026-06-26 04:58 UTC · model grok-4.3

keywords: multi-task model merging, parameter interference, task expert recovery, offset prediction, SVD subspace signatures, generalization to unseen tasks, adaptive interpolation

The pith

By predicting additive offsets from a merged checkpoint, ReTeX recovers over 95 percent of each task expert's original performance from a single model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multi-task merging combines separate expert models into one but creates interference that degrades individual task performance. The paper shows this interference can be treated as predictable additive offsets on the original parameters. ReTeX trains a predictor to estimate those offsets and a router-free identifier that matches inputs to tasks via SVD subspace signatures computed in advance. The result is recovery of expert-level accuracy without storing or loading separate models, plus improved handling of tasks never seen during merging. The offset prediction also produces emergent interpolation that helps on out-of-distribution inputs.

Core claim

Parameter interference during merging is modeled as an affine transformation that reduces to an additive offset; a network trained to predict the offset from the merged weights alone can restore each original expert, while an offline SVD-based identifier selects the correct expert at inference without a learned router.
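
One way to read the affine-to-additive step is written out below. The notation is assumed for illustration and is not taken from the paper: $\theta_t$ stands for the weights of expert $t$, $\theta_m$ for the merged weights, and $A_t$, $b_t$ for a hypothetical invertible affine perturbation. The paper's own symbols and derivation may differ.

```latex
\begin{aligned}
\theta_m &= A_t\,\theta_t + b_t
  && \text{(assumed affine perturbation of expert } t\text{)} \\
\theta_t &= A_t^{-1}(\theta_m - b_t) = \theta_m + \delta_t,
  && \delta_t := (A_t^{-1} - I)\,\theta_m - A_t^{-1} b_t \\
\delta_t &\approx -\,b_t
  && \text{when } A_t \approx I
\end{aligned}
```

Under this reading, recovering expert $t$ amounts to estimating the single offset $\delta_t$ for a fixed merged checkpoint, which is the quantity a predictor would be trained to output.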

What carries the argument

ReTeX offset predictor that maps merged weights to task-specific additive corrections, paired with SVD subspace signatures for task identification.

If this is right

A single merged checkpoint supports near-expert accuracy on every training task without redundant storage.
The same checkpoint generalizes better to tasks absent from the original merge set.
Offset prediction produces automatic interpolation among seen experts when the input lies outside the training distribution.
Task identity at inference is obtained from projection residuals onto precomputed SVD subspaces rather than a separate router network.
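
The residual-based identification in the last point can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the uncentered subspace fit, the subspace dimension `k`, and the synthetic features are all assumptions made for the example.

```python
import numpy as np

def fit_subspace(features: np.ndarray, k: int) -> np.ndarray:
    """Return an orthonormal basis (d, k) for the top-k right singular
    directions of row-wise features with shape (n_samples, d).
    Assumption: the subspace passes through the origin (no centering)."""
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:k].T

def projection_residual(z: np.ndarray, basis: np.ndarray) -> float:
    """Norm of the component of z orthogonal to the subspace."""
    projected = basis @ (basis.T @ z)
    return float(np.linalg.norm(z - projected))

def identify_task(z: np.ndarray, bank: list) -> int:
    """Pick the task whose subspace leaves the smallest residual."""
    residuals = [projection_residual(z, basis) for basis in bank]
    return int(np.argmin(residuals))

# Synthetic stand-in for merged-model representations (hypothetical data):
# each task's features live near its own random 3-dimensional subspace.
rng = np.random.default_rng(0)
true_bases = [np.linalg.qr(rng.normal(size=(16, 3)))[0] for _ in range(3)]
task_bank = [
    fit_subspace(rng.normal(size=(200, 3)) @ b.T
                 + 0.01 * rng.normal(size=(200, 16)), k=3)
    for b in true_bases
]

# A test representation drawn from task 1's subspace.
z_test = rng.normal(size=3) @ true_bases[1].T
print("predicted task:", identify_task(z_test, task_bank))
```

The bank is built once, offline, and inference only needs one projection per task, which is what makes a separate learned router unnecessary in this kind of scheme.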

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Deployment cost for multi-task systems could drop because only one set of weights needs to be kept in memory.
The SVD identifier might transfer to other merging algorithms that do not explicitly train routers.
If offset prediction scales to very large models, it could serve as a lightweight post-merge adaptation step.
The emergent interpolation behavior suggests the method might support zero-shot composition of task knowledge.

Load-bearing premise

The changes each expert undergoes when merged can be reversed by learning a simple additive correction that depends only on the final merged weights.

What would settle it

Measure whether the offset predictor still recovers 95 percent performance when the merging procedure is changed to a method whose perturbations deviate from the assumed affine form.
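
A check of this kind could be scripted along the following lines. The sketch assumes that "recovery" means recovered accuracy divided by expert accuracy, averaged over tasks; that definition, the function names, and the numbers are placeholders for illustration, not values or protocols from the paper.

```python
from typing import Mapping

def normalized_accuracy(recovered: Mapping[str, float],
                        expert: Mapping[str, float]) -> float:
    """Mean over tasks of recovered accuracy divided by expert accuracy."""
    ratios = [recovered[task] / expert[task] for task in expert]
    return sum(ratios) / len(ratios)

def meets_recovery_target(recovered: Mapping[str, float],
                          expert: Mapping[str, float],
                          target: float = 0.95) -> bool:
    """True if the averaged recovery ratio reaches the target."""
    return normalized_accuracy(recovered, expert) >= target

# Placeholder accuracies, invented purely to exercise the functions.
expert_acc = {"task_a": 0.90, "task_b": 0.80}
recovered_acc = {"task_a": 0.88, "task_b": 0.79}
print(meets_recovery_target(recovered_acc, expert_acc))
```

Running the same comparison for each merging procedure, with the expert accuracies held fixed, would show whether the recovery ratio degrades when the perturbation departs from the assumed form.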

Figures

Figures reproduced from arXiv: 2606.26902 by Jinwook Jung, Kumju Jo, Sungyong Baik, Taegyu Kim.

**Figure 2.** Overall ReTeX framework. (a) Task identification constructs a task memory bank by fitting a low-dimensional SVD subspace to merged-model representations for each task. Given a test representation $z$, ReTeX computes projection residuals $\{R_t(z)\}_{t=1}^{T}$ and outputs a task ID estimate $\hat{t}$ by an argmin rule. (b) Task expert recovery conditions a task embedding on $\hat{t}$ and predicts low-rank offsets across layers. A…
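
Panel (b) of this caption describes a task embedding that drives low-rank offsets. A rough sketch of that idea for a single layer is given below. Everything in it is hypothetical: the class name, the linear heads, the shapes, and the random initialization are assumptions, and training of the predictor is omitted entirely.

```python
import numpy as np

class LowRankOffsetPredictor:
    """Hypothetical predictor: task embedding -> low-rank offset for one layer."""

    def __init__(self, num_tasks: int, d_emb: int, out_dim: int,
                 in_dim: int, rank: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.out_dim, self.in_dim, self.rank = out_dim, in_dim, rank
        # One embedding row per task (untrained, random for illustration).
        self.task_emb = rng.normal(scale=0.1, size=(num_tasks, d_emb))
        # Linear heads that map the embedding to the two low-rank factors.
        self.head_u = rng.normal(scale=0.1, size=(d_emb, out_dim * rank))
        self.head_v = rng.normal(scale=0.1, size=(d_emb, in_dim * rank))

    def offset(self, task_id: int) -> np.ndarray:
        """Return an (out_dim, in_dim) offset of rank at most `rank`."""
        e = self.task_emb[task_id]
        u = (e @ self.head_u).reshape(self.out_dim, self.rank)
        v = (e @ self.head_v).reshape(self.in_dim, self.rank)
        return u @ v.T

def recover_expert_layer(w_merged: np.ndarray,
                         predictor: LowRankOffsetPredictor,
                         task_id: int) -> np.ndarray:
    """Add the predicted offset to the merged weights of one layer."""
    return w_merged + predictor.offset(task_id)

predictor = LowRankOffsetPredictor(num_tasks=3, d_emb=8,
                                   out_dim=6, in_dim=5, rank=2)
w_merged = np.zeros((6, 5))
w_recovered = recover_expert_layer(w_merged, predictor, task_id=1)
print(w_recovered.shape)
```

Only the embeddings and the heads are stored per task here, so the merged weights stay shared and the per-task cost is governed by the chosen rank.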

**Figure 3.** Performance and efficiency trade-offs of ReTeX. (a) Normalized accuracy across ranks $r$ for three base merged models (WA, TA, TIES). (b) Normalized accuracy as a function of task embedding dimension $d_{\mathrm{emb}}$ for 8, 14, and 20 tasks with ViT-B/32 and fixed $r=256$. (c) Normalized accuracy vs. normalized memory for ReTeX and ReTeX-E at fixed outer rank $r$ while varying $r_g$. Normalized memory is computed as the storag…
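
The rank-versus-cost trade-off behind a figure like this can be illustrated generically. The sketch below is not the paper's measurement: it only shows, for an arbitrary offset matrix, how the truncated-SVD approximation error falls as the rank grows and how many parameters a rank-$r$ factorization needs relative to a dense matrix. The 768 by 768 shape is an assumed example (768 is the ViT-B hidden width), and the caption's own definition of normalized memory is truncated and is not reproduced here.

```python
import numpy as np

def truncate_rank(delta: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of delta in Frobenius norm (truncated SVD)."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r]

def relative_error(delta: np.ndarray, r: int) -> float:
    """Frobenius error of the rank-r approximation, relative to ||delta||."""
    return float(np.linalg.norm(delta - truncate_rank(delta, r))
                 / np.linalg.norm(delta))

def low_rank_param_ratio(out_dim: int, in_dim: int, r: int) -> float:
    """Parameters of two rank-r factors divided by those of a dense matrix."""
    return r * (out_dim + in_dim) / (out_dim * in_dim)

# Synthetic offset with a decaying spectrum (hypothetical data).
rng = np.random.default_rng(0)
delta = (rng.normal(size=(64, 64)) * (0.7 ** np.arange(64))) @ rng.normal(size=(64, 64))
for r in (4, 16, 64):
    print(r, relative_error(delta, r))

# For a square 768 x 768 layer at r = 256 the ratio is 2 * 256 / 768 = 2/3.
print(low_rank_param_ratio(768, 768, 256))
```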

Original abstract

Multi-task model merging aims to consolidate several task-specific experts into a unified model, yet static merging consistently suffers from parameter interference. While dynamic merging methods aim to bridge this gap, many rely on the costly storage and loading of redundant expert components at inference. In this work, from the perspective of a task expert, we view parameter interference as a parameter perturbation introduced to each expert during the merging process. We show that such parameter perturbations can be modeled as affine transformations, which can be approximated as additive offsets. Motivated by this, we propose Recover Task eXpert (ReTeX), a framework that predicts those offsets in order to undo parameter interference and recover task-expert performance from a single merged checkpoint. To recover the appropriate expert when task identity is unknown, we introduce a router-free task identifier based on SVD subspace signatures computed offline before inference. At inference, the identifier selects the task whose subspace yields the smallest projection residual for a given input. As a result, ReTeX recovers over 95% of individual-expert performance in both vision and NLP domains, while significantly improving generalization to unseen tasks. Crucially, we also show that parameter offset prediction leads to emergent adaptive interpolation of expert knowledge for out-of-distribution (OOD) tasks: ReTeX adaptively interpolates seen expert knowledge to handle unseen tasks. Our code is available at https://github.com/BAIKLAB/ReTeX

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ReTeX offers a distinct framing for recovering task experts via predicted additive offsets from one merged checkpoint plus an SVD subspace identifier, but the abstract states the affine-to-additive modeling without any validation or experimental details.

ReTeX claims to recover over 95% of individual-expert performance from a single merged checkpoint by predicting additive parameter offsets, plus a router-free SVD subspace identifier for task selection. The offset prediction is also said to produce emergent adaptive interpolation for out-of-distribution tasks.

The new element is the combination of viewing merging interference as recoverable perturbations and using offline SVD signatures to pick the right offset without storing or loading multiple experts. This avoids the redundancy common in dynamic merging approaches.

The paper does a reasonable job motivating the storage problem and sketching how the offset network might enable better generalization to unseen tasks.

The soft spots are the modeling assumptions and missing evidence. The abstract states that perturbations can be modeled as affine transformations and approximated as additive offsets, yet gives no derivation or quantitative check on residual error after the best additive fit. If the actual differences contain scaling or layer-specific components, an additive predictor cannot fully undo them. No experimental details, baselines, or error bars appear in the abstract, so the 95% recovery numbers and OOD claims cannot be assessed.

This is aimed at researchers working on multi-task model merging and efficient deployment in vision and language models. Someone looking for single-checkpoint alternatives to storing experts might pick up useful ideas from the SVD identifier and offset framing.

It deserves peer review so the approximation quality and experimental results can be checked directly.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces ReTeX, a framework that models merging-induced parameter perturbations on task experts as affine transformations approximable by additive offsets, predicts these offsets from the merged checkpoint alone to recover individual expert performance, and employs an SVD-based subspace signature method for router-free task identification at inference. It reports recovering over 95% of per-expert performance on vision and NLP tasks while achieving improved generalization to unseen tasks via emergent adaptive interpolation of expert knowledge.

Significance. If the affine-to-additive modeling assumption is validated with quantitative bounds and the recovery results prove robust across baselines and tasks, the approach would offer a practical route to high-fidelity multi-task inference from a single checkpoint, reducing the storage overhead of dynamic merging methods while providing a mechanism for OOD task handling without retraining.

major comments (3)

[Abstract] The central modeling claim that merging perturbations can be represented as affine maps and then approximated as additive offsets is asserted without derivation, without a quantitative bound on residual error after the best additive fit, and without empirical validation of approximation quality; this assumption is load-bearing for the recovery performance claim.
[Abstract] The headline result of >95% recovery of individual-expert performance is stated without reference to the number of tasks, datasets, baselines, statistical error bars, or verification protocol, preventing assessment of whether the offset predictor genuinely undoes interference or merely learns a compensatory adapter.
[Abstract / method description] The SVD subspace signature task identifier is introduced at a high level with no detail on subspace construction, dimensionality, or how projection residuals are computed and thresholded; because this component is required for the router-free claim, its correctness directly affects the reported inference results.

minor comments (2)

[Abstract] The abstract would benefit from a concise statement of the number of tasks, domains, and primary baselines used to support the 95% recovery figure.
Notation for the merged weights, expert weights, and predicted offsets should be introduced consistently and early to aid readability of the recovery equations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on the abstract and method presentation. We will revise the manuscript to incorporate the requested derivations, quantitative details, experimental specifics, and method expansions while preserving the core contributions.

Referee: [Abstract] The central modeling claim that merging perturbations can be represented as affine maps and then approximated as additive offsets is asserted without derivation, without a quantitative bound on residual error after the best additive fit, and without empirical validation of approximation quality; this assumption is load-bearing for the recovery performance claim.

Authors: We agree the abstract presents the claim at a high level. We will add an explicit derivation of the affine transformation model from the merging process, a quantitative bound on the residual error after the additive fit, and empirical validation of approximation quality to the revised manuscript (new subsection in Section 3), with a concise reference included in the abstract.

revision: yes

Referee: [Abstract] The headline result of >95% recovery of individual-expert performance is stated without reference to the number of tasks, datasets, baselines, statistical error bars, or verification protocol, preventing assessment of whether the offset predictor genuinely undoes interference or merely learns a compensatory adapter.

Authors: We will revise the abstract to specify the number of tasks and datasets, the baselines used, statistical error bars from repeated runs, and the verification protocol (performance measured against individually fine-tuned experts on held-out data). This will clarify the nature of the recovery.

revision: yes

Referee: [Abstract / method description] The SVD subspace signature task identifier is introduced at a high level with no detail on subspace construction, dimensionality, or how projection residuals are computed and thresholded; because this component is required for the router-free claim, its correctness directly affects the reported inference results.

Authors: We will expand the method description to detail subspace construction (via SVD on task-specific parameter matrices), chosen dimensionality, projection residual computation (norm of the component orthogonal to the subspace), and thresholding procedure. These specifics will be added to Section 3 and referenced in the abstract.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper frames ReTeX as a predictive framework that learns to approximate merging-induced perturbations (modeled as affine maps and approximated by additive offsets) from the merged checkpoint alone, then uses an independent SVD-based task identifier at inference. No equations or steps reduce a claimed prediction to a fitted quantity by construction, nor does any load-bearing premise rest on self-citation chains or imported uniqueness results. The offset-prediction step is presented as a learned mapping whose success is evaluated against held-out expert performance rather than being definitionally equivalent to its inputs. The derivation therefore retains independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the domain assumption that merging interference takes the form of predictable additive offsets; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)

domain assumption: Parameter perturbations during merging can be modeled as affine transformations approximable as additive offsets.
Directly stated as the motivation for offset prediction in the abstract.

invented entities (1)

recoverable parameter offsets (no independent evidence)
purpose: Undo interference to restore expert performance from the merged checkpoint.
Introduced as the target of the prediction network; no independent evidence outside the method itself.


Coates, A., Ng, A., Lee, H.: An analysis of single-layer networks in unsupervised feature learning. In: AISTATS (2011)

2011

[24] [24]

In: IJCNN (2017)

Cohen, G., Afshar, S., Tapson, J., van Schaik, A.: Emnist: Extending mnist to handwritten letters. In: IJCNN (2017)

2017

[25] [25]

Journal of Diabetes Science and Technology (2009)

Cuadros, J., Bresnick, G.: Eyepacs: An adaptable telemedicine system for diabetic retinopathy screening. Journal of Diabetes Science and Technology (2009)

2009

[26] [26]

cats, 2013

Cukierski, W.: Dogs vs. cats, 2013. URL https://kaggle.com/competitions/dogs- vs-cats

2013

[27] [27]

In: ECCV (2024)

Davari, M., Belilovsky, E.: Model breadcrumbs: Scaling multi-task model merging with sparse masks. In: ECCV (2024)

2024

[28] [28]

DeepNets: Landscape recognition.https : / / www . kaggle . com / datasets / utkarshsaxenadn/landscape-recognition-image-dataset-12k-images

[29] [29]

IEEE signal processing magazine (2012)

Deng, L.: The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE signal processing magazine (2012)

2012

[30] [30]

Nature Machine Intelligence (2023)

Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C.M., Chen, W., et al.: Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence (2023)

2023

[31] [31]

Computer Vision and Image Understanding (2007)

Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object cate- gories. Computer Vision and Image Understanding (2007)

2007

[32] [32]

In: ICLR (2017)

Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICLR (2017)

2017

[33] [33]

In: CVPR (2025)

Gargiulo, A.A., Crisostomi, D., Bucarelli, M.S., Scardapane, S., Silvestri, F., Rodolà, E.: Task singular vectors: Reducing task interference in model merging. In: CVPR (2025)

2025

[34] [34]

In: ICONIP (2013)

Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., Lee, D.H., Zhou, Y., Ramaiah, C., Feng, F., Li, R., Wang, X., Athanasakis, D., Shawe-Taylor, J., Milakov, M., Park, J., Ionescu, R., Popescu, M., Grozea, C., Bergstra, J., Xie, J., Romaszko, L., Xu, B., Chuang, Z., Bengio, Y.: Challenges ...

2013

[35] [35]

Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. Tech. rep., California Institute of Technology (2007)

2007

[36] [36]

In: ICLR (2017)

Ha, D., Dai, A.M., Le, Q.V.: Hypernetworks. In: ICLR (2017)

2017

[37] [37]

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2019)

Helber, P., Bischke, B., Dengel, A., Borth, D.: Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2019)

2019

[38] [38]

In: CVPR (2015)

Horn, G.V., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., Perona, P., Belongie, S.: Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In: CVPR (2015)

2015

[39] [39]

Howard, J.: Imagenette: A smaller subset of 10 easily classified classes from ima- genet.https://github.com/fastai/imagenette(2019)

2019

[40] [40]

In: ICLR (2022)

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: Lora: Low-rank adaptation of large language models. In: ICLR (2022)

2022

[41] [41]

In: NeurIPS (2024)

Huang, C., Ye, P., Chen, T., He, T., Yue, X., Ouyang, W.: Emr-merging: Tuning- free high-performance model merging. In: NeurIPS (2024)

2024

[42] [42]

In: ICLR (2023)

Ilharco, G., Ribeiro, M.T., Wortsman, M., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: ICLR (2023)

2023

[43] [43]

In: ICLR (2025)

Iurada, L., Ciccone, M., Tommasi, T.: Efficient model editing with task-localized sparse fine-tuning. In: ICLR (2025)

2025

[44] [44]

In: ICLR (2023)

Jin, X., Ren, X., Preotiuc-Pietro, D., Cheng, P.: Dataless knowledge fusion by merging weights of language models. In: ICLR (2023)

2023

[45] [45]

In: ICLR (2025)

Kang, J., Karlinsky, L., Luo, H., Wang, Z., Hansen, J., Glass, J., Cox, D., Panda, R., Feris, R., Ritter, A.: Self-moe: Towards compositional large language models with self-specialized experts. In: ICLR (2025)

2025

[46] [46]

Cell (2018)

Kermany,D.S.,Goldbaum,M.,Cai,W.,Valentim,C.C.S.,Liang,H.,Baxter,S.L., McKeown, A., Yang, G., Wu, X., Yan, F., Dong, J., Prasadha, M., Pei, J., Ting, M., Zhu, J., Li, C., Hewett, S., Dong, J., Ziyar, I., Shi, A., Zhang, R., Gupta, K., Wong, R.M.Y., Lam, L.A.T.C., Cheung, J., Tsoi, M., Wu, V., Yan, C., Huang, C., Lee, D.H., Zhang, Y., Wong, J.W., Li, K.M., ...

2018

[47] [47]

In: CVPR (2011)

Khosla, A., Jayadevaprakash, N., Yao, B., Fei-Fei, L.: Novel dataset for fine- grained image categorization. In: CVPR (2011)

2011

[48] [48]

In: AAAI (2020)

Khot, T., Clark, P., Guerquin, M., Jansen, P., Sabharwal, A.: Qasc: A dataset for question answering via sentence composition. In: AAAI (2020)

2020

[49] [49]

In: ICLR (2015)

Kingma, D.P.: Adam: A method for stochastic optimization. In: ICLR (2015)

2015

[50] [50]

In: ICCV Workshop (2013)

Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine- grained categorization. In: ICCV Workshop (2013)

2013

[51] [51]

Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)

2009

[52] [52]

In: ECCV (2012)

Kumar, N., Belhumeur, P.N., Biswas, A., Jacobs, D.W., Kress, W.J., Lopez, I.C., Soares, J.V.B.: Leafsnap: A computer vision system for automatic plant species identification. In: ECCV (2012)

2012

[53] [53]

com / AI - Lab - Makerere/ibean/

Lab, M.A.: Bean disease dataset (2020),https : / / github . com / AI - Lab - Makerere/ibean/

2020

[54] [54]

In: ICLR (2025)

Lee, Y., Jung, J., Baik, S.: Mitigating parameter interference in model merging via sharpness-aware fine-tuning. In: ICLR (2025)

2025

[55] [55]

In: KR (2012)

Levesque, H., Davis, E., Morgenstern, L.: The winograd schema challenge. In: KR (2012)

2012

[56] [56]

In: ICLR (2024)

Li, P., Zhang, Z., Yadav, P., Sung, Y.L., Cheng, Y., Bansal, M., Chen, T.: Merge, then compress: Demystify efficient smoe with hints from its routing policy. In: ICLR (2024)

2024

[57] [57]

In: CVPR (2017)

Li, S., Deng, W., Du, J.: Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: CVPR (2017)

2017

[58] [58]

In: CVPR (2022)

Li, W.H., Liu, X., Bilen, H.: Cross-domain few-shot learning with task-specific adapters. In: CVPR (2022)

2022

[59] [59]

arXiv preprint arXiv:2206.11404 (2022)

Liao, P., Li, X., Liu, X., Keutzer, K.: The artbench dataset: Benchmarking gen- erative models with artworks. arXiv preprint arXiv:2206.11404 (2022)

arXiv 2022

[60] [60]

In: NeurIPS (2024)

Lu, Z., Fan, C., Wei, W., Qu, X., Chen, D., Cheng, Y.: Twin-merging: Dynamic integration of modular expertise in model merging. In: NeurIPS (2024)

2024

[61] [61]

arXiv preprint arXiv:1306.5151 (2013)

Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013)

Pith/arXiv arXiv 2013

[62] [62]

and Raffel, C.A.: Merging models with fisher-weighted aver- aging

Matena, Michael S. and Raffel, C.A.: Merging models with fisher-weighted aver- aging. In: NeurIPS (2022)

2022

[63] [63]

Frontiers in Plant Science (2016)

Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Frontiers in Plant Science (2016)

2016

[64] [64]

Transactions on Machine Learning Research (2023)

Muqeeth, M., Liu, H., Raffel, C.: Soft merging of experts with adaptive routing. Transactions on Machine Learning Research (2023)

2023

[65] [65]

Acta Universitatis Sapientiae, Informatica (2018)

Muresan, H., Oltean, M.: Fruit recognition from images using deep learning. Acta Universitatis Sapientiae, Informatica (2018)

2018

[66] [66]

Nagaraj, A.: Asl alphabet.https://www.kaggle.com/datasets/grassknoted/ asl-alphabet

[67] [67]

In: NIPS Workshop (2011)

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y., et al.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop (2011)

2011

[68] [68]

In: ICVGIP (2008)

Nilsback, M.E., Zisserman, A.: Automated flower classification over a large num- ber of classes. In: ICVGIP (2008)

2008

[69] [69]

In: ICLR (2025)

Oh, C., Li, Y., Song, K., Yun, S., Han, D.: Dawin: Training-free dynamic weight interpolation for robust adaptation. In: ICLR (2025)

2025

[70] [70]

Scientific Reports (2019)

Olsen, A., Konovalov, D.A., Philippa, B., Ridd, P., Wood, J.C., Johns, J., Banks, W.,Girgenti,B.,Kenny,O.,Whinney,J.,Calvert,B.,RahimiAzghadi,M.,White, R.D.: Deepweeds: A multiclass weed species image dataset for deep learning. Scientific Reports (2019)

2019

[71] [71]

In: NeurIPS (2023)

Ortiz-Jimenez, G., Favero, A., Frossard, P.: Task arithmetic in the tangent space: Improved editing of pre-trained models. In: NeurIPS (2023)

2023

[72] [72]

In: CVPR (2012)

Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V.: Cats and dogs. In: CVPR (2012)

2012

[73] [73]

In: AAAI (2018)

Perez, E., Strub, F., De Vries, H., Dumoulin, V., Courville, A.: Film: Visual reasoning with a general conditioning layer. In: AAAI (2018)

2018

[74] [74]

In: ACM MMSys (2017)

Pogorelov, K., Randel, K.R., Griwodz, C., Eskeland, S.L., de Lange, T., Johansen, D., Spampinato, C., Dang-Nguyen, D.T., Lux, M., Schmidt, P.T., et al.: Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In: ACM MMSys (2017)

2017

[75] [75]

In: CVPR (2009)

Quattoni, A., Torralba, A.: Recognizing indoor scenes. In: CVPR (2009)

2009

[76] [76]

In: ICML (2021)

Radford,A.,Kim,J.W.,Hallacy,C.,Ramesh,A.,Goh,G.,Agarwal,S.,Sastry,G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: ICML (2021)

2021

[77] [77]

In: NeurIPS (2019)

Requeima, J., Gordon, J., Bronskill, J., Nowozin, S., Turner, R.E.: Fast and flexible multi-task classification using conditional neural adaptive processes. In: NeurIPS (2019)

2019

[78] [78]

arXiv preprint arXiv:2404.18416 (2024)

Saab, K., Tu, T., Weng, W.H., Tanno, R., Stutz, D., Wulczyn, E., Zhang, F., Strother,T.,Park,C.,Vedadi,E.,etal.:Capabilitiesofgeminimodelsinmedicine. arXiv preprint arXiv:2404.18416 (2024)

Pith/arXiv arXiv 2024

[79] [79]

In: AAAI (2020)

Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: An adversarial winograd schema challenge at scale. In: AAAI (2020)

2020

[80] [80]

In: ACL (2018)

Sharma, R., Allen, J., Bakhshandeh, O., Mostafazadeh, N.: Tackling the story ending biases in the story cloze test. In: ACL (2018)

2018