arxiv: 2606.11262 · v1 · pith:CZKJYXA4new · submitted 2026-06-09 · 💻 cs.LG · cs.AI

PermDoRA -- Understanding Adapter Interference in Language Models: Limits of Parameter-Space Geometry

Pith reviewed 2026-06-27 14:20 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords adapter interference · parameter-space geometry · DoRA · LoRA · multi-domain adaptation · nonlinear representations · Riemannian merging · Fréchet mean

The pith

Adapter interference in language models arises from shared nonlinear representations rather than overlaps in parameter-space geometry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether interference when composing adapters for multiple domains stems mainly from overlap in linear parameter updates, in which case enforcing orthogonality or directional independence should improve results. Using the DoRA-RBAC framework on LLaMA-3.1-8B and Mistral-7B, the authors compare ordinary Euclidean averaging against a geometry-aware strategy that approximates the Fréchet mean through normalized directional averaging. Across GPQA, PubMedQA, SimpleQA, and WMDP, the geometry-aware method shows no consistent gain over standard averaging, and measures of angular alignment turn out to be weak predictors of how well adapters compose. A sympathetic reader would care because the result challenges the assumption that keeping adapter updates geometrically independent is the right way to avoid cross-domain interference.

Core claim

While single-domain performance matches LoRA, geometry-aware merging provides no consistent advantage over standard averaging in multi-domain settings. Angular alignment and orthogonality of adapter updates are weak predictors of composition performance. These findings suggest that adapter interference is not governed primarily by parameter-space geometry, but is instead consistent with interactions in shared nonlinear representations.

What carries the argument

DoRA-RBAC, a hierarchical adapter composition framework based on weight-decomposed low-rank adaptation, used to compare Euclidean merging with normalized directional averaging that approximates the Fréchet mean.
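
To make the two merging strategies concrete, here is a minimal Python sketch, not the paper's implementation. The page does not give the exact formula for normalized directional averaging, so this assumes the common construction: normalize each update to unit length, average the unit vectors, and renormalize (the projected mean on the sphere, a standard approximation to the Fréchet mean). Rescaling by the mean magnitude is a further assumption, and the function names and toy vectors are hypothetical.

```python
import math

def norm(v):
    """Euclidean length of a flattened update vector."""
    return math.sqrt(sum(x * x for x in v))

def euclidean_mean(updates):
    """Standard merging: element-wise arithmetic mean of the updates."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]

def directional_mean(updates, eps=1e-12):
    """Assumed geometry-aware merging: average unit directions,
    renormalize, then rescale by the mean magnitude (an assumption)."""
    norms = [norm(u) for u in updates]
    units = [[x / max(n_, eps) for x in u] for u, n_ in zip(updates, norms)]
    mean_dir = euclidean_mean(units)
    length = max(norm(mean_dir), eps)
    mean_mag = sum(norms) / len(norms)
    return [mean_mag * x / length for x in mean_dir]

# Hypothetical toy updates: orthogonal directions with unequal magnitudes.
a = [3.0, 0.0]
b = [0.0, 1.0]
print("euclidean  :", euclidean_mean([a, b]))
print("directional:", directional_mean([a, b]))
```

In this toy case the Euclidean mean is pulled toward the larger update, while the directional mean weights both directions equally. That difference is what a geometry-aware strategy is meant to exploit.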

If this is right

  • Single-domain adapter performance matches that of standard LoRA.
  • Geometry-aware merging provides no consistent advantage over standard averaging in multi-domain settings.
  • Angular alignment and orthogonality of adapter updates are weak predictors of composition performance.
  • Adapter interference is consistent with interactions in shared nonlinear representations rather than parameter-space overlaps.
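
The claim that angular alignment is a weak predictor can be checked with a diagnostic of roughly the following shape. This is a sketch under stated assumptions, not the authors' analysis: it computes the cosine similarity between each pair of flattened adapter updates and correlates it with a per-pair performance gap. The domain labels, vectors, and gap values are hypothetical placeholders, not results from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened adapter updates."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical flattened updates for three domains (placeholders only).
updates = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
# Hypothetical composition gaps per domain pair (placeholders only).
gaps = {("a", "b"): 0.02, ("a", "c"): 0.01, ("b", "c"): 0.03}

pairs = sorted(gaps)
sims = [cosine(updates[p], updates[q]) for p, q in pairs]
r = pearson(sims, [gaps[pq] for pq in pairs])
print("pairwise cosine:", dict(zip(pairs, sims)))
print("correlation between alignment and gap:", r)
```

A correlation near zero on real measurements would be consistent with the paper's reading, and a strong correlation would count against it.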

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future adapter work may need to target representation-level interactions rather than parameter geometry to reduce interference.
  • The same pattern could appear in other task types or larger models not tested here.
  • Alternative merging strategies that act directly on activations might be worth testing next.

Load-bearing premise

The tested geometry-aware merging strategy and the chosen QA benchmarks are sufficient to detect whether parameter-space geometry controls interference in general multi-domain settings.

What would settle it

A geometry-aware merging method that consistently outperforms standard averaging on the same or similar multi-domain QA benchmarks would falsify the claim.

Figures

Figures reproduced from arXiv: 2606.11262 by Gowtham Sivaramakrishnan, Kishan Gupta Balaji, Santhosh Baradwaj Vaduvur Ranganathan, Sarvesha Kumar Kombaiah Seetha.

Figure 1: SimpleQA accuracy for all 2-domain (left) and 3-domain (right) combinations on …
Figure 2: Cosine similarity between domain adapter updates versus the performance gap …
Figure 3: PermDoRA pipeline. Per-domain DoRA adapters are trained with EWC regulari…
Figure 4: Per-domain UGI for Llama-3.1-8B under single-domain adaptation. DoRA and …
Figure 5: KL divergence between the output distributions of a single composed adapter …
Figure 6: Left: Per-domain SimpleQA accuracy with and without the domain-specific manager adapter (Llama-3.1-8B, SimpleQA domains). The manager improves average accuracy by 0.1178 UGI. Right: DDI AUC (Min-K%++ variant) for each domain. All values exceed the chance level of 0.5 (dashed red line), confirming that domain membership remains inferrable despite adapter modularisation.
Figure 7: Mean cosine similarity between pairs of domain adapter effective weight updates …
Figure 8: Pairwise LoRA cosine similarity between domain adapters for Llama-3.1-8B …
Abstract

Access control in large language models (LLMs) requires modular mechanisms to enable domain-specific behavior without retraining or cross-domain interference. A common hypothesis is that interference during adapter composition arises from overlap in linear parameter updates, suggesting that enforcing orthogonality or directional independence should improve multi-domain performance. We test this hypothesis using DoRA-RBAC, a hierarchical adapter composition framework based on weight-decomposed low-rank adaptation. We compare conventional Euclidean merging with a geometry-aware Riemannian-inspired merging strategy that approximates the Fréchet mean via normalized directional averaging across multiple QA benchmarks (GPQA, PubMedQA, SimpleQA, WMDP) on LLaMA-3.1-8B and Mistral-7B. Our results show that while single-domain performance matches LoRA, geometry-aware merging provides no consistent advantage over standard averaging in multi-domain settings. Diagnostic analysis further reveals that angular alignment and orthogonality of adapter updates are weak predictors of composition performance. These findings suggest that adapter interference is not governed primarily by parameter-space geometry, but is instead consistent with interactions in shared nonlinear representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper claims that adapter interference during composition in LLMs is not governed primarily by overlaps in parameter-space geometry. Using the DoRA-RBAC framework on LLaMA-3.1-8B and Mistral-7B, it compares standard Euclidean merging against a geometry-aware strategy (normalized directional averaging as a proxy for the Fréchet mean) across QA benchmarks (GPQA, PubMedQA, SimpleQA, WMDP). Single-domain performance matches LoRA, but the geometry-aware method shows no consistent multi-domain gains, and angular/orthogonality metrics are weak predictors of performance; the authors conclude that interference instead arises from interactions in shared nonlinear representations.

Significance. If the central empirical result holds after addressing the noted gaps, the work would be significant for the modular-adapter literature by providing evidence against a geometry-centric view of interference and motivating investigation of nonlinear representation interactions. The study receives credit for its direct comparison of merging strategies on two model families and a multi-benchmark QA suite, which supplies a concrete falsification attempt of the parameter-space hypothesis.

major comments (3)
  1. [Abstract] The inference that 'adapter interference is not governed primarily by parameter-space geometry' is drawn from a null result on one specific merging approximation (normalized directional averaging). The manuscript does not establish that this operation is a faithful test of the geometry hypothesis (e.g., by comparing it to other Riemannian operations or proving it would detect geometry-driven interference if present), rendering the leap to the nonlinear-representation alternative under-supported.
  2. [Abstract] The benchmarks are limited to four QA tasks (GPQA, PubMedQA, SimpleQA, WMDP). Without evidence or additional experiments showing these tasks are representative of general multi-domain composition, the null result cannot be taken to imply that geometry effects are absent across broader task distributions.
  3. [Abstract] The comparative results are reported without details on statistical tests, run-to-run variance, or the precise implementation of the Riemannian approximation. These omissions make it impossible to assess whether the observed lack of advantage is robust or merely under-powered.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, proposing targeted revisions to strengthen the manuscript while maintaining the integrity of our empirical findings.

  1. Referee: [Abstract] The inference that 'adapter interference is not governed primarily by parameter-space geometry' is drawn from a null result on one specific merging approximation (normalized directional averaging). The manuscript does not establish that this operation is a faithful test of the geometry hypothesis (e.g., by comparing it to other Riemannian operations or proving it would detect geometry-driven interference if present), rendering the leap to the nonlinear-representation alternative under-supported.

    Authors: We acknowledge that normalized directional averaging is one specific proxy for the Fréchet mean and that the manuscript does not compare it against alternative Riemannian operations. In revision we will temper the abstract and discussion to state that the results show no advantage from this geometry-aware strategy, thereby providing evidence against a purely parameter-space view without claiming to have exhaustively tested all possible geometry-based merging methods. A new limitations paragraph will discuss the choice of approximation. revision: partial

  2. Referee: [Abstract] The benchmarks are limited to four QA tasks (GPQA, PubMedQA, SimpleQA, WMDP). Without evidence or additional experiments showing these tasks are representative of general multi-domain composition, the null result cannot be taken to imply that geometry effects are absent across broader task distributions.

    Authors: The four QA benchmarks were selected to span distinct domains (general knowledge, biomedical, safety). We will add explicit justification for this selection in the methods and a limitations statement noting that generalization to other task families (e.g., long-form generation or multi-step reasoning) remains untested. revision: partial

  3. Referee: [Abstract] The comparative results are reported without details on statistical tests, run-to-run variance, or the precise implementation of the Riemannian approximation. These omissions make it impossible to assess whether the observed lack of advantage is robust or merely under-powered.

    Authors: We will expand the methods and results sections to report: the exact formula for normalized directional averaging, standard deviations across multiple random seeds, and statistical comparisons (paired t-tests) between merging strategies. These additions will allow readers to evaluate the robustness of the null result. revision: yes
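
The paired comparison promised in the third response could take roughly the following form. This is a minimal sketch, assuming per-seed accuracies for the two merging strategies measured on the same seeds. The function name and the toy numbers are hypothetical and are not results from the paper. Only the t statistic and degrees of freedom are computed here; a p-value would additionally need the Student t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched runs."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 runs")
    diffs = [x - y for x, y in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all paired differences are identical")
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical per-seed accuracies (placeholders, not reported results).
geometry_aware = [0.5, 0.6, 0.7]
euclidean = [0.4, 0.6, 0.5]

t_stat, dof = paired_t(geometry_aware, euclidean)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Pairing by seed removes seed-level variance that both strategies share, which matters when the expected effect is small or absent, as it is for a null result.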

Circularity Check

0 steps flagged

No circularity: purely empirical null-result study with no derivations or self-referential steps

The paper reports direct experimental comparisons of Euclidean vs. Riemannian-inspired merging on fixed QA benchmarks using LLaMA-3.1-8B and Mistral-7B. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, or ansatzes appear in the provided text. The central claim follows from measured performance differences and correlation diagnostics rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical study with no mathematical derivations or postulated entities described in the abstract.

pith-pipeline@v0.9.1-grok · 5745 in / 1032 out tokens · 19407 ms · 2026-06-27T14:20:13.636848+00:00 · methodology
