Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies
Pith reviewed 2026-05-09 22:23 UTC · model grok-4.3
The pith
A three-layer architecture keeps user data out of shared LLM weights, so deleting a user's proxy artefact fully unlearns that user in a single removal step.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The architecture decouples personal data from shared weights by routing user-specific behavior exclusively through deletable proxy artefacts while domain expertise lives in composable LoRA adapters. Evaluation on Phi-3.5-mini and Llama-3.1-8B shows that personal data influences outputs during use yet produces near-baseline behavior after proxy deletion, with negligible cross-user contamination. This structure converts machine unlearning from an intractable weight-editing task into a deterministic deletion operation that preserves personalization and privacy guarantees.
What carries the argument
The three-layer separable expert architecture: a static base model, composable domain-expert LoRA adapters that shape behavior without carrying user data, and per-user proxy artefacts whose deletion performs unlearning.
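The separation can be sketched in a few lines. This is a hypothetical rendering of the three-layer design, not the paper's implementation; the class and method names are our own, and the callables stand in for real model components:

```python
# Minimal sketch of the separable expert architecture (names hypothetical):
# a frozen shared base, shared domain-expert LoRA adapters, and per-user
# proxy artefacts held in separately deletable storage.

class SeparableExpertModel:
    def __init__(self, base, adapters):
        self.base = base          # static shared weights (never updated with user data)
        self.adapters = adapters  # composable domain-expert deltas (shared, no user data)
        self.proxies = {}         # user_id -> proxy artefact (isolated, deletable)

    def forward(self, user_id, x):
        out = self.base(x)
        for adapter in self.adapters:
            out += adapter(x)     # domain expertise, shared across all users
        proxy = self.proxies.get(user_id)
        if proxy is not None:
            out += proxy(x)       # the ONLY path for user-specific behavior
        return out

    def unlearn(self, user_id):
        # Deterministic unlearning: drop the artefact; shared weights untouched.
        self.proxies.pop(user_id, None)
```

After `unlearn(u)`, a later `forward(u, x)` is identical to the never-personalized baseline, which is the deletion-as-unlearning property the architecture claims.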
If this is right
- User-specific personalization occurs without any user data entering shared model weights or adapters.
- Machine unlearning reduces to deterministic deletion of the proxy artefact rather than retraining or weight editing.
- Privacy attacks such as model inversion and membership inference are blocked against shared components by construction.
- The design remains compatible with differentially private stochastic gradient descent on the shared base and adapters.
- Near-zero cross-user contamination is observed alongside high verification rates that the original user data is no longer present after deletion.
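The DP-SGD compatibility noted above amounts to applying the standard clip-and-noise update only to the shared parameters (base and adapters), never to proxies. A generic sketch of that step, with illustrative hyperparameters that are not taken from the paper:

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD update on SHARED parameters only: clip each per-sample
    gradient to clip_norm, average, then add calibrated Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_sample_grads),
        size=mean.shape,
    )
    return mean + noise
```

Because user data only ever reaches the proxies, the shared update sees no per-user gradients to begin with; DP-SGD then bounds whatever residual influence the shared training data has.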
Where Pith is reading between the lines
- The same proxy mechanism could support selective forgetting of individual facts within a user's data rather than entire user profiles.
- Composable adapters might allow domain experts to be shared across users while still keeping each user's personal data isolated.
- The architecture could be extended to multimodal models where image or audio user data is likewise confined to deletable proxies.
- Verification rates and KL divergence after deletion provide a practical test that could be applied to other unlearning methods for comparison.
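The KL-divergence check mentioned in the last bullet can be applied to any unlearning method. A minimal sketch, where the toy distributions stand in for a model's next-token outputs and `kl_divergence` is our own helper:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats over a shared vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions over a 3-token vocabulary.
baseline       = [0.70, 0.20, 0.10]  # base + adapters, proxy never attached
after_deletion = [0.68, 0.21, 0.11]  # same model after proxy removal
with_proxy     = [0.30, 0.10, 0.60]  # personalization active

print(kl_divergence(after_deletion, baseline))  # small: near-baseline behavior
print(kl_divergence(with_proxy, baseline))      # large: proxy shapes outputs
```

An unlearning method passes this test when the post-deletion divergence from the no-proxy baseline is small (the paper reports roughly 0.21 nats) while the divergence with the proxy attached remains large.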
Load-bearing premise
That the per-user proxy artefacts can be kept fully isolated from the base model and adapters so that their deletion leaves no residual influence or leakage.
What would settle it
A test that measures whether, after proxy deletion, the model still produces outputs that encode the deleted user's specific facts or preferences at rates significantly above the no-proxy baseline.
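One way to operationalize such a test is a probe harness that queries the post-deletion model for the deleted user's facts and compares the emission rate against the no-proxy baseline. This is a hypothetical sketch, not the paper's protocol; `generate` stands in for querying a real model:

```python
def leakage_rate(generate, probes, user_facts):
    """Fraction of probe completions that reveal any of the deleted user's
    facts. `generate` maps a prompt string to a completion string."""
    hits = sum(
        any(fact.lower() in generate(prompt).lower() for fact in user_facts)
        for prompt in probes
    )
    return hits / len(probes)

# Toy stand-ins: a model with residual leakage vs. one back at baseline.
probes = ["What city does the user live in?", "What is the user's favorite food?"]
facts = ["Lisbon", "ramen"]

leaky    = lambda prompt: "The user lives in Lisbon."
baseline = lambda prompt: "I don't have information about this user."

print(leakage_rate(leaky, probes, facts))     # facts still emitted
print(leakage_rate(baseline, probes, facts))  # no residual leakage
```

The claim survives only if the post-deletion rate is statistically indistinguishable from the baseline rate across a large probe set.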
Original abstract
Current model training approaches incorporate user information directly into shared weights, making individual data removal computationally infeasible without retraining. This paper presents a three-layer architecture that decouples personal data from shared weights by combining a static base model, composable domain-expert LoRA adapters that shape behavior without imparting user data, and per-user proxy artefacts whose deletion constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B confirms per-user differentiation in which personal data influences outputs while remaining isolated, verified by a return to baseline after proxy removal (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) and near-zero cross-user contamination. Because user-specific information never enters shared weights, the architecture mitigates model inversion, membership inference, and training-data extraction against shared model components by construction. The approach converts machine unlearning from an intractable weight-editing problem into a deterministic deletion operation that preserves personalization alongside privacy-enhancing guarantees and is compatible with differentially private stochastic gradient descent (DP-SGD) for privacy-preserving shared model improvement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a three-layer Separable Expert Architecture for privacy-preserving LLM personalization: a static base model, composable domain-expert LoRA adapters that shape behavior without incorporating user data, and per-user proxy artefacts. Deletion of the proxies constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B reports per-user differentiation, return to baseline after proxy removal (KL divergence ~0.21 nats, 82-89% verification pass rate), near-zero cross-user contamination, and mitigation of model inversion, membership inference, and training-data extraction attacks against shared components by construction. The approach is compatible with DP-SGD.
Significance. If the claimed isolation holds, this architecture offers a practical advance in machine unlearning for LLMs by reducing it to a deterministic deletion operation while retaining personalization. The separation of user data from shared weights provides privacy guarantees by design and could influence future personalized AI systems. The compatibility with DP-SGD and use of LoRA for domain expertise without user data injection are notable strengths.
major comments (2)
- [Evaluation] Evaluation section: The reported return to baseline (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) after proxy removal supports the unlearning claim, but the section lacks baselines from prior unlearning methods, ablation studies on proxy construction, and full controls for residual influence or leakage, which are load-bearing for verifying complete isolation of per-user artefacts.
- [Architecture] Architecture description: The central 'by construction' mitigation relies on user-specific information never entering shared weights via the three-layer separation, but there is no formal specification, equation, or pseudocode defining the proxy artefacts' integration and isolation from the base model and adapters.
minor comments (1)
- [Abstract] Abstract: The claim of 'near-zero cross-user contamination' is stated without specifying the exact metric, threshold, or measurement method used.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the constructive feedback. We address each major comment below, proposing targeted revisions to strengthen the manuscript while preserving its core contributions.
Point-by-point responses
-
Referee: [Evaluation] Evaluation section: The reported return to baseline (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) after proxy removal supports the unlearning claim, but the section lacks baselines from prior unlearning methods, ablation studies on proxy construction, and full controls for residual influence or leakage, which are load-bearing for verifying complete isolation of per-user artefacts.
Authors: We agree that additional context would strengthen the claims. In revision, we will add a dedicated paragraph comparing our deterministic deletion to approximate unlearning baselines (e.g., gradient-ascent or scrubbing methods), emphasizing that our approach achieves exact removal at negligible cost without retraining. We will also include ablation results on proxy construction (varying dimension and layer count) and expand the controls with statistical tests on output distributions across prompt sets to further verify isolation. These additions can be incorporated without new large-scale experiments. revision: partial
-
Referee: [Architecture] Architecture description: The central 'by construction' mitigation relies on user-specific information never entering shared weights via the three-layer separation, but there is no formal specification, equation, or pseudocode defining the proxy artefacts' integration and isolation from the base model and adapters.
Authors: We accept this point and will revise the architecture section to include formal specification. We will add pseudocode for the forward pass (output = base(x) + sum LoRA adapters(x) + proxy_u(x)) and an equation formalizing isolation: user data influences only the separately stored proxy parameters, whose deletion removes all traces without touching shared weights. This will explicitly substantiate the privacy guarantees by construction. revision: yes
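The one-line formula in this response could be rendered, for example, as follows. The tensor shapes, low-rank factorization, and the linear form of the proxy are assumptions on our part, not the authors' committed design:

```python
import numpy as np

def forward(x, W_base, lora_adapters, proxy=None):
    """out = base(x) + sum of LoRA adapter deltas + optional per-user proxy.
    Each LoRA adapter is a low-rank pair (A, B) contributing delta = B @ A."""
    out = W_base @ x
    for A, B in lora_adapters:
        out = out + B @ (A @ x)  # shared domain expertise, rank << model dim
    if proxy is not None:        # per-user parameters, stored separately
        out = out + proxy @ x
    return out                   # deleting `proxy` restores the shared model exactly
```

The isolation claim then reads directly off the structure: gradients from user data update only `proxy`, so removing that tensor removes every trace of the user from the computation.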
Circularity Check
No significant circularity; central guarantee follows directly from architectural separation
Full rationale
The paper's core claim—that user data never enters shared weights and deletion of proxies constitutes deterministic unlearning—follows directly from the explicit three-layer design (static base model + composable domain-expert LoRA adapters + deletable per-user proxies) rather than from any fitted parameters, self-referential predictions, or load-bearing self-citations. The abstract and description present this as a definitional property of the separation, with KL divergence and verification rates offered only as empirical confirmation of isolation, not as the derivation itself. No equations reduce to inputs by construction, no uniqueness theorems are imported from prior author work, and no ansatz or renaming of known results is smuggled in. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
invented entities (1)
-
per-user proxy artefacts
no independent evidence
Reference graph
Works this paper leans on
-
[1]
LaMP: When large language models meet personalization
Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. LaMP: When large language models meet personalization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024. URL https://arxiv.org/abs/2304.11406
-
[2]
Personalized soups: Personalized large language model alignment via post-hoc parameter merging
Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, and Prithviraj Ammanabrolu. Personalized soups: Personalized large language model alignment via post-hoc parameter merging. In Advances in Neural Information Processing Systems, 2023. URL https://arxiv.org/abs/2310.11564
-
[3]
Personalized language modeling from personalized human feedback
Xinyu Li, Ruiyang Zhou, Zachary C. Lipton, and Leqi Liu. Personalized language modeling from personalized human feedback. arXiv preprint arXiv:2402.05133, 2024. URL https://arxiv.org/abs/2402.05133
-
[4]
Personalizing reinforcement learning from human feedback with variational preference learning
Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, and Natasha Jaques. Personalizing reinforcement learning from human feedback with variational preference learning. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 2024. URL https://arxiv.org/abs/2408.10075
-
[5]
Machine unlearning
Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), 2021. URL https://arxiv.org/abs/1912.03817
-
[6]
Eternal sunshine of the spotless net: Selective forgetting in deep networks
Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9304–9312, 2020
- [7]
-
[8]
Negative preference optimization: From catastrophic collapse to effective unlearning
Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning. In Conference on Language Modeling (COLM 2024), 2024. URL https://arxiv.org/abs/2404.05868
-
[9]
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ar...
-
[10]
Model inversion attacks that exploit confidence information and basic countermeasures
Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security (CCS ’15), 2015. doi: 10.1145/2810103.2813677
-
[11]
Extracting training data from large language models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In 30th USENIX Security Symposium, 2021. URL https://arxiv.org/abs/2012.07805
- [13]
-
[14]
Membership Inference Attacks Against Machine Learning Models
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017. doi: 10.1109/SP.2017.41. URL https://arxiv.org/abs/1610.05820
-
[15]
Making AI forget you: Data deletion in machine learning
Antonio Ginart, Melody Y. Guan, Gregory Valiant, and James Zou. Making AI forget you: Data deletion in machine learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019. URL https://arxiv.org/abs/1907.05012
-
[16]
Amnesiac machine learning
Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11516–11524, 2021. URL https://arxiv.org/abs/2010.10981
-
[17]
Who’s Harry Potter? Approximate unlearning in LLMs
Ronen Eldan and Mark Russinovich. Who’s Harry Potter? Approximate unlearning in LLMs. In International Conference on Learning Representations (ICLR 2024), 2024. URL https://arxiv.org/abs/2310.02238
-
[18]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR 2022), 2022. URL https://arxiv.org/abs/2106.09685
-
[19]
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023. URL https://arxiv.org/abs/2305.14314
-
[20]
LoraHub: Efficient cross-task generalization via dynamic LoRA composition
Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, and Min Lin. LoraHub: Efficient cross-task generalization via dynamic LoRA composition. In Conference on Language Modeling (COLM 2024), 2024. URL https://arxiv.org/abs/2307.13269
-
[21]
Composing parameter-efficient modules with arithmetic operations
Jinghan Zhang, Shiqi Chen, Junteng Liu, and Junxian He. Composing parameter-efficient modules with arithmetic operations. In Advances in Neural Information Processing Systems (NeurIPS), 2023. URL https://arxiv.org/abs/2306.14870
-
[22]
Editing Models with Task Arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089, 2022. URL https://arxiv.org/abs/2212.04089
-
[23]
S-LoRA: Serving thousands of concurrent LoRA adapters
Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, and Ion Stoica. S-LoRA: Serving thousands of concurrent LoRA adapters. In Proceedings of Machine Learning and Systems 6 (MLSys 2024), 2024. URL https://arxiv.org/abs/2311.03285
-
[24]
Punica: Multi-tenant LoRA serving
Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, and Arvind Krishnamurthy. Punica: Multi-tenant LoRA serving. In Proceedings of Machine Learning and Systems 6 (MLSys 2024), 2024. URL https://arxiv.org/abs/2310.18547
-
[25]
Steering Llama 2 via contrastive activation addition
Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Matt Turner. Steering Llama 2 via contrastive activation addition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024. URL https://arxiv.org/abs/2312.06681
-
[27]
Inference-time intervention: Eliciting truthful answers from a language model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. Inference-time intervention: Eliciting truthful answers from a language model. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
- [28]
-
[29]
Deep learning with differential privacy
Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16), 2016. URL https://arxiv.org/abs/1607.00133
-
[30]
TRL: Transformer reinforcement learning
Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang, Kashif Rasul, and Quentin Gallouédec. TRL: Transformer reinforcement learning, 2020. URL https://github.com/huggingface/trl
-
[31]
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023. URL https://arxiv.org/abs/2305.18290
-
[32]
Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach
Wenpeng Yin, Jamaal Hay, and Dan Roth. Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3914–3923, 2019. URL https://arxiv.org/abs/1909.00161
-
[33]
BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), ...
-
[34]
Privacy amplification by subsampling: Tight analyses via couplings and divergences
Borja Balle, Gilles Barthe, and Marco Gaboardi. Privacy amplification by subsampling: Tight analyses via couplings and divergences. Advances in Neural Information Processing Systems, 31, 2018
-
[35]
Rényi differential privacy
Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE, 2017