Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies
Pith reviewed 2026-05-09 22:23 UTC · model grok-4.3
The pith
A three-layer architecture keeps user data out of shared LLM weights, so deleting a user's proxy artefact fully unlearns that user in a single removal step.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The architecture decouples personal data from shared weights by routing user-specific behavior exclusively through deletable proxy artefacts while domain expertise lives in composable LoRA adapters. Evaluation on Phi-3.5-mini and Llama-3.1-8B shows that personal data influences outputs during use yet produces near-baseline behavior after proxy deletion, with negligible cross-user contamination. This structure converts machine unlearning from an intractable weight-editing task into a deterministic deletion operation that preserves personalization and privacy guarantees.
What carries the argument
The three-layer separable expert architecture: a static base model, composable domain-expert LoRA adapters that shape behavior without carrying user data, and per-user proxy artefacts whose deletion performs unlearning.
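The separation can be sketched in a few lines. This is a hypothetical rendering of the three-layer design, not the paper's implementation; the class and method names are our own, and the callables stand in for real model components:

```python
# Minimal sketch of the separable expert architecture (names hypothetical):
# a frozen shared base, shared domain-expert LoRA adapters, and per-user
# proxy artefacts held in separately deletable storage.

class SeparableExpertModel:
    def __init__(self, base, adapters):
        self.base = base          # static shared weights (never updated with user data)
        self.adapters = adapters  # composable domain-expert deltas (shared, no user data)
        self.proxies = {}         # user_id -> proxy artefact (isolated, deletable)

    def forward(self, user_id, x):
        out = self.base(x)
        for adapter in self.adapters:
            out += adapter(x)     # domain expertise, shared across all users
        proxy = self.proxies.get(user_id)
        if proxy is not None:
            out += proxy(x)       # the ONLY path for user-specific behavior
        return out

    def unlearn(self, user_id):
        # Deterministic unlearning: drop the artefact; shared weights untouched.
        self.proxies.pop(user_id, None)
```

After `unlearn(u)`, a later `forward(u, x)` is identical to the never-personalized baseline, which is the deletion-as-unlearning property the architecture claims.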
If this is right
- User-specific personalization occurs without any user data entering shared model weights or adapters.
- Machine unlearning reduces to deterministic deletion of the proxy artefact rather than retraining or weight editing.
- Privacy attacks such as model inversion and membership inference are blocked against shared components by construction.
- The design remains compatible with differentially private stochastic gradient descent on the shared base and adapters.
- Near-zero cross-user contamination is observed alongside high verification rates that the original user data is no longer present after deletion.
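The DP-SGD compatibility noted above amounts to applying the standard clip-and-noise update only to the shared parameters (base and adapters), never to proxies. A generic sketch of that step, with illustrative hyperparameters that are not taken from the paper:

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD update on SHARED parameters only: clip each per-sample
    gradient to clip_norm, average, then add calibrated Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_sample_grads),
        size=mean.shape,
    )
    return mean + noise
```

Because user data only ever reaches the proxies, the shared update sees no per-user gradients to begin with; DP-SGD then bounds whatever residual influence the shared training data has.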
Where Pith is reading between the lines
- The same proxy mechanism could support selective forgetting of individual facts within a user's data rather than entire user profiles.
- Composable adapters might allow domain experts to be shared across users while still keeping each user's personal data isolated.
- The architecture could be extended to multimodal models where image or audio user data is likewise confined to deletable proxies.
- Verification rates and KL divergence after deletion provide a practical test that could be applied to other unlearning methods for comparison.
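The KL-divergence check mentioned in the last bullet can be applied to any unlearning method. A minimal sketch, where the toy distributions stand in for a model's next-token outputs and `kl_divergence` is our own helper:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats over a shared vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions over a 3-token vocabulary.
baseline       = [0.70, 0.20, 0.10]  # base + adapters, proxy never attached
after_deletion = [0.68, 0.21, 0.11]  # same model after proxy removal
with_proxy     = [0.30, 0.10, 0.60]  # personalization active

print(kl_divergence(after_deletion, baseline))  # small: near-baseline behavior
print(kl_divergence(with_proxy, baseline))      # large: proxy shapes outputs
```

An unlearning method passes this test when the post-deletion divergence from the no-proxy baseline is small (the paper reports roughly 0.21 nats) while the divergence with the proxy attached remains large.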
Load-bearing premise
That the per-user proxy artefacts can be kept fully isolated from the base model and adapters so that their deletion leaves no residual influence or leakage.
What would settle it
A test that measures whether, after proxy deletion, the model still produces outputs that encode the deleted user's specific facts or preferences at rates significantly above the no-proxy baseline.
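One way to operationalize such a test is a probe harness that queries the post-deletion model for the deleted user's facts and compares the emission rate against the no-proxy baseline. This is a hypothetical sketch, not the paper's protocol; `generate` stands in for querying a real model:

```python
def leakage_rate(generate, probes, user_facts):
    """Fraction of probe completions that reveal any of the deleted user's
    facts. `generate` maps a prompt string to a completion string."""
    hits = sum(
        any(fact.lower() in generate(prompt).lower() for fact in user_facts)
        for prompt in probes
    )
    return hits / len(probes)

# Toy stand-ins: a model with residual leakage vs. one back at baseline.
probes = ["What city does the user live in?", "What is the user's favorite food?"]
facts = ["Lisbon", "ramen"]

leaky    = lambda prompt: "The user lives in Lisbon."
baseline = lambda prompt: "I don't have information about this user."

print(leakage_rate(leaky, probes, facts))     # facts still emitted
print(leakage_rate(baseline, probes, facts))  # no residual leakage
```

The claim survives only if the post-deletion rate is statistically indistinguishable from the baseline rate across a large probe set.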
Original abstract
Current model training approaches incorporate user information directly into shared weights, making individual data removal computationally infeasible without retraining. This paper presents a three-layer architecture that decouples personal data from shared weights by combining a static base model, composable domain-expert LoRA adapters that shape behavior without imparting user data, and per-user proxy artefacts whose deletion constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B confirms per-user differentiation in which personal data influences outputs while remaining isolated, verified by a return to baseline after proxy removal (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) and near-zero cross-user contamination. Because user-specific information never enters shared weights, the architecture mitigates model inversion, membership inference, and training-data extraction against shared model components by construction. The approach converts machine unlearning from an intractable weight-editing problem into a deterministic deletion operation that preserves personalization alongside privacy-enhancing guarantees and is compatible with differentially private stochastic gradient descent (DP-SGD) for privacy-preserving shared model improvement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a three-layer Separable Expert Architecture for privacy-preserving LLM personalization: a static base model, composable domain-expert LoRA adapters that shape behavior without incorporating user data, and per-user proxy artefacts. Deletion of the proxies constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B reports per-user differentiation, return to baseline after proxy removal (KL divergence ~0.21 nats, 82-89% verification pass rate), near-zero cross-user contamination, and mitigation of model inversion, membership inference, and training-data extraction attacks against shared components by construction. The approach is compatible with DP-SGD.
Significance. If the claimed isolation holds, this architecture offers a practical advance in machine unlearning for LLMs by reducing it to a deterministic deletion operation while retaining personalization. The separation of user data from shared weights provides privacy guarantees by design and could influence future personalized AI systems. The compatibility with DP-SGD and use of LoRA for domain expertise without user data injection are notable strengths.
major comments (2)
- [Evaluation] Evaluation section: The reported return to baseline (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) after proxy removal supports the unlearning claim, but the section lacks baselines from prior unlearning methods, ablation studies on proxy construction, and full controls for residual influence or leakage, which are load-bearing for verifying complete isolation of per-user artefacts.
- [Architecture] Architecture description: The central 'by construction' mitigation relies on user-specific information never entering shared weights via the three-layer separation, but there is no formal specification, equation, or pseudocode defining the proxy artefacts' integration and isolation from the base model and adapters.
minor comments (1)
- [Abstract] Abstract: The claim of 'near-zero cross-user contamination' is stated without specifying the exact metric, threshold, or measurement method used.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the constructive feedback. We address each major comment below, proposing targeted revisions to strengthen the manuscript while preserving its core contributions.
Point-by-point responses
-
Referee: [Evaluation] Evaluation section: The reported return to baseline (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) after proxy removal supports the unlearning claim, but the section lacks baselines from prior unlearning methods, ablation studies on proxy construction, and full controls for residual influence or leakage, which are load-bearing for verifying complete isolation of per-user artefacts.
Authors: We agree that additional context would strengthen the claims. In revision, we will add a dedicated paragraph comparing our deterministic deletion to approximate unlearning baselines (e.g., gradient-ascent or scrubbing methods), emphasizing that our approach achieves exact removal at negligible cost without retraining. We will also include ablation results on proxy construction (varying dimension and layer count) and expand the controls with statistical tests on output distributions across prompt sets to further verify isolation. These additions can be incorporated without new large-scale experiments. revision: partial
-
Referee: [Architecture] Architecture description: The central 'by construction' mitigation relies on user-specific information never entering shared weights via the three-layer separation, but there is no formal specification, equation, or pseudocode defining the proxy artefacts' integration and isolation from the base model and adapters.
Authors: We accept this point and will revise the architecture section to include formal specification. We will add pseudocode for the forward pass (output = base(x) + sum LoRA adapters(x) + proxy_u(x)) and an equation formalizing isolation: user data influences only the separately stored proxy parameters, whose deletion removes all traces without touching shared weights. This will explicitly substantiate the privacy guarantees by construction. revision: yes
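The one-line formula in this response could be rendered, for example, as follows. The tensor shapes, low-rank factorization, and the linear form of the proxy are assumptions on our part, not the authors' committed design:

```python
import numpy as np

def forward(x, W_base, lora_adapters, proxy=None):
    """out = base(x) + sum of LoRA adapter deltas + optional per-user proxy.
    Each LoRA adapter is a low-rank pair (A, B) contributing delta = B @ A."""
    out = W_base @ x
    for A, B in lora_adapters:
        out = out + B @ (A @ x)  # shared domain expertise, rank << model dim
    if proxy is not None:        # per-user parameters, stored separately
        out = out + proxy @ x
    return out                   # deleting `proxy` restores the shared model exactly
```

The isolation claim then reads directly off the structure: gradients from user data update only `proxy`, so removing that tensor removes every trace of the user from the computation.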
Circularity Check
No significant circularity; central guarantee follows directly from architectural separation
Full rationale
The paper's core claim—that user data never enters shared weights and deletion of proxies constitutes deterministic unlearning—follows directly from the explicit three-layer design (static base model + composable domain-expert LoRA adapters + deletable per-user proxies) rather than from any fitted parameters, self-referential predictions, or load-bearing self-citations. The abstract and description present this as a definitional property of the separation, with KL divergence and verification rates offered only as empirical confirmation of isolation, not as the derivation itself. No equations reduce to inputs by construction, no uniqueness theorems are imported from prior author work, and no ansatz or renaming of known results is smuggled in. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
invented entities (1)
-
per-user proxy artefacts
no independent evidence
Reference graph
Works this paper leans on
-
[1]
LaMP: When large language models meet personalization
Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. LaMP: When large language models meet personalization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024. URL https://arxiv.org/abs/2304.11406
-
[2]
Personalized soups: Personalized large language model alignment via post-hoc parameter merging
Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, and Prithviraj Ammanabrolu. Personalized soups: Personalized large language model alignment via post-hoc parameter merging. In Advances in Neural Information Processing Systems, 2023. URL https://arxiv.org/abs/2310.11564
-
[3]
Personalized language modeling from personalized human feedback
Xinyu Li, Ruiyang Zhou, Zachary C. Lipton, and Leqi Liu. Personalized language modeling from personalized human feedback. arXiv preprint arXiv:2402.05133, 2024. URL https://arxiv.org/abs/2402.05133
-
[4]
Personalizing reinforcement learning from human feedback with variational preference learning
Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, and Natasha Jaques. Personalizing reinforcement learning from human feedback with variational preference learning. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 2024. URL https://arxiv.org/abs/2408.10075
-
[5]
Machine unlearning
Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), 2021. URL https://arxiv.org/abs/1912.03817
-
[6]
Eternal sunshine of the spotless net: Selective forgetting in deep networks
Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9304–9312, 2020
- [7]
-
[8]
Negative preference optimization: From catastrophic collapse to effective unlearning
Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning. In Conference on Language Modeling (COLM 2024), 2024. URL https://arxiv.org/abs/2404.05868
-
[9]
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ar...
-
[10]
Model inversion attacks that exploit confidence information and basic countermeasures
Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security (CCS ’15), 2015. doi: 10.1145/2810103.2813677
-
[11]
Extracting training data from large language models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In 30th USENIX Security Symposium, 2021. URL https://arxiv.org/abs/2012.07805
- [13]
-
[14]
Membership Inference Attacks Against Machine Learning Models
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017. doi: 10.1109/SP.2017.41. URL https://arxiv.org/abs/1610.05820
-
[15]
Making AI forget you: Data deletion in machine learning
Antonio Ginart, Melody Y. Guan, Gregory Valiant, and James Zou. Making AI forget you: Data deletion in machine learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019. URL https://arxiv.org/abs/1907.05012
-
[16]
Amnesiac machine learning
Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11516–11524, 2021. URL https://arxiv.org/abs/2010.10981
-
[17]
Who’s Harry Potter? Approximate unlearning in LLMs
Ronen Eldan and Mark Russinovich. Who’s Harry Potter? Approximate unlearning in LLMs. In International Conference on Learning Representations (ICLR 2024), 2024. URL https://arxiv.org/abs/2310.02238
-
[18]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR 2022), 2022. URL https://arxiv.org/abs/2106.09685
-
[19]
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023. URL https://arxiv.org/abs/2305.14314
-
[20]
LoraHub: Efficient cross-task generalization via dynamic LoRA composition
Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, and Min Lin. LoraHub: Efficient cross-task generalization via dynamic LoRA composition. In Conference on Language Modeling (COLM 2024), 2024. URL https://arxiv.org/abs/2307.13269
-
[21]
Composing parameter-efficient modules with arithmetic operations
Jinghan Zhang, Shiqi Chen, Junteng Liu, and Junxian He. Composing parameter-efficient modules with arithmetic operations. In Advances in Neural Information Processing Systems (NeurIPS), 2023. URL https://arxiv.org/abs/2306.14870
-
[22]
Editing Models with Task Arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089, 2022. URL https://arxiv.org/abs/2212.04089
-
[23]
S-LoRA: Serving thousands of concurrent LoRA adapters
Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, and Ion Stoica. S-LoRA: Serving thousands of concurrent LoRA adapters. In Proceedings of Machine Learning and Systems 6 (MLSys 2024), 2024. URL https://arxiv.org/abs/2311.03285
-
[24]
Punica: Multi-tenant LoRA serving
Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, and Arvind Krishnamurthy. Punica: Multi-tenant LoRA serving. In Proceedings of Machine Learning and Systems 6 (MLSys 2024), 2024. URL https://arxiv.org/abs/2310.18547
-
[25]
Steering Llama 2 via contrastive activation addition
Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Matt Turner. Steering Llama 2 via contrastive activation addition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024. URL https://arxiv.org/abs/2312.06681
-
[27]
Inference-time intervention: Eliciting truthful answers from a language model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. Inference-time intervention: Eliciting truthful answers from a language model. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
- [28]
-
[29]
Deep learning with differential privacy
Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16), 2016. URL https://arxiv.org/abs/1607.00133
-
[30]
TRL: Transformer reinforcement learning
Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang, Kashif Rasul, and Quentin Gallouédec. TRL: Transformer reinforcement learning, 2020. URL https://github.com/huggingface/trl
-
[31]
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023. URL https://arxiv.org/abs/2305.18290
-
[32]
Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach
Wenpeng Yin, Jamaal Hay, and Dan Roth. Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3914–3923, 2019. URL https://arxiv.org/abs/1909.00161
-
[33]
BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), ...
-
[34]
Privacy amplification by subsampling: Tight analyses via couplings and divergences
Borja Balle, Gilles Barthe, and Marco Gaboardi. Privacy amplification by subsampling: Tight analyses via couplings and divergences. Advances in Neural Information Processing Systems, 31, 2018
-
[35]
Rényi differential privacy
Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE, 2017