arxiv: 2605.13265 · v1 · submitted 2026-05-13 · 💻 cs.LG

Recognition: unknown

LightSplit: Practical Privacy-Preserving Split Learning via Orthogonal Projections

Mert Cihangiroglu , Alessandro Pegoraro , Phillip Rieger , Antonino Nocera , Ahmad-Reza Sadeghi

Authors on Pith no claims yet

Pith reviewed 2026-05-14 19:42 UTC · model grok-4.3

classification 💻 cs.LG

keywords split learningorthogonal projectionsprivacy preservationinformation bottleneckcommunication reductionreconstruction attacksdistributed machine learning

0 comments

The pith

LightSplit uses a fixed orthogonal random projection at the cut layer to cut transmitted dimensionality by up to 32 times while retaining more than 95% of baseline accuracy in split learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to make split learning more practical by reducing both the amount of data sent between client and server and the risk of attackers reconstructing private inputs. It does this by adding a simple fixed orthogonal random projection right at the cut layer, which shrinks the activations and creates an irreversible information loss based on information theory principles. This keeps the client lightweight, requires no architecture changes on the server, and allows normal gradient flow. A reader would care because current split learning methods either use too much bandwidth or add complex privacy fixes that hurt performance or compatibility. Tests on standard benchmarks show the accuracy stays high even with big reductions in data size, for both balanced and unbalanced data distributions.

Core claim

LightSplit limits information exposure and communication overhead in split learning by applying a lightweight fixed orthogonal random projection at the cut layer. This projection acts as an information bottleneck based on Shannon's information theory, restricting instance-specific information and suppressing exploitable per-sample signals. Transmitting the low-dimensional projections allows the server to operate on lifted representations without architectural modifications. The projection is non-invertible, so part of the original representation is irreversibly discarded at the client, reducing the information available for reconstruction. The method preserves end-to-end differentiabilityvia

What carries the argument

The fixed orthogonal random projection applied at the cut layer, which functions as a non-invertible information bottleneck to reduce dimensionality and limit instance-specific information in the transmitted data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method may enable split learning deployments on bandwidth-limited edge devices by allowing flexible reduction ratios.
Future work could test whether varying the projection matrix per round adds more privacy without losing the fixed lightweight property.
Similar projection bottlenecks might apply to other collaborative learning methods facing communication and privacy trade-offs.
Real-world validation on hardware with actual network constraints would show if the 32x reduction translates to practical speedups.

Load-bearing premise

A fixed orthogonal random projection alone is sufficient to restrict instance-specific information enough to prevent reconstruction attacks, without needing sparsification, noise, or other additional mechanisms.

What would settle it

A reconstruction attack succeeding in recovering a significant portion of the original activations or input data from the low-dimensional projected representations at the server.

Figures

Figures reproduced from arXiv: 2605.13265 by Ahmad-Reza Sadeghi, Alessandro Pegoraro, Antonino Nocera, Mert Cihangiroglu, Phillip Rieger.

**Figure 1.** Figure 1: Vanilla and U-shaped split-learning topologies. II. BACKGROUND A. Split Learning In split learning (SL) [19], a DNN is partitioned at a designated cut layer between a client holding privacy-sensitive data and a server with strong computational resources. In the simplest configuration (vanilla), the client holds the first layers f, and the server holds the remainder g, including the classifier. Therefore, t… view at source ↗

**Figure 2.** Figure 2: Overview of LightSplit for U-shaped SL setup with [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Relation between the cut-layer activation, its projected [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Utility behaviour of LightSplit for L2H cut scenario. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: USL SDAR reconstruction degradation relative to R [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Reconstruction-attack comparison across three attacks and three datasets. Each column represent individual samples. [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 8.** Figure 8: Impact of MLP lift-back size on accuracy gap (L2H [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 7.** Figure 7: Backbone-output cosine signal cos(¯ri , r¯). Except varied parameters, the remaining parameters are set to default values. 256 512 1,024 2,048 −8 −4 0 4 8 (a) CIFAR-100 (d=4096, k=512) 256 512 1,024 2,048 −4 0 4 8 (b) CIFAR-10 (d=4096, k=512) 64 128 256 512 1,024 −1.5 0 1 2 3 MLP hidden width m (c) FashionMNIST (d=2880, k=360) 64 128 256 512 1,024 −4 −2 0 2 4 6 MLP hidden width m (d) GTSRB (d=5684, k=710) … view at source ↗

**Figure 9.** Figure 9: Server-input cosine signal cos(¯zi , z¯). Malicious-client |MAD-z-score| vs. each ablation axis; in every panel the swept parameter is varied while the rest are held at the defaults reported in [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗

**Figure 10.** Figure 10: SDAR level-4 USL reconstructions on CIFAR-10 at [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: Level-4 USL, CR=16; same layout as [PITH_FULL_IMAGE:figures/full_fig_p018_11.png] view at source ↗

**Figure 12.** Figure 12: Level-4 USL, CR=32; same layout as [PITH_FULL_IMAGE:figures/full_fig_p020_12.png] view at source ↗

**Figure 15.** Figure 15: Level-4 VSL, CR=32; same layout as [PITH_FULL_IMAGE:figures/full_fig_p020_15.png] view at source ↗

**Figure 16.** Figure 16: Level-7 USL, CR=8; same layout as [PITH_FULL_IMAGE:figures/full_fig_p021_16.png] view at source ↗

**Figure 17.** Figure 17: Level-7 USL, CR=16; same layout as [PITH_FULL_IMAGE:figures/full_fig_p021_17.png] view at source ↗

**Figure 22.** Figure 22: FORA white-box reference-decoder reconstructions [PITH_FULL_IMAGE:figures/full_fig_p022_22.png] view at source ↗

**Figure 23.** Figure 23: FORA white-box reference-decoder reconstructions [PITH_FULL_IMAGE:figures/full_fig_p022_23.png] view at source ↗

**Figure 24.** Figure 24: FORA white-box reference-decoder reconstructions [PITH_FULL_IMAGE:figures/full_fig_p023_24.png] view at source ↗

read the original abstract

Split learning (SL) enables collaborative training by partitioning a neural network across clients and a central server, but the cut-layer interface introduces a key challenge: high-dimensional activations incur substantial communication overhead while exposing representations vulnerable to reconstruction attacks. Existing approaches typically address efficiency or privacy in isolation, relying on additional mechanisms such as sparsification, quantization, or noise injection. We propose LightSplit, which limits information exposure and reduces communication overhead by applying a lightweight fixed orthogonal random projection at the cut layer. Based on Shannon's information theory, this projection acts as an information bottleneck that restricts instance-specific information and suppresses exploitable per-sample signals. By transmitting low-dimensional projections instead of raw activations, the server operates on lifted representations without requiring architectural modifications, ensuring compatibility with existing SL architectures. By avoiding additional trainable components on the client, the method remains lightweight and suitable for edge devices while preserving end-to-end differentiability via exact gradient propagation. As the projection is non-invertible, part of the original representation is irreversibly discarded at the client, LightSplit reduces the information available for reconstruction and limits information exposure. We extensively evaluate LightSplit on state-of-the-art benchmarks in both IID and non-IID settings across varying projection dimensions and client scales. Our results show that the method retains more than 95% of the baseline accuracy at up to 32x reduction in transmitted dimensionality while maintaining stable training dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LightSplit adds a fixed orthogonal random projection at the split cut to cut communication 32x with >95% accuracy retention, but the privacy guarantee rests on an untested assumption that the projection alone blocks reconstruction.

read the letter

The core move is straightforward: insert a fixed orthogonal random projection at the cut layer so the client sends a much smaller vector instead of the full activation. This gives the reported 32x drop in transmitted size while accuracy stays above 95% of baseline on the usual image benchmarks, and training stays stable in both IID and non-IID partitions. No extra trainable layers on the client, exact gradient flow back, and no architecture changes needed on the server side. That combination is the actual new piece relative to prior split-learning work that usually adds sparsification or noise separately.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LightSplit, a split-learning method that inserts a fixed orthogonal random projection at the cut layer. This projection is claimed to serve as a lightweight information bottleneck that simultaneously reduces transmitted dimensionality by up to 32× and limits instance-specific information leakage, while preserving end-to-end differentiability and retaining more than 95 % of baseline accuracy on standard benchmarks under both IID and non-IID partitions.

Significance. If the performance and privacy claims are substantiated, the approach would supply a parameter-free, client-lightweight defense that avoids noise, sparsification, or extra trainable modules, thereby improving the practicality of split learning on edge devices. The explicit appeal to Shannon-theoretic irreversibility and the reported stability across client scales constitute the main potential contributions.

major comments (2)

[§4] §4 (Experimental Evaluation): the central claim that >95 % baseline accuracy is retained at 32× dimensionality reduction is presented without reported baselines, number of random seeds, error bars, or precise non-IID partitioning protocol, rendering the quantitative result unverifiable.
[§3.2] §3.2 (Privacy Argument): the assertion that a single fixed orthogonal projection suffices to block reconstruction attacks rests solely on non-invertibility and an information-theoretic motivation; no mutual-information bounds, reconstruction-error metrics, or adversarial attack results are supplied, even though the accuracy claim implies that task-relevant features remain recoverable by the server.

minor comments (2)

[Abstract] The abstract states that experiments cover 'varying projection dimensions and client scales' yet supplies no table or figure reference for these dimensions; a summary table would improve clarity.
[§3.1] Notation for the projection matrix (e.g., whether it is drawn once and fixed or re-sampled per round) is introduced without an explicit equation number, complicating reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve verifiability and substantiation of the claims.

read point-by-point responses

Referee: [§4] §4 (Experimental Evaluation): the central claim that >95 % baseline accuracy is retained at 32× dimensionality reduction is presented without reported baselines, number of random seeds, error bars, or precise non-IID partitioning protocol, rendering the quantitative result unverifiable.

Authors: We agree that the experimental section should include explicit baseline accuracies, the number of random seeds, error bars, and a precise non-IID partitioning protocol to make the >95% retention claim verifiable. In the revised manuscript we will report the exact baseline accuracies, average results over at least five random seeds with standard-deviation error bars, and specify the non-IID partitioning method (e.g., Dirichlet concentration parameter). revision: yes
Referee: [§3.2] §3.2 (Privacy Argument): the assertion that a single fixed orthogonal projection suffices to block reconstruction attacks rests solely on non-invertibility and an information-theoretic motivation; no mutual-information bounds, reconstruction-error metrics, or adversarial attack results are supplied, even though the accuracy claim implies that task-relevant features remain recoverable by the server.

Authors: The core privacy argument relies on the non-invertibility of the fixed orthogonal projection, which irreversibly discards instance-specific information according to Shannon theory. We acknowledge that empirical support is currently limited. In the revision we will add reconstruction-error metrics (MSE between original and reconstructed activations) and results from reconstruction attacks to quantify leakage. Full mutual-information bounds are computationally intractable for high-dimensional activations, but we will include empirical estimates; the preservation of task-relevant features is intentional and does not contradict the reduction in instance-specific information. revision: partial

Circularity Check

0 steps flagged

No circularity; proposal is empirical with information-theoretic motivation

full rationale

The paper introduces LightSplit by applying a fixed orthogonal random projection at the cut layer, motivated by Shannon's information theory as creating an irreversible information bottleneck. No derivation chain reduces any claim to fitted parameters, self-citations, or self-definitional loops. The accuracy retention result (>95% at 32x reduction) is presented as an empirical outcome from benchmarks in IID/non-IID settings, not as a prediction forced by construction from inputs. Privacy is argued heuristically from non-invertibility without quantitative mutual-information bounds or self-referential uniqueness theorems. The method is self-contained against external benchmarks and does not rename known results or smuggle ansatzes via citations. This is a standard empirical proposal without circular structure.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on the domain assumption that orthogonal projections act as effective non-invertible bottlenecks per Shannon theory, plus the free parameter of chosen projection dimension.

free parameters (1)

projection dimension
Selected to achieve target reduction ratios such as 32x; directly controls transmitted size and retained accuracy.

axioms (1)

domain assumption A fixed orthogonal random projection at the cut layer restricts instance-specific information and suppresses per-sample signals.
Invoked via reference to Shannon's information theory to justify the privacy benefit.

pith-pipeline@v0.9.0 · 5565 in / 1274 out tokens · 48117 ms · 2026-05-14T19:42:52.093627+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

67 extracted references · 6 canonical work pages · 3 internal anchors

[1]

Deep learning with differential privacy

Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. InCCS, 2016

2016
[2]

CONTRA: Defending against poisoning attacks in federated learning

Sana Awan, Bo Luo, and Fengjun Li. CONTRA: Defending against poisoning attacks in federated learning. InESORICS, volume 12972. Springer, 2021

2021
[3]

The Johnson–Lindenstrauss transform itself preserves differential privacy

Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson–Lindenstrauss transform itself preserves differential privacy. In FOCS, 2012

2012
[4]

Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. Com- pressed sensing using generative models. InICML, 2017. 13

2017
[5]

FLTrust: Byzantine-robust federated learning via trust bootstrapping

Xiaoyu Cao, Minghong Fang, Jia Liu, and Neil Zhenqiang Gong. FLTrust: Byzantine-robust federated learning via trust bootstrapping. In NDSS, 2021

2021
[6]

The Secret Sharer: Evaluating and testing unintended memoriza- tion in neural networks

Nicholas Carlini, Chang Liu, ´Ulfar Erlingsson, Jernej Kos, and Dawn Song. The Secret Sharer: Evaluating and testing unintended memoriza- tion in neural networks. InUSENIX Security, 2019

2019
[7]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, 2 edition, 2006

2006
[8]

Ercument Cicek

Ege Erdogan, Alptekin Kupcu, and A. Ercument Cicek. SplitGuard: Detecting and mitigating training-hijacking attacks in split learning. In WPES, 2022

2022
[9]

Ercument Cicek

Ege Erdogan, Alptekin Kupcu, and A. Ercument Cicek. UnSplit: Data- oblivious model inversion, model stealing, and label inference attacks against split learning. InWorkshop on Privacy in the Electronic Society, 2022

2022
[10]

Fangcheng Fu, Xuanyu Wang, Jiawei Jiang, Huanran Xue, and Bui Cui. ProjPert: Projection-Based Perturbation for Label Protection in Split Learning Based Vertical Federated Learning .IEEE Transactions on Knowledge & Data Engineering, 36(07):3417–3428, July 2024

2024
[11]

Focusing on pinocchio’s nose: A gradients scrutinizer to thwart split-learning hijacking attacks using intrinsic attributes

Jiayun Fu, Xiaojing Ma, Bin B Zhu, Pingyi Hu, Ruixin Zhao, Yaru Jia, Peng Xu, Hai Jin, and Dongmei Zhang. Focusing on pinocchio’s nose: A gradients scrutinizer to thwart split-learning hijacking attacks using intrinsic attributes. InNDSS, 2023

2023
[12]

Clement Fung, Chris J. M. Yoon, and Ivan Beschastnikh. The limitations of federated learning in sybil settings. InInternational Symposium on Research in Attacks, Intrusions and Defenses (RAID), 2020

2020
[13]

Alireza Furutanpey, Philipp Raith, and Schahram Dustdar. Frankensplit: Efficient neural feature compression with shallow variational bottleneck injection for mobile edge computing.IEEE Transactions on Mobile Computing, 23(12):10770–10786, 2024

2024
[14]

PCAT: Functionality and data stealing from split learning by Pseudo-Client attack

Xinben Gao and Lan Zhang. PCAT: Functionality and data stealing from split learning by Pseudo-Client attack. InUSENIX Security. USENIX Association, 2023

2023
[15]

Feature space hijacking attacks against differentially private split learning.arXiv preprint arXiv:2201.04018, 2022

Grzegorz Gawron and Philip Stubbings. Feature space hijacking attacks against differentially private split learning.arXiv preprint arXiv:2201.04018, 2022

work page arXiv 2022
[16]

Inverting gradients-how easy is it to break privacy in federated learning?NeurIPS, 2020

Jonas Geiping, Hartmut Bauermeister, Hannah Dr ¨oge, and Michael Moeller. Inverting gradients-how easy is it to break privacy in federated learning?NeurIPS, 2020

2020
[17]

CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy

Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. InICML, 2016

2016
[18]

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

Distributed learning of deep neural network over multiple agents.Journal of Network and Computer Applications, 116:1–8, 2018

Otkrist Gupta and Ramesh Raskar. Distributed learning of deep neural network over multiple agents.Journal of Network and Computer Applications, 116:1–8, 2018

2018
[20]

Yuze Han, Xiang Li, Shiyun Lin, and Zhihua Zhang. A random projection approach to personalized federated learning: Enhancing com- munication efficiency, robustness, and fairness.Journal of Machine Learning Research, 25(380):1–88, 2024

2024
[21]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InCVPR, 2016

2016
[22]

Zecheng He, Tianwei Zhang, and Ruby B. Lee. Model inversion attacks against collaborative inference. InACSAC, 2019

2019
[23]

C3-SL: Circular convolution-based batch-wise compression for communication-efficient split learning

Cheng-Yen Hsieh, Yu-Chuan Chuang, and An-Yeu Wu. C3-SL: Circular convolution-based batch-wise compression for communication-efficient split learning. InIEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2022

2022
[24]

Extensions of lipschitz maps into a hilbert space.Contemporary Mathematics, 26:189–206, 01 1984

William Johnson and Joram Lindenstrauss. Extensions of lipschitz maps into a hilbert space.Contemporary Mathematics, 26:189–206, 01 1984

1984
[25]

Privacy via the Johnson–Lindenstrauss transform.Journal of Privacy and Confidentiality, 5(1), 2013

Krishnaram Kenthapadi, Aleksandra Korolova, Ilya Mironov, and Nina Mishra. Privacy via the Johnson–Lindenstrauss transform.Journal of Privacy and Confidentiality, 5(1), 2013

2013
[26]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InICLR, 2015

2015
[27]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

2009
[28]

Lecun, L

Y . Lecun, L. Bottou, Y . Bengio, and P. Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278– 2324, 1998

1998
[29]

Label leakage and protection in two-party split learning

Oscar Li, Jiankai Sun, Xin Yang, Weihao Gao, Hongyi Zhang, Junyuan Xie, Virginia Smith, and Chong Wang. Label leakage and protection in two-party split learning. InICLR, 2022

2022
[30]

FedVS: Straggler-resilient and privacy-preserving vertical federated learning for split models

Songze Li, Duanyi Yao, and Jin Liu. FedVS: Straggler-resilient and privacy-preserving vertical federated learning for split models. InICML, 2023

2023
[31]

Similarity-based label inference attack against training and inference of split learning

Junlin Liu, Xinchen Lyu, Qimei Cui, and Xiaofeng Tao. Similarity-based label inference attack against training and inference of split learning. IEEE Transactions on Information Forensics and Security, 19:2881– 2895, 2024

2024
[32]

Ranpac: Random projections and pre-trained models for continual learning

Mark D McDonnell, Dong Gong, Amin Parvaneh, Ehsan Abbasnejad, and Anton Van den Hengel. Ranpac: Random projections and pre-trained models for continual learning. InNeurIPS, 2023

2023
[33]

Communication-efficient learning of deep networks from decentralized data

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. InInternational Conference on Artificial Intelligence and Statistics (AISTATS). Pmlr, 2017

2017
[34]

Shredder: Learning noise distributions to protect inference privacy

Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ram- rakhyani, Ali Jalali, Dean Tullsen, and Hadi Esmaeilzadeh. Shredder: Learning noise distributions to protect inference privacy. InInternational Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020

2020
[35]

Communication-efficient split learning via adaptive feature-wise com- pression.IEEE Transactions on Neural Networks and Learning Systems, PP:1–15, 01 2025

Yongjeong Oh, Jaeho Lee, Christopher Brinton, and Yo-Seb Jeon. Communication-efficient split learning via adaptive feature-wise com- pression.IEEE Transactions on Neural Networks and Learning Systems, PP:1–15, 01 2025

2025
[36]

SafeSplit: A novel defense against client-side backdoor attacks in split learning

Phillip Rieger, Alessandro Pegoraro, Kavita Kumari, Tigist Abera, Jonathan Knauer, and Ahmad-Reza Sadeghi. SafeSplit: A novel defense against client-side backdoor attacks in split learning. InNDSS, 2025

2025
[37]

Bottlenet++: An end-to-end approach for feature compression in device-edge co-inference systems

Jiawei Shao and Jun Zhang. Bottlenet++: An end-to-end approach for feature compression in device-edge co-inference systems. InIEEE Inter- national Conference on Communications Workshops (ICC Workshops), 2020

2020
[38]

DISCO: Dynamic and invariant sensitive channel obfuscation for deep neural networks

Abhishek Singh, Ayush Chopra, Ethan Garza, Emily Zhang, Praneeth Vepakomma, Vivek Sharma, and Ramesh Raskar. DISCO: Dynamic and invariant sensitive channel obfuscation for deep neural networks. In CVPR, 2021

2021
[39]

The german traffic sign recognition benchmark: a multi-class classifica- tion competition

Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. The german traffic sign recognition benchmark: a multi-class classifica- tion competition. Ininternational joint conference on neural networks. IEEE, 2011

2011
[40]

Splitfed: When federated learning meets split learning

Chandra Thapa, Pathum Chamikara Mahawaga Arachchige, Seyit Camtepe, and Lichao Sun. Splitfed: When federated learning meets split learning. InAAAI, 2022

2022
[41]

Split learning for health: Distributed deep learning without sharing raw patient data

Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, and Ramesh Raskar. Split learning for health: Distributed deep learning without sharing raw patient data.arXiv preprint arXiv:1812.00564, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[42]

Nopeek: Information leakage reduction to share activations in distributed deep learning

Praneeth Vepakomma, Abhishek Singh, Otkrist Gupta, and Ramesh Raskar. Nopeek: Information leakage reduction to share activations in distributed deep learning. InInternational Conference on Data Mining Workshops (ICDMW). IEEE, 2020

2020
[43]

Bovik, H.R

Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE Transactions on Image Processing, 13(4):600–612, 2004

2004
[44]

EcoFed: Efficient Communication for DNN Partitioning-Based Federated Learning .IEEE Transactions on Parallel & Distributed Systems, 35(03):377–390, March 2024

Di Wu, Rehmat Ullah, Philip Rodgers, Peter Kilpatrick, Ivor Spence, and Blesson Varghese. EcoFed: Efficient Communication for DNN Partitioning-Based Federated Learning .IEEE Transactions on Parallel & Distributed Systems, 35(03):377–390, March 2024

2024
[45]

Split learning over wireless networks: Parallel design and resource management.IEEE Journal on Selected Areas in Communications, 41(4):1051–1066, 2023

Wen Wu, Mushu Li, Kaige Qu, Conghao Zhou, Xuemin Shen, Weihua Zhuang, Xu Li, and Weisen Shi. Split learning over wireless networks: Parallel design and resource management.IEEE Journal on Selected Areas in Communications, 41(4):1051–1066, 2023

2023
[46]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Han Xiao, Kashif Rasul, and Roland V ollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms.arXiv preprint arXiv:1708.07747, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[47]

A stealthy wrongdoer: Feature- oriented reconstruction attack against split learning

Xiaoyang Xu, Mengda Yang, Wenzhe Yi, Ziang Li, Juan Wang, Hongxin Hu, Yong Zhuang, and Yaxin Liu. A stealthy wrongdoer: Feature- oriented reconstruction attack against split learning. InCVPR, 2024

2024
[48]

Privacy-preserving split learning via patch shuffling over transformers

Dixi Yao, Liyao Xiang, Hengyuan Xu, Hangyu Ye, and Yingqi Chen. Privacy-preserving split learning via patch shuffling over transformers. InIEEE International Conference on Data Mining (ICDM), 2022

2022
[49]

Ginver: Generative model inversion attacks against collaborative inference

Yupeng Yin, Xianglong Zhang, Huanle Zhang, Feng Li, Yue Yu, Xiuzhen Cheng, and Pengfei Hu. Ginver: Generative model inversion attacks against collaborative inference. InWWW, 2023. 14

2023
[50]

How to backdoor split learning.Neural Networks, 168:326–336, 2023

Fangchao Yu, Lina Wang, Bo Zeng, Kai Zhao, Zhi Pang, and Tian Wu. How to backdoor split learning.Neural Networks, 168:326–336, 2023

2023
[51]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InCVPR, 2018

2018
[52]

idlg: Improved deep leakage from gradients.arXiv preprint arXiv:2001.02610, 2020

Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. idlg: Improved deep leakage from gradients.arXiv preprint arXiv:2001.02610, 2020

work page arXiv 2001
[53]

Reducing communication for split learning by randomized top-k sparsification

Fei Zheng, Chaochao Chen, Lingjuan Lyu, and Binhui Yao. Reducing communication for split learning by randomized top-k sparsification. In International Joint Conference on Artificial Intelligence, 2023

2023
[54]

Mask-encoded sparsification: Mitigating biased gradients in communication-efficient split learning

Wenxuan Zhou, Zhihao Qu, Shen-Huan Lyu, Miao Cai, and Baoliu Ye. Mask-encoded sparsification: Mitigating biased gradients in communication-efficient split learning. InECAI - European Confer- ence on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications. IOS Press, 2024

2024
[55]

Deep leakage from gradients

Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. InNeurIPS, 2019

2019
[56]

Passive inference attacks on split learning via adversarial regularization

Xiaochen Zhu, Xinjian Luo, Yuncheng Wu, Yangfan Jiang, Xiaokui Xiao, and Beng Chin Ooi. Passive inference attacks on split learning via adversarial regularization. InNDSS, 2025. APPENDIX Our appendix completes the per-attack and per- configuration sweeps that the main eval cross-references, and gives the step-by-step training derivation that Sec. IV-D def...

2025
[57]

51 3 10 30 100 300 3σ |z| on cos(¯zi, ¯z) (a) α 5 10 20 50 (b) n
[58]

51 3 10 30 100 300 |z| on cos(¯zi, ¯z) (c) µ
[59]

10. 20. 3 0. 5 0. 8 (d) p Raw Learned 1×1 LightSplit-F LightSplit-L Fig. 9:Server-input cosine signalcos(¯z i,¯z).Malicious-client |MAD-z-score|vs. each ablation axis; in every panel the swept parameter is varied while the rest are held at the defaults reported in Fig. 7. applied identically to the target path and to the simulator path so that RAWreduces ...

2048
[60]

We count only the bottleneck and lift-back overhead beyond the common split modelh◦g◦f

Computational complexity:Letbdenote the mini-batch size, let the cut activation have shapeC×H×Wand flat- tened dimensiond=CHW, letkbe the transmitted width (CR=d/k), and letmbe the hidden width of the learned lift-back. We count only the bottleneck and lift-back overhead beyond the common split modelh◦g◦f. a) Cost of LightSplit:Both LIGHTSPLITvariants sen...
[61]

b) Step 2 (client, projection):The client applies the fixed projection ˜zi =R ⊤zi ∈R k, following Eq

Forward pass: a) Step 1 (client, head):On a mini-batch{(x i, yi)}b i=1, the client computes the cut-layer activationz i =f(x i)∈R d. b) Step 2 (client, projection):The client applies the fixed projection ˜zi =R ⊤zi ∈R k, following Eq. (2). The resulting tensor is recorded in the client’s autograd graph as a function off’s parameters. c) Step 3 (network, c...
[62]

b) Step 12.:The client invokesloss.backward()on its local graph

Backward pass: client (Phase 1): a) Step 11:The client zeros its optimizer state with client_optimizer.zero_grad(). b) Step 12.:The client invokesloss.backward()on its local graph. Autograd traverses the two branches simulta- neously: •The CE branch reacheshand accumulates∂L CE/∂θh intoh’s.grad, then propagates back tou, leaving ∂LCE/∂uonu’s.grad. Because...
[63]

Backward pass: server: a) Step 14:The server zeros its optimizer state with server_optimizer.zero_grad(). Fig. 14: Level-4 VSL,CR=16; same layout as Fig. 10. Fig. 15: Level-4 VSL,CR=32; same layout as Fig. 10. b) Step 15:The server uses the incoming∂L CE/∂uas the upstream seed and callsu.backward(grad_output) on its server-side graph. Autograd traversesu→...
[64]

16: Level-7 USL,CR=8; same layout as Fig

Backward pass: client (Phase 2): a) Step 18.:The client receives∂L CE/∂˜zandaddsit to the contribution already sitting on ˜z’s.gradfrom Step 12, 20 Fig. 16: Level-7 USL,CR=8; same layout as Fig. 10. Fig. 17: Level-7 USL,CR=16; same layout as Fig. 10. yielding ∂L ∂˜z = ∂LCE ∂˜z +λ WCC ∂LWCC ∂˜z .(12) This client-local addition is the only point in the enti...
[65]

Each direction transmits exactly one tensor per sample

Wire-level and optimizer-level summary:The four mes- sages exchanged at the cut interface during a single mini-batch step are summarized in Table XI: the forward payload ˜z, the server’s return signalu, and the two backward gradient signals ∂LCE/∂uand∂L CE/∂˜z. Each direction transmits exactly one tensor per sample. The server’s update of(θ g, θϕ)depends ...
[66]

22: FORA white-box reference-decoder reconstructions on CIFAR-10 atCR=8×

Equivalence between single- and two-optimizer im- plementations:For any first-order optimizer (SGD, Adam, AdamW, etc.) applied parameter-wise, splitting a single loss.backward()/optimizer.step()call into two independent(zero_grad, backward, step)se- White-box target Ground truth Raw Learned 1×1 LS-F λwcc = 0 LS-F λwcc = 0.01 LS-F λwcc = 0.1 Fig. 22: FORA ...
[67]

Letθ f , θh, θg, θϕ denote the trainable parameters off, h, g, ϕ, respectively, and letJ F denote the Jacobian of a mapFat the current point

Compact derivation in equations:For a reader who prefers the gradient bookkeeping in equation form, this sub- section restates the per-step computation as a sequence of derivatives. Letθ f , θh, θg, θϕ denote the trainable parameters off, h, g, ϕ, respectively, and letJ F denote the Jacobian of a mapFat the current point. 22 White-box target Ground truth ...

work page arXiv 1920