Chamelio: A Fast Shared Cloud Network Stack for Isolated Tenant-Defined Protocols
Pith reviewed 2026-05-08 09:44 UTC · model grok-4.3
The pith
Chamelio lets tenants define custom network protocols in a shared cloud stack while matching fixed-stack performance and enforcing isolation via bounded fast paths.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Chamelio is a programmable shared network stack that lets tenants implement full protocols through a bounded eBPF fast path and a tenant slow path, while approaching the performance and preserving the strong isolation of fixed shared stacks. It combines three ideas: a shared-stack architecture for tenant-defined protocols; joint optimisation of tenant handlers with provider infrastructure and co-resident tenants in the shared fast path; and a bounded fast path contract with runtime cycle accounting that keeps tenant programmability compatible with strong performance isolation.
What carries the argument
The bounded fast path contract with runtime cycle accounting, which limits per-packet work and tracks cycles to maintain isolation among co-resident tenant protocols.
If this is right
- A tenant-programmable TCP reaches 9.2 million requests per second, matching the performance of the hand-tuned TAS stack.
- Joint compilation of tenant handlers with the infrastructure reduces the programmability tax from 23.9 percent to 3.8 percent.
- Under a scaling TCP adversary that drives uninstrumented stacks to 154 microseconds tail latency, Chamelio bounds the victim workload's tail latency at 46 microseconds.
- Tenant-defined protocols become viable in shared stacks without sacrificing the performance benefits of collapsed layering.
Where Pith is reading between the lines
- The same bounded-contract and accounting technique could apply to other shared virtualized resources such as storage or compute queues.
- Tenants could safely experiment with application-specific protocol tweaks, such as custom congestion control, in production multi-tenant environments.
- The design opens a path to dynamic per-tenant bound tuning based on observed workload patterns rather than static limits.
Load-bearing premise
The bounded fast path contract together with runtime cycle accounting can enforce strong performance isolation for arbitrary tenant protocols without unacceptable overhead or allowing interference in tail latency.
What would settle it
Running a tenant protocol engineered to consume maximum allowed cycles per packet while measuring whether a co-resident latency-sensitive workload's tail latency stays at or below 46 microseconds.
Original abstract
Conventional cloud network virtualization sends packets through multiple guest and host layers, inflating CPU cost and tail latency. Shared host datapaths collapse this layering into one optimized path across tenants, but existing shared stacks are fixed-function: tenants cannot specialize their protocols. eBPF is the natural vehicle for restoring programmability to a shared datapath, but today's extensions are hook-sized, and its verifier provides safety -- not performance isolation: one tenant's per-packet work can inflate every other tenant's tail latency. Chamelio is a programmable shared network stack that lets tenants implement full protocols through a bounded eBPF fast path and a tenant slow path, while approaching the performance and preserving the strong isolation of fixed shared stacks. It combines three ideas: a shared-stack architecture for tenant-defined protocols; joint optimisation of tenant handlers with provider infrastructure and co-resident tenants in the shared fast path; and a bounded fast path contract with runtime cycle accounting that keeps tenant programmability compatible with strong performance isolation. A tenant programmable TCP on Chamelio reaches 9.2 Mreq/s, matching the hand-tuned TAS stack; joint compilation shrinks the programmability tax from 23.9% to 3.8%; and under a scaling TCP adversary that drives uninstrumented stacks to 154 microseconds, Chamelio bounds victim tail latency at 46 microseconds.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Chamelio, a programmable shared cloud network stack enabling tenants to implement full protocols via a bounded eBPF fast path plus tenant slow path. It combines a shared-stack architecture, joint compilation of tenant handlers with provider code and co-resident tenants, and a bounded fast-path contract with runtime cycle accounting. Reported results include a tenant-programmable TCP reaching 9.2 Mreq/s (matching hand-tuned TAS), joint compilation reducing the programmability tax from 23.9% to 3.8%, and bounding victim tail latency at 46 μs under a scaling TCP adversary that inflates uninstrumented stacks to 154 μs.
Significance. If the measurements and isolation guarantees hold, the work is significant for cloud networking: it restores tenant programmability to high-performance shared datapaths while preserving the isolation properties of fixed-function stacks. The combination of joint optimization and cycle-accounting enforcement is a concrete step toward safe, efficient multi-tenant protocol specialization.
major comments (2)
- [§5] §5 (Evaluation), isolation experiment: the claim that the bounded fast-path contract plus cycle accounting enforces strong performance isolation for arbitrary tenant protocols rests on the reported 46 μs bound; however, the manuscript provides insufficient detail on the cycle-accounting implementation, the precise definition of the fast-path contract, and whether the accounting overhead itself was measured under worst-case tenant code, which is load-bearing for the central isolation claim.
- [§4] §4 (Design), joint compilation: the reduction of the programmability tax from 23.9% to 3.8% is attributed to joint optimization, but the paper does not quantify how much of the improvement comes from cross-tenant vs. tenant-provider co-optimization, nor does it show an ablation that isolates the contribution of each, weakening the attribution to the proposed technique.
minor comments (2)
- The abstract and introduction use 'joint optimisation' and 'joint compilation' interchangeably; consistent terminology and a short clarifying sentence would improve readability.
- [Evaluation] Figure captions in the evaluation section should explicitly state the hardware platform, request size distribution, and number of runs underlying the reported Mreq/s and latency numbers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of Chamelio's significance. We address each major comment below, providing clarifications where possible and committing to revisions that strengthen the manuscript without altering its core claims.
Point-by-point responses
-
Referee: [§5] §5 (Evaluation), isolation experiment: the claim that the bounded fast-path contract plus cycle accounting enforces strong performance isolation for arbitrary tenant protocols rests on the reported 46 μs bound; however, the manuscript provides insufficient detail on the cycle-accounting implementation, the precise definition of the fast-path contract, and whether the accounting overhead itself was measured under worst-case tenant code, which is load-bearing for the central isolation claim.
Authors: We agree that the current presentation of the isolation results would benefit from greater detail to make the central claim fully reproducible and convincing. In the revised manuscript we will expand §5 with (1) a precise, formal definition of the bounded fast-path contract (including per-packet cycle limits and the interface exposed to tenant handlers), (2) a description of the cycle-accounting implementation (how cycles are sampled, attributed, and enforced at runtime), and (3) additional micro-benchmarks that quantify the accounting overhead under worst-case tenant code patterns. These additions will directly support the reported 46 μs tail-latency bound. revision: yes
-
Referee: [§4] §4 (Design), joint compilation: the reduction of the programmability tax from 23.9% to 3.8% is attributed to joint optimization, but the paper does not quantify how much of the improvement comes from cross-tenant vs. tenant-provider co-optimization, nor does it show an ablation that isolates the contribution of each, weakening the attribution to the proposed technique.
Authors: Joint compilation in Chamelio simultaneously optimizes tenant code against both the provider infrastructure and co-resident tenants because all components share the same fast-path binary. While the overall 23.9 % → 3.8 % reduction is correctly reported, we acknowledge that an explicit ablation would strengthen the attribution. In the revised §4 we will add an ablation study that isolates the contribution of tenant-provider co-optimization from cross-tenant optimizations, allowing readers to see the incremental benefit of each. revision: yes
Circularity Check
No significant circularity detected
Full rationale
The paper presents a systems implementation of a programmable shared network stack (Chamelio) and supports its claims exclusively through reported empirical measurements from benchmarks (e.g., 9.2 Mreq/s for tenant TCP, tail latency bounds under adversary). No mathematical derivations, equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The bounded fast-path contract and cycle accounting are architectural mechanisms evaluated via implementation results rather than reduced to inputs by construction. This is a standard non-circular empirical systems paper.