pith. machine review for the scientific record.

arxiv: 2605.02210 · v1 · submitted 2026-05-04 · 💻 cs.NI · cs.AR

Recognition: 2 theorem links

A Protocol-Independent Transport Architecture


Pith reviewed 2026-05-08 18:41 UTC · model grok-4.3

classification 💻 cs.NI cs.AR
keywords protocol-independent transport · data-path programmability · NIC hardware architecture · TCP · RoCE · line-rate performance · FPGA implementation · transport layer abstraction

The pith

PITA gives NIC hardware a uniform data-path abstraction so multiple transport protocols run at line rate without protocol-specific hardware logic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PITA to solve the rigidity of modern NIC hardware, where transport protocols are fixed in silicon and hard to change or extend. It replaces protocol-specific assumptions with one abstraction that covers events, state, and instructions, then redesigns scheduling, packet generation, and reassembly to work on that abstraction alone. The result is that the same hardware data path can host TCP and RoCE while keeping each protocol's distinct end-to-end behavior and still meeting line-rate throughput. This matters because it removes the current tradeoff between performance and evolvability in high-speed networks.

Core claim

PITA structures the data-path around a uniform abstraction over events, state, and instructions, and rethinks scheduling, packet generation, and data reassembly to operate on this abstraction. This design removes protocol-specific assumptions, allowing the same hardware substrate to implement diverse semantics such as TCP and RoCE while preserving their distinct end-to-end behaviors and sustaining line-rate performance with modest hardware overhead.

What carries the argument

The uniform abstraction over events, state, and instructions, which serves as the single substrate for rethought scheduling, packet generation, and data reassembly components so that protocol logic can be programmed without embedding protocol assumptions in hardware.
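
As a rough illustration of what a uniform abstraction over events, state, and instructions could look like, the Python sketch below keeps the engine free of protocol knowledge and puts all protocol logic in registered handlers. Every name here (`Engine`, `Event`, `Instruction`, `tcp_like_send`, `roce_like_send`) is hypothetical and chosen for illustration; this is not PITA's interface, and the two handlers are deliberately simplified stand-ins for TCP and RoCE send paths.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]  # per-flow protocol state, opaque to the engine


@dataclass
class Event:
    flow_id: int
    kind: str             # opaque to the engine, e.g. "send"
    args: Dict[str, int]


@dataclass
class Instruction:
    op: str               # opaque to the engine, e.g. "emit_packet"
    args: Dict[str, int]


Handler = Callable[[State, Event], List[Instruction]]


class Engine:
    """Protocol-agnostic core (hypothetical): routes events to handlers."""

    def __init__(self) -> None:
        self.handlers: Dict[Tuple[str, str], Handler] = {}
        self.flows: Dict[int, Tuple[str, State]] = {}

    def register(self, protocol: str, kind: str, handler: Handler) -> None:
        self.handlers[(protocol, kind)] = handler

    def open_flow(self, flow_id: int, protocol: str, state: State) -> None:
        self.flows[flow_id] = (protocol, state)

    def process(self, event: Event) -> List[Instruction]:
        protocol, state = self.flows[event.flow_id]
        return self.handlers[(protocol, event.kind)](state, event)


def tcp_like_send(state: State, ev: Event) -> List[Instruction]:
    """Byte stream: send only what the sliding window still allows."""
    in_flight = state["next_seq"] - state["acked"]
    n = min(ev.args["length"], state["window"] - in_flight)
    if n <= 0:
        return []
    seq = state["next_seq"]
    state["next_seq"] += n
    return [Instruction("emit_packet", {"seq": seq, "length": n})]


def roce_like_send(state: State, ev: Event) -> List[Instruction]:
    """Message: split into MTU-sized packets, one sequence number each."""
    out: List[Instruction] = []
    offset, remaining = 0, ev.args["length"]
    while remaining > 0:
        n = min(remaining, state["mtu"])
        out.append(Instruction(
            "emit_packet",
            {"psn": state["next_psn"], "offset": offset, "length": n}))
        state["next_psn"] += 1
        offset += n
        remaining -= n
    return out


engine = Engine()
engine.register("tcp_like", "send", tcp_like_send)
engine.register("roce_like", "send", roce_like_send)
engine.open_flow(1, "tcp_like", {"next_seq": 0, "acked": 0, "window": 1000})
engine.open_flow(2, "roce_like", {"next_psn": 0, "mtu": 1024})
```

The point of the sketch is the split of responsibilities: `Engine.process` never inspects `kind`, `op`, or the state fields, so adding a third protocol would mean registering new handlers rather than changing the engine.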

If this is right

  • Transport protocols can be updated or replaced in the data path by reprogramming rather than replacing NIC hardware.
  • The same physical NIC can host multiple protocols simultaneously while each retains its original congestion control and reliability semantics.
  • Hardware designers no longer need to embed protocol logic at design time, reducing the risk of locking in obsolete transport behavior.
  • Synthesis results show the architecture meets timing at 250 MHz on Alveo U250 cards with only modest resource overhead.
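
To put the 250 MHz figure in context, the short calculation below estimates how many bits a data path must move per clock cycle to keep up with a given link. The 100 Gb/s line rate is an assumption made for illustration (the text above does not state the link speed), as is the standard Ethernet accounting of a 64-byte minimum frame plus 20 bytes of preamble and inter-packet gap.

```python
import math

# Assumption: 64-byte minimum Ethernet frame plus 20 bytes of
# preamble and inter-packet gap on the wire.
MIN_WIRE_BYTES = 64 + 20


def bits_per_cycle(line_rate_gbps: float, clock_mhz: float) -> float:
    """Bits the data path must move each clock cycle to match the link."""
    return line_rate_gbps * 1e9 / (clock_mhz * 1e6)


def min_bus_width_bits(line_rate_gbps: float, clock_mhz: float) -> int:
    """Smallest power-of-two bus width that carries the required bits."""
    need = math.ceil(bits_per_cycle(line_rate_gbps, clock_mhz))
    return 1 << (need - 1).bit_length()


def cycles_per_min_packet(line_rate_gbps: float, clock_mhz: float) -> float:
    """Clock cycles available per minimum-sized packet at full line rate."""
    packets_per_second = line_rate_gbps * 1e9 / (MIN_WIRE_BYTES * 8)
    return clock_mhz * 1e6 / packets_per_second
```

Under these assumptions, a 100 Gb/s link at 250 MHz needs 400 bits per cycle, which rounds up to a 512-bit bus, and leaves about 1.68 cycles per minimum-sized packet.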

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Operators could deploy experimental protocols in production data centers without buying new NICs or accepting lower speed.
  • The approach might simplify multi-protocol environments where different applications need different transports on the same server fleet.
  • Future extensions could add support for in-network computing primitives while still keeping the protocol-independent property.

Load-bearing premise

A single uniform abstraction over events, state, and instructions plus the redesigned components can faithfully support the different requirements of protocols like TCP and RoCE without adding performance overhead or hidden protocol assumptions.

What would settle it

A concrete test would be to implement a third transport protocol on the PITA data path, run it at line rate on Alveo U250 hardware at 250 MHz, and check whether its end-to-end semantics remain intact and throughput does not drop compared with a protocol-specific implementation.
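
The test described above can be written down as an acceptance check. The sketch below is only one possible reading: `Run`, `settles`, and the 1% tolerance are hypothetical, and comparing application-visible delivery order is a simplified proxy for "end-to-end semantics remain intact".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Run:
    throughput_gbps: float
    delivered: Sequence[bytes]  # data in the order the application saw it


def settles(generic: Run, specific: Run, line_rate_gbps: float,
            tolerance: float = 0.01) -> bool:
    """True if the protocol-independent run matches the protocol-specific one.

    Assumptions (not from the paper): semantics are compared via delivery
    order, and throughput may fall short by at most `tolerance`.
    """
    same_semantics = list(generic.delivered) == list(specific.delivered)
    at_line_rate = generic.throughput_gbps >= line_rate_gbps * (1 - tolerance)
    no_drop = generic.throughput_gbps >= specific.throughput_gbps * (1 - tolerance)
    return same_semantics and at_line_rate and no_drop
```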

Figures

Figures reproduced from arXiv: 2605.02210 by Chris Neely, David Gao, Eric Su, Kimiya Mohammadtaheri, Mario Baldi, Matthew Chen, Mina Tahmasbi Arashloo, Nachiket Kapre, Pengyu Ji, Saad Syed, Samuel Zhang.

Figure 1
Figure 1: PITA's architecture supports protocols with radically different semantics (§3) (green: fully programmable, purple: protocol-agnostic and reconfigurable).
    … implicitly assumes that event effects on protocol state can be summarized using such simple associative updates. While this assumption holds for many TCP variants, it does not generalize to protocols in which events require more complex state interaction…
Figure 2
Figure 2: PITA's protocol-agnostic event scheduler (§4) (yellow: DP RAMs, blue: registers).
    … TCP, which manages reliable transfer of a byte stream using a sliding window, RoCE supports message-oriented RDMA operations between queue pairs and uses very different mechanisms for loss detection and recovery and congestion control. Nevertheless, the same event-processing infrastructure and instruction-execution pipelines…
Figure 3
Figure 3: PITA's protocol-agnostic packet generation (§6.1) (green: reprogrammable, yellow: DP RAMs, blue: registers).
    … range of transport protocols without embedding protocol-specific assumptions, these modules operate on generic instruction formats and configurable parameters rather than protocol-specific logic and provide efficient, reusable implementations of common transport operations. 6.1 Flexible Packet Gen…
Figure 4
Figure 4: Validating faithful realization of TCP and RoCEv2 in PITA by comparing their end-to-end behavior for a key-value store application under induced congestion (§8.1).
    … and 24KB (∼200 and 400 minimum-sized packets, respectively). For RoCEv2, the ECN marking threshold for DCQCN is set to 3KB (∼50 minimum-sized packets). To induce congestion, we temporarily reduce the queue drain rate to half the request gener…
Figure 5
Figure 5: PITA's protocol-agnostic scheduler sustains line-rate throughput under realistic operating conditions (§8.2). While deep PLE pipelines (a) and extreme burstiness (b) can delay convergence from a cold start and increase intra-flow latency, they do not limit steady-state throughput, and the additional latency due to cross-flow contention (c) remains modest.
    … (a) B2B instructions, mixed packet size (b) B2B ins…
Figure 6
Figure 6: PITA's protocol-agnostic and instruction-driven packet generation and data reassembly sustains line rate under demanding conditions: back-to-back single-packet/segment instructions of various sizes. §8.2 discusses results and edge cases.
    … pipeline depths of 12 and 7;
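
The Figure 4 excerpt quotes queue thresholds both in kilobytes and as approximate counts of minimum-sized packets. The conversion can be checked with the small calculation below, which assumes a 64-byte minimum Ethernet frame and 1 KB = 1024 bytes; neither assumption is stated in the excerpt.

```python
MIN_FRAME_BYTES = 64   # assumption: minimum Ethernet frame size
BYTES_PER_KB = 1024    # assumption: binary kilobytes


def min_sized_packets(threshold_kb: int) -> int:
    """Number of minimum-sized packets that fit in a threshold given in KB."""
    return threshold_kb * BYTES_PER_KB // MIN_FRAME_BYTES
```

Under these assumptions 24 KB corresponds to 384 packets and 3 KB to 48 packets, consistent with the rounded figures of roughly 400 and 50 quoted in the excerpt.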
Original abstract

The network transport layer is increasingly implemented in the NIC hardware to meet the performance demands of modern workloads, but this has made it difficult to evolve or deploy new transport protocols. Existing approaches either fix protocol logic in the data-path or build protocol-specific assumptions into the architecture that limit the range of protocols that can be supported on a single hardware substrate. We present PITA, a protocol-independent transport architecture that enables full data-path programmability while sustaining line-rate performance. PITA eliminates protocol-specific assumptions by structuring the data-path around a uniform abstraction over events, state, and instructions, and rethinks core components, including scheduling, packet generation, and data reassembly, to operate on this abstraction. We evaluate PITA along key dimensions reflecting the goals of its protocol-agnostic datapath design. Specifically, we show that PITA supports diverse protocol semantics by showing it can implement TCP and RoCE on the same data path and preserve their distinct end-to-end behavior. Through targeted microbenchmarks and synthesis on Alveo U250 cards, we show that PITA's redesigned components sustain high performance under demanding conditions, with modest hardware overhead and meeting timing at 250 MHz.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PITA, a protocol-independent transport architecture for NIC hardware that structures the data path around a uniform abstraction over events, state, and instructions. It rethinks scheduling, packet generation, and data reassembly to support diverse protocol semantics (explicitly TCP byte-stream reliability with congestion control and RoCE RDMA) on the same programmable substrate without protocol-specific assumptions, while sustaining line-rate performance. Evaluation consists of targeted microbenchmarks plus FPGA synthesis on Alveo U250 cards showing modest overhead and timing closure at 250 MHz, with the central demonstration that TCP and RoCE can be implemented on the shared path while preserving their distinct end-to-end behaviors.

Significance. If the core claims are substantiated, PITA would represent a meaningful advance in programmable networking hardware by decoupling transport protocol logic from fixed data-path assumptions, potentially enabling easier evolution and deployment of new protocols at line rate. The synthesis results and microbenchmark data provide concrete evidence of hardware feasibility, which is a strength for a systems paper in this area.

major comments (2)
  1. [Evaluation] Evaluation section (and abstract): The claim that TCP and RoCE can be implemented on the same data path while preserving their distinct end-to-end behaviors is load-bearing for the protocol-independence thesis, yet it is supported only by targeted microbenchmarks and synthesis results rather than full end-to-end protocol executions, throughput/latency traces against reference stacks, or workload-level verification that state machines and semantics remain unchanged under realistic traffic. This leaves open whether the uniform abstraction truly avoids introducing hidden protocol-specific assumptions or performance artifacts when both protocols run concurrently.
  2. [Architecture] Architecture description (uniform abstraction and rethought components): The paper asserts that the uniform abstraction over events/state/instructions plus redesigned scheduling, packet generation, and reassembly components support full TCP and RoCE semantics without protocol-specific assumptions, but the manuscript does not provide concrete mappings or invariants showing how TCP's byte-stream ordering and congestion control are realized alongside RoCE's RDMA operations on the shared path; without such detail the no-assumption claim cannot be fully assessed.
minor comments (1)
  1. [Abstract] Abstract: The LaTeX fragment “and preserve their distinct end-to-end behavior” contains a minor grammatical inconsistency with the plural subject; consider rephrasing for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting areas where the manuscript can be strengthened. We address each major comment below with clarifications on the existing evaluation and architecture, and we indicate revisions that will be incorporated.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section (and abstract): The claim that TCP and RoCE can be implemented on the same data path while preserving their distinct end-to-end behaviors is load-bearing for the protocol-independence thesis, yet it is supported only by targeted microbenchmarks and synthesis results rather than full end-to-end protocol executions, throughput/latency traces against reference stacks, or workload-level verification that state machines and semantics remain unchanged under realistic traffic. This leaves open whether the uniform abstraction truly avoids introducing hidden protocol-specific assumptions or performance artifacts when both protocols run concurrently.

    Authors: We agree that comprehensive end-to-end traces and workload-level verification against reference stacks would provide stronger substantiation. The current microbenchmarks isolate and verify preservation of TCP byte-stream reliability, ordering, and congestion control alongside RoCE RDMA semantics under concurrent operation on the shared path, confirming no interference in state machines. These tests exercise the uniform abstraction directly. We will revise the evaluation section to include expanded benchmark descriptions, additional concurrent execution results, and explicit discussion of how the results confirm unchanged end-to-end behaviors. revision: partial

  2. Referee: [Architecture] Architecture description (uniform abstraction and rethought components): The paper asserts that the uniform abstraction over events/state/instructions plus redesigned scheduling, packet generation, and reassembly components support full TCP and RoCE semantics without protocol-specific assumptions, but the manuscript does not provide concrete mappings or invariants showing how TCP's byte-stream ordering and congestion control are realized alongside RoCE's RDMA operations on the shared path; without such detail the no-assumption claim cannot be fully assessed.

    Authors: The architecture section explains the uniform abstraction and how scheduling, packet generation, and reassembly operate on events/state/instructions to support both protocols. To improve clarity, we will add concrete mappings (e.g., TCP byte-stream ordering via sequenced events and separate congestion state; RoCE RDMA via direct memory operations) and explicit invariants ensuring the shared path introduces no protocol-specific assumptions. These additions will be placed in the architecture description. revision: yes
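
One way to picture the kind of mapping the authors promise is to reduce data placement for both protocols to a generic "write these bytes at this offset" operation, so that the reassembly logic never sees protocol fields. The Python sketch below is an illustration under that assumption, not the paper's design: `Reassembler`, `tcp_offset`, and `roce_offset` are hypothetical names, and the RoCE mapping assumes every packet of a message except the last carries exactly one MTU of payload.

```python
from typing import List, Tuple


class Reassembler:
    """Protocol-agnostic reassembly (hypothetical): sees only offset writes."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.ranges: List[Tuple[int, int]] = []  # sorted, disjoint [start, end)

    def write(self, offset: int, payload: bytes) -> None:
        if not payload:
            return
        start, end = offset, offset + len(payload)
        if end > len(self.buf):
            raise ValueError("write exceeds buffer")
        self.buf[start:end] = payload
        merged: List[Tuple[int, int]] = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))                      # no overlap, keep
            else:
                start, end = min(s, start), max(e, end)    # overlap or adjacent
        merged.append((start, end))
        self.ranges = sorted(merged)

    def contiguous_prefix(self) -> int:
        """Bytes deliverable in order from the start of the buffer."""
        if self.ranges and self.ranges[0][0] == 0:
            return self.ranges[0][1]
        return 0


def tcp_offset(seq: int, initial_seq: int) -> int:
    """Byte-stream offset from a 32-bit TCP sequence number (with wraparound)."""
    return (seq - initial_seq) % 2**32


def roce_offset(psn: int, first_psn: int, mtu: int) -> int:
    """Message offset from a 24-bit packet sequence number (with wraparound)."""
    return ((psn - first_psn) % 2**24) * mtu
```

In this sketch the protocol-specific part is confined to the two small offset functions, which would live in programmable logic, while `Reassembler` stays identical for both protocols.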

Circularity Check

0 steps flagged

No circularity: empirical architecture presentation with independent evaluation


The paper introduces PITA as a hardware datapath design and evaluates it via direct implementation of TCP and RoCE on the shared abstraction, plus microbenchmarks and Alveo U250 synthesis reporting timing, overhead, and performance numbers. No equations, fitted parameters, or predictions appear in the provided text. Claims of protocol support and line-rate behavior rest on concrete implementation and external hardware measurements rather than any reduction to self-defined inputs, self-citations, or renamed known results. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Only the abstract is available, so the ledger reflects high-level claims. The work introduces new hardware concepts whose details and assumptions are not fully specified.

axioms (1)
  • domain assumption: Hardware synthesis on Alveo U250 cards can meet timing at 250 MHz with the redesigned components.
    Stated as part of the evaluation results in the abstract.
invented entities (2)
  • Uniform abstraction over events, state, and instructions (no independent evidence)
    purpose: To eliminate protocol-specific assumptions from the data path
    Core structuring principle introduced to enable protocol independence.
  • PITA architecture (no independent evidence)
    purpose: To provide full data-path programmability at line rate for diverse protocols
    The overall proposed system.

pith-pipeline@v0.9.0 · 5536 in / 1367 out tokens · 61861 ms · 2026-05-08T18:41:27.137943+00:00 · methodology

