Should I Hide My Duck in the Lake?
Pith reviewed 2026-05-15 20:46 UTC · model grok-4.3
The pith
A SmartNIC on the network path can offload Parquet decoding and filtering to raise data lake query speeds while allowing cheaper CPUs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By positioning a data processing SmartNIC on the network datapath, decoding and operator pushdown happen before data arrives at the host, so queries operate directly on pre-filtered results. This hides the cost of parsing raw files and allows the same query throughput with smaller, less expensive CPUs.
What carries the argument
A data processing SmartNIC that performs decoding and pushed-down operators on the network datapath to deliver pre-filtered data to the host.
If this is right
- Query processing nodes can use smaller CPUs while keeping the same throughput.
- System cost drops because less expensive hardware suffices for the same workload.
- The scanning and decoding bottleneck in disaggregated storage is reduced.
- Queries spend less time waiting on remote file access.
Where Pith is reading between the lines
- The design could be added to existing cloud networks with minimal changes to query engines like DuckDB.
- Extending the same offload logic to other file formats would broaden the approach beyond Parquet.
- The work points to a tighter coupling between network hardware and data processing that future systems may adopt.
Load-bearing premise
A practical SmartNIC can decode files and push down operators at full line rate without adding latency, power draw, or integration problems that erase the gains.
What would settle it
A working SmartNIC prototype that decodes Parquet at network line rate but causes end-to-end query latency to rise above current CPU-only baselines would disprove the performance benefit.
Original abstract
Data lakes spend a significant fraction of query execution time on scanning data from remote, disaggregated storage. Decoding alone accounts for 46% of runtime when running TPC-H directly on Parquet files. To address this bottleneck, we propose a vision for a data processing SmartNIC for the cloud that sits on the network datapath of compute nodes to offload decoding and pushed-down operators, effectively hiding the cost of parsing raw files. Our experimental estimations with DuckDB suggest that by operating directly on pre-filtered data, as delivered by a SmartNIC, we can significantly increase query processing performance and can still match query throughput of traditional setups with smaller, less expensive CPUs.
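As a rough illustration of what the 46% figure implies, the following sketch applies an Amdahl-style bound. It assumes, purely for illustration, that decoding time is removed entirely from the host, that nothing else in the query changes, and that runtime is CPU-bound. The function names are hypothetical and do not come from the paper, and the bound covers decoding offload alone, not any additional gain from pushed-down filters.

```python
def offload_upper_bound(decode_fraction: float) -> float:
    """Amdahl-style bound: speedup if the decoding share of runtime drops to zero."""
    if not 0.0 <= decode_fraction < 1.0:
        raise ValueError("decode_fraction must be in [0, 1)")
    return 1.0 / (1.0 - decode_fraction)


def host_work_remaining(decode_fraction: float) -> float:
    """Fraction of the original CPU work left on the host after offloading decoding."""
    return 1.0 - decode_fraction


# From the abstract: decoding share of TPC-H runtime on Parquet files.
DECODE_FRACTION = 0.46

speedup = offload_upper_bound(DECODE_FRACTION)
remaining = host_work_remaining(DECODE_FRACTION)
print(f"upper-bound speedup from decoding offload: {speedup:.2f}x")
print(f"host CPU work remaining: {remaining:.0%}")
```

Under these assumptions the host keeps a little over half of its original work, which is the intuition behind matching throughput with a smaller CPU. Real gains depend on the offload costs the referee report raises.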
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a vision for a cloud data-processing SmartNIC placed on the network datapath of compute nodes. It claims that offloading Parquet decoding and pushed-down operators hides scanning costs in disaggregated data lakes; DuckDB estimations are cited to show that pre-filtered data delivery yields significantly higher query performance while still matching traditional throughput on smaller, less expensive CPUs. Decoding is asserted to consume 46% of TPC-H runtime on Parquet.
Significance. If a SmartNIC meeting the stated line-rate and integration assumptions can be built, the approach would materially lower CPU provisioning costs for cloud analytics workloads that currently spend substantial time on remote file parsing.
major comments (2)
- [Abstract] Abstract: the central performance claim rests on DuckDB estimations whose methodology, workload details (queries, scale factors, Parquet configurations), measurement method, and error margins are not described, preventing assessment of the reported 46% decoding overhead or the projected net gains.
- [Vision Proposal] Vision section: the assumption that decoding plus operator pushdown can be performed at line rate on the network datapath without offsetting PCIe handoff, memory-coherence, or sustained-parsing latency/power costs is stated without hardware model, prototype data, or sensitivity analysis; if any of these costs exceed the modeled savings, the headline claim does not hold.
minor comments (1)
- The manuscript would benefit from an explicit limitations subsection that enumerates the hardware assumptions required for the SmartNIC to deliver the claimed benefits.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our vision paper. We address each major comment below and will revise the manuscript to improve clarity and completeness while preserving its visionary nature.
Point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claim rests on DuckDB estimations whose methodology, workload details (queries, scale factors, Parquet configurations), measurement method, and error margins are not described, preventing assessment of the reported 46% decoding overhead or the projected net gains.
Authors: We agree that the DuckDB estimation methodology requires more detail to allow proper evaluation. In the revised manuscript we will expand both the abstract and the main text with a dedicated subsection (or appendix) that specifies the TPC-H queries and scale factors used, Parquet file configurations and compression settings, the exact measurement procedure within DuckDB, and any error margins or simplifying assumptions applied to the 46% decoding overhead figure.
Revision: yes
-
Referee: [Vision Proposal] Vision section: the assumption that decoding plus operator pushdown can be performed at line rate on the network datapath without offsetting PCIe handoff, memory-coherence, or sustained-parsing latency/power costs is stated without hardware model, prototype data, or sensitivity analysis; if any of these costs exceed the modeled savings, the headline claim does not hold.
Authors: As this is a vision paper, we do not possess a hardware prototype or detailed RTL-level model. We will nevertheless revise the Vision section to explicitly list the key hardware assumptions (line-rate parsing, PCIe transfer costs, memory coherence overheads, and power budgets), discuss how these costs could offset savings, and include a simple sensitivity analysis that varies the relative cost of offload versus host processing to show the conditions under which the proposed benefits remain valid.
Revision: partial
- Empirical prototype data or a concrete hardware implementation of the proposed SmartNIC, which does not yet exist because the work is a forward-looking vision rather than an implementation study.
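The kind of sensitivity analysis promised in the rebuttal could start from something as small as the sketch below. It is not the authors' model. It assumes runtime is normalized to 1, that offloading removes the 46% decoding share, and that all offload costs (PCIe handoff, coherence, added latency) collapse into one hypothetical parameter, `offload_overhead`, expressed as a fraction of the original runtime.

```python
def net_speedup(decode_fraction: float, offload_overhead: float) -> float:
    """Host-side speedup when decoding is offloaded but the offload adds overhead.

    offload_overhead is a hypothetical lump sum of host-visible costs,
    given as a fraction of the original (normalized) runtime.
    """
    new_runtime = (1.0 - decode_fraction) + offload_overhead
    return 1.0 / new_runtime


def break_even_overhead(decode_fraction: float) -> float:
    """Overhead at which offloading stops paying off (speedup equals 1)."""
    return decode_fraction


# From the abstract: decoding share of TPC-H runtime on Parquet files.
DECODE_FRACTION = 0.46

for overhead in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    s = net_speedup(DECODE_FRACTION, overhead)
    verdict = "gain" if s > 1.0 else "no gain"
    print(f"overhead={overhead:.2f}  speedup={s:.2f}x  {verdict}")
```

In this simplified model the break-even point equals the decoding share itself: once offload costs reach 46% of the original runtime, the headline benefit disappears. This is the condition the referee asks the authors to examine.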
Circularity Check
No circularity: vision paper relies on external DuckDB estimations without self-referential derivation
Full rationale
The manuscript presents a forward-looking vision for SmartNIC offload of decoding and operators, supported by DuckDB-based experimental estimations on TPC-H Parquet workloads. No equations, derivations, fitted parameters, or first-principles results are claimed. The performance suggestions (e.g., matching throughput on smaller CPUs via pre-filtered data) are presented as empirical observations from external tooling rather than quantities that reduce to the paper's own inputs by construction. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing steps. The argument therefore remains self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Decoding accounts for 46% of TPC-H runtime on Parquet files.
Forward citations
Cited by 1 Pith paper
-
SCENIC: Stream Computation-Enhanced SmartNIC
SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU...
Reference graph
Works this paper leans on
-
[1]
Azim Afroozeh and Peter Boncz. 2025. The FastLanes File Format. Proc. VLDB Endow. 18, 11 (2025), 4629–4643. doi:10.14778/3749646.3749718
-
[2]
Apache Software Foundation. 2025. Apache Parquet Format Specification. https://parquet.apache.org/. Accessed: 2026-04-30
-
[3]
Nikos Armenatzoglou, Sanuj Basu, Naga Bhanoori, Mengchu Cai, Naresh Chainani, Kiran Chinta, Venkatraman Govindaraju, Todd J. Green, Monish Gupta, Sebastian Hillig, Eric Hotinger, Yan Leshinksy, Jintian Liang, Michael McCreedy, Fabian Nagel, Ippokratis Pandis, Panos Parchas, Rahul Pathak, Orestis Polychroniou, Foyzur Rahman, Gaurav Saxena, Gokul Soundara...
-
[4]
Mengchu Cai, Martin Grund, Anurag Gupta, Fabian Nagel, Ippokratis Pandis, Yannis Papakonstantinou, and Michalis Petropoulos. 2018. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull. 41, 2 (2018), 82–90
-
[5]
Jonas Dann, Daniel Ritter, and Holger Fröning. 2023. Non-relational Databases on FPGAs: Survey, Design Decisions, Challenges. ACM Comput. Surv. 55, 11 (2023), 225:1–225:37. doi:10.1145/3568990
-
[6]
Jonas Dann, Royden Wagner, Daniel Ritter, Christian Faerber, and Holger Fröning. PipeJSON: Parsing JSON at Line Speed on FPGAs. In DaMoN. ACM, 3:1–3:7. doi:10.1145/3533737.3535094
-
[8]
Daniel Firestone, Andrew Putnam, Sambrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian M. Caulfield, Eric S. Chung, Harish Kumar Chandrappa, Somesh Chaturmohta, Matt Humphrey, Jack Lavier, Norman Lam, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Sil...
-
[9]
Mateusz Gienieczko, Maximilian Kuschewski, Thomas Neumann, Viktor Leis, and Jana Giceva. 2025. AnyBlox: A Framework for Self-Decoding Datasets. Proc. VLDB Endow. 18, 11 (2025), 4017–4031. doi:10.14778/3749646.3749672
-
[11]
Maximilian Jakob Heer, Benjamin Ramhorst, Yu Zhu, Luhao Liu, Zhiyi Hu, Jonas Dann, and Gustavo Alonso. 2025. RoCE BALBOA: Service-enhanced Data Center RDMA for SmartNICs. CoRR abs/2507.20412 (2025). doi:10.48550/ARXIV.2507.20412
-
[12]
Jason Hu, Philip A. Bernstein, Jialin Li, and Qizhen Zhang. 2025. DPDPU: Data Processing with DPUs. In CIDR. www.cidrdb.org
-
[13]
Insoon Jo, Duck-Ho Bae, Andre S. Yoon, Jeong-Uk Kang, Sangyeun Cho, Daniel D. G. Lee, and Jaeheon Jeong. 2016. YourSQL: A High-Performance Database System Leveraging In-Storage Computing. Proc. VLDB Endow. 9, 12 (2016), 924– doi:10.14778/2994509.2994512
-
[15]
Marko Kabic, Bowen Wu, Jonas Dann, and Gustavo Alonso. 2025. Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads on Modern GPUs. Proc. VLDB Endow. 18, 11 (2025), 4350–4363. doi:10.14778/3749646.3749698
-
[16]
Elie F. Kfoury, Samia Choueiri, Ali Mazloum, Ali AlSabeh, Jose Gomez, and Jorge Crichigno. 2024. A Comprehensive Survey on SmartNICs: Architectures, Development Models, Applications, and Research Directions. IEEE Access 12 (2024), 107297–107336. doi:10.1109/ACCESS.2024.3437203
-
[17]
Dario Korolija, Dimitrios Koutsoukos, Kimberly Keeton, Konstantin Taranov, Dejan S. Milojicic, and Gustavo Alonso. 2022. Farview: Disaggregated Memory with Operator Off-loading for Database Engines. In CIDR. www.cidrdb.org
-
[18]
Maximilian Kuschewski, Jana Giceva, Thomas Neumann, and Viktor Leis. 2024. High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance. Proc. ACM Manag. Data 2, 6 (2024), 238:1–238:27. doi:10.1145/3698813
-
[19]
Maximilian Kuschewski, David Sauerwein, Adnan Alhomssi, and Viktor Leis. BtrBlocks: Efficient Columnar Compression for Data Lakes. Proc. ACM Manag. Data 1, 2 (2023), 118:1–118:26. doi:10.1145/3589263
-
[21]
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Hossein Ahmadi, Dan Delorey, Slava Min, Mosha Pasumansky, and Jeff Shute. 2020. Dremel: A Decade of Interactive SQL Analysis at Web Scale. Proc. VLDB Endow. 13, 12 (2020), 3461–3472. doi:10.14778/3415478.3415568
-
[22]
Oracle Corporation. 2012. A Technical Overview of the Oracle Exadata Database Machine and Exadata Storage Server. White Paper. Oracle Corporation. https://www.oracle.com/technetwork/server-storage/engineered-systems/exadata/dbmachine-x3-twp-1867467.pdf
-
[23]
Muhsen Owaida, David Sidler, Kaan Kara, and Gustavo Alonso. 2017. Centaur: A Framework for Hybrid CPU-FPGA Databases. In FCCM. IEEE Computer Society, 211–218. doi:10.1109/FCCM.2017.37
-
[25]
Johan Peltenburg, Ákos Hadnagy, Matthijs Brobbel, Robert Morrow, and Zaid Al-Ars. 2021. Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators. In FPT. IEEE, 1–9. doi:10.1109/ICFPT52863.2021.9609833
-
[26]
Johan Peltenburg, Lars T. J. van Leeuwen, Joost Hoozemans, Jian Fang, Zaid Al-Ars, and H. Peter Hofstee. 2020. Battling the CPU Bottleneck in Apache Parquet to Arrow Conversion Using FPGA. In FPT. IEEE, 281–286. doi:10.1109/ICFPT51103.2020.00048
-
[27]
Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In SIGMOD. ACM, 1981–1984. doi:10.1145/3299869.3320212
-
[28]
Benjamin Ramhorst, Maximilian Jakob Heer, Luhao Liu, Heejae Kim, Jonas Dann, Jin-Soo Kim, and Gustavo Alonso. 2026. SCENIC: Stream Computation-Enhanced SmartNIC. CoRR abs/2604.15128 (2026). doi:10.48550/arXiv.2604.15128
-
[29]
Benjamin Ramhorst, Dario Korolija, Maximilian Jakob Heer, Jonas Dann, Luhao Liu, and Gustavo Alonso. 2025. Coyote v2: Raising the Level of Abstraction for Data Center FPGAs. In SOSP. ACM, 639–654. doi:10.1145/3731569.3764845
-
[30]
Jan Vincent Szlang, Sebastian Breß, Sebastian Cattes, Jonathan Dees, Florian Funke, Max Heimel, Michel Oleynik, Ismail Oukid, and Tobias Maltenberger. Workload Insights From the Snowflake Data Cloud: What Do Production Analytic Queries Really Look Like? Proc. VLDB Endow. 18, 12 (2025), 5126–5138. doi:10.14778/3750601.3750632
-
[32]
Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska. 2024. Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet. Proc. VLDB Endow. 17, 11 (2024), 3694–3706. doi:10.14778/3681954.3682031
-
[33]
Alexander van Renen and Viktor Leis. 2023. Cloud Analytics Benchmark. Proc. VLDB Endow. 16, 6 (2023), 1413–1425. doi:10.14778/3583140.3583156
-
[34]
Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, and Thierry Cruanes. 2020. Building An Elastic Query Engine on Disaggregated Storage. In NSDI. USENIX Association, 449–462
-
[35]
Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajin Tang, Gang Pan, Fei Wu, Bingsheng He, and Gustavo Alonso. FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics. CoRR abs/2503.09318 (2025). doi:10.48550/ARXIV.2503.09318
-
[37]
Louis Woods, Zsolt István, and Gustavo Alonso. 2014. Ibex - An Intelligent Storage Engine with Support for Advanced SQL Off-loading. Proc. VLDB Endow. 7, 11 (2014), 963–974. doi:10.14778/2732967.2732972
-
[38]
Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. 2024. FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs. VLDB J. 33, 5 (2024), 1643–1670. doi:10.1007/s00778-024-00867-8
-
[39]
Xiangyao Yu, Matt Youill, Matthew E. Woicik, Abdurrahman Ghanem, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. 2020. PushdownDB: Accelerating a DBMS Using S3 Computation. In ICDE. IEEE, 1802–1805. doi:10.1109/ICDE48307.2020.00174
-
[40]
Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney, and Huanchen Zhang. 2023. An Empirical Evaluation of Columnar Storage Formats. Proc. VLDB Endow. 17, 2 (2023), 148–161. doi:10.14778/3626292.3626298
-
[41]
Andreas Zimmerer, Damien Dam, Jan Kossmann, Juliane Waack, Ismail Oukid, and Andreas Kipf. 2025. Pruning in Snowflake: Working Smarter, Not Harder. In SIGMOD. ACM, 757–770. doi:10.1145/3722212.3724447