pith. machine review for the scientific record.

arxiv: 2602.18775 · v2 · submitted 2026-02-21 · 💻 cs.DB

Recognition: unknown

Should I Hide My Duck in the Lake?

classification 💻 cs.DB
keywords: data, query, decoding, directly, files, processing, smartnic, accounts

Data lakes spend a significant fraction of query execution time scanning data from remote, disaggregated storage. When TPC-H runs directly on Parquet files, decoding alone accounts for 46% of runtime. To address this bottleneck, we propose a vision for a data processing SmartNIC for the cloud that sits on the network datapath of compute nodes and offloads decoding and pushed-down operators, effectively hiding the cost of parsing raw files. Our experimental estimations with DuckDB suggest that operating directly on pre-filtered data, as delivered by a SmartNIC, can significantly increase query processing performance and can still match the query throughput of traditional setups with smaller, less expensive CPUs.
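The arithmetic behind the offloading argument can be made concrete with a back-of-the-envelope model. This is a hypothetical illustration, not the authors' DuckDB estimation methodology: it assumes decoding is a fixed 46% share of runtime and that, once offloaded to the SmartNIC, it is completely hidden from the host. Under those assumptions the Amdahl-style upper-bound speedup is 1/(1 − f), and a host CPU needs only (1 − f) of the baseline CPU's speed to keep the same wall-clock time. The names `offload_speedup`, `min_relative_cpu_speed`, and `DECODE_FRACTION` are invented for this sketch.

```python
# Idealized model (assumption, not from the paper): decoding is a fixed
# fraction of runtime and, once offloaded to the SmartNIC, it is fully
# overlapped with host work, so it disappears from the host's critical path.

DECODE_FRACTION = 0.46  # share of TPC-H runtime spent decoding Parquet (from the abstract)


def offload_speedup(decode_fraction: float) -> float:
    """Upper-bound speedup when the decoding share is removed from the host."""
    if not 0.0 <= decode_fraction < 1.0:
        raise ValueError("decode_fraction must be in [0, 1)")
    # Amdahl's law with the offloaded part treated as taking zero host time.
    return 1.0 / (1.0 - decode_fraction)


def min_relative_cpu_speed(decode_fraction: float) -> float:
    """Smallest host CPU speed, relative to the baseline CPU, that still
    finishes the remaining non-decoding work in the baseline wall-clock time."""
    if not 0.0 <= decode_fraction < 1.0:
        raise ValueError("decode_fraction must be in [0, 1)")
    # Remaining work is (1 - f) of the original; a CPU running at speed s
    # takes (1 - f) / s of the original time, which is <= 1 when s >= 1 - f.
    return 1.0 - decode_fraction


if __name__ == "__main__":
    print(f"max speedup: {offload_speedup(DECODE_FRACTION):.2f}x")
    print(f"min relative CPU speed: {min_relative_cpu_speed(DECODE_FRACTION):.2f}")
```

With f = 0.46 the model gives a bound of roughly 1.85x and a minimum relative CPU speed of about 0.54. It deliberately ignores the additional savings from pushed-down filters, which shrink the data the host must process, as well as any NIC-side bandwidth or latency limits, so the numbers are a rough illustration rather than a prediction.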

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SCENIC: Stream Computation-Enhanced SmartNIC

    cs.AR 2026-04 unverdicted novelty 7.0

    SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU...