Recognition: unknown
Should I Hide My Duck in the Lake?
read the original abstract
Data lakes spend a significant fraction of query execution time on scanning data from remote, disaggregated storage. Decoding alone accounts for 46% of runtime when running TPC-H directly on Parquet files. To address this bottleneck, we propose a vision for a data processing SmartNIC for the cloud that sits on the network datapath of compute nodes to offload decoding and pushed-down operators, effectively hiding the cost of parsing raw files. Our experimental estimations with DuckDB suggest that by operating directly on pre-filtered data, as delivered by a SmartNIC, we can significantly increase query processing performance and can still match query throughput of traditional setups with smaller, less expensive CPUs.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
SCENIC: Stream Computation-Enhanced SmartNIC
SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.