ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

Hengrui Zhang, Yulong Hui, Yihao Liu, Huanchen Zhang · 2025 · cs.DB · arXiv 2509.12610

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

Predicates are foundational components in data analysis systems. However, modern workloads increasingly involve unstructured documents, which demand semantic understanding beyond traditional value-based predicates. Large Language Models (LLMs) demonstrate powerful zero-shot capabilities, but over enormous document collections and ad-hoc queries their high inference cost leads to unacceptable overhead. We therefore introduce ScaleDoc, a novel system that addresses this problem by decoupling predicate execution into an offline representation phase and an optimized online filtering phase. In the offline phase, ScaleDoc leverages an LLM to generate a semantic representation for each document. Online, for each query, it trains a lightweight proxy model on these representations to filter the majority of documents, forwarding only the ambiguous cases to the LLM for a final decision. Furthermore, ScaleDoc proposes two core innovations to achieve significant efficiency: (1) a contrastive-learning-based framework that trains the proxy model to generate reliable predicate decision scores; (2) an adaptive cascade mechanism that determines an effective filtering policy while meeting specific accuracy targets. Our evaluations across three datasets demonstrate that ScaleDoc achieves over a 2× end-to-end speedup and reduces expensive LLM invocations by up to 85%, making large-scale semantic analysis practical and efficient.
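
The online filtering step described in the abstract can be illustrated with a minimal sketch of a two-threshold cascade. This is not the authors' implementation: the function name `cascade_filter`, the thresholds `low` and `high`, and the `llm_predicate` callback are all hypothetical, the proxy scores are assumed to lie in [0, 1], and the thresholds are fixed by hand here, whereas the abstract says ScaleDoc chooses its filtering policy adaptively to meet an accuracy target.

```python
from typing import Callable, List, Sequence, Tuple


def cascade_filter(
    scores: Sequence[float],
    low: float,
    high: float,
    llm_predicate: Callable[[int], bool],
) -> Tuple[List[bool], int]:
    """Decide a predicate per document; return (decisions, number of LLM calls).

    scores        -- proxy-model scores, one per document (assumed in [0, 1])
    low, high     -- hypothetical hand-picked thresholds, low <= high
    llm_predicate -- hypothetical stand-in for the expensive LLM call,
                     taking a document index and returning True/False
    """
    if low > high:
        raise ValueError("low must not exceed high")

    decisions: List[bool] = []
    llm_calls = 0
    for doc_id, score in enumerate(scores):
        if score >= high:
            # Confident positive: accept without calling the LLM.
            decisions.append(True)
        elif score <= low:
            # Confident negative: reject without calling the LLM.
            decisions.append(False)
        else:
            # Ambiguous score: forward the document to the LLM.
            decisions.append(llm_predicate(doc_id))
            llm_calls += 1
    return decisions, llm_calls


# Toy usage: a lookup table plays the role of the LLM's answers.
toy_scores = [0.05, 0.45, 0.92, 0.60, 0.10]
toy_llm_answers = [False, True, True, False, False]
toy_decisions, toy_calls = cascade_filter(
    toy_scores, low=0.2, high=0.8, llm_predicate=lambda i: toy_llm_answers[i]
)
```

In this toy run only the two documents with scores between the thresholds reach the stand-in LLM. Widening the gap between `low` and `high` sends more documents to the LLM and relies less on the proxy, which is the cost and accuracy trade-off that an adaptive policy would have to balance.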

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

cs.DB · 2026-06-06 · unverdicted · novelty 7.0

An adaptive two-phase semantic filter using clustering then a hybrid proxy trained on LLM confidence achieves 1.6-2.0x speedup over prior methods at 90% accuracy on 10K document corpora.

PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

cs.DB · 2026-04-10 · conditional · novelty 7.0

PLOP is a cost-based optimizer that finds optimal placements for semantic LLM operators in hybrid query plans via dynamic programming, delivering up to 1.5x speedup and 4.29x cost reduction on 44 benchmark queries while preserving accuracy.

Larch: Learned Query Optimization for Semantic Predicates

cs.DB · 2026-06-06 · unverdicted · novelty 6.0

Larch uses a GNN-MDP formulation and a selectivity predictor plus dynamic programming to reorder semantic filter evaluation, cutting token usage 3x-19x versus prior systems on real and synthetic workloads.

Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization

cs.DC · 2026-04-22 · unverdicted · novelty 5.0

BloomBee is a distributed LLM inference system that achieves up to 1.76x higher throughput and 43.2% lower latency than prior decentralized systems by optimizing communication across multiple dimensions in low-bandwidth internet settings.

citing papers explorer

Showing 4 of 4 citing papers.

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method
cs.DB · 2026-06-06 · unverdicted · none · ref 50 · internal anchor

PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans
cs.DB · 2026-04-10 · conditional · none · ref 34 · internal anchor

Larch: Learned Query Optimization for Semantic Predicates
cs.DB · 2026-06-06 · unverdicted · none · ref 74 · internal anchor

Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
cs.DC · 2026-04-22 · unverdicted · none · ref 48 · internal anchor
