
arxiv: 2605.01260 · v1 · submitted 2026-05-02 · 💻 cs.DB

Recognition: unknown

Write-Read Decoupling in Modern Large-Scale Search Engines: Architectures, Techniques, and Emerging Approaches

Minghui Zhu, Nan Wang, Qing Yang, Tianyu Ma, Wenjie Mao, Wenru Qiu, Xin Liang


Pith reviewed 2026-05-10 14:48 UTC · model grok-4.3

classification 💻 cs.DB
keywords search engine architectures · write-read contention · index updates · compute-storage separation · per-field updates · real-time indexing · lucene-based systems

The pith

Large-scale search engines decouple write operations from query latency through five architectural patterns, including compute-storage separation and per-field update routing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys solutions to the core tension in modern search engines: frequent index updates keep results fresh, yet they create resource contention that slows concurrent queries. It identifies five principal patterns developed across industry systems: node-level read-write separation, compute-storage separation, full in-memory indexing, log-structured write paths, and in-place partial updates. A sympathetic reader cares because these techniques allow real-time data visibility without sacrificing query performance in high-volume environments. The emerging ScaleSearch synthesis combines several of these patterns with dedicated per-field routing, handling scalar fields in place while routing full-text fields through a separate segment-based path.

Core claim

The paper claims that write-read contention in Lucene-based engines stems from segment merges competing with queries for CPU, disk I/O, and page cache. It systematically examines five decoupling patterns across systems including Elasticsearch and Vespa, culminating in the ScaleSearch architecture, which integrates compute-storage separation with full in-memory indexing and per-field update routing. Each field gets its own Kafka topic and update path, so scalar fields update in place in O(1) RAM with immediate visibility while full-text fields follow the segment-based path.

What carries the argument

Per-field update routing, which assigns each field its own Kafka topic and update path to separate scalar in-place updates from full-text segment-based processing.
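
To fix ideas, here is a minimal Python sketch of that routing under stated assumptions: an in-memory deque stands in for each field's dedicated Kafka topic, the scalar/full-text split is hard-coded, and every name here (route_update, forward_array, SCALAR_FIELDS) is illustrative rather than the paper's API.

```python
# Minimal sketch of per-field update routing; in-memory deques stand in
# for per-field Kafka topics. Illustrative only, not the paper's API.
from collections import defaultdict, deque

SCALAR_FIELDS = {"price", "stock", "tags"}  # in-place O(1) path; all others: segment path

topics = defaultdict(deque)        # one "topic" per field
forward_array = defaultdict(dict)  # field -> {doc_id: value}, updated in place
segment_buffer = []                # full-text updates awaiting segment build

def route_update(doc_id, field, value):
    """Producer side: every field gets its own topic and update path."""
    topics[field].append((doc_id, value))

def consume(field):
    """Consumer side: scalar fields update in place with immediate
    visibility; full-text fields are buffered for segment construction."""
    while topics[field]:
        doc_id, value = topics[field].popleft()
        if field in SCALAR_FIELDS:
            forward_array[field][doc_id] = value           # O(1), instantly visible
        else:
            segment_buffer.append((doc_id, field, value))  # flushed asynchronously

route_update(42, "price", 19.99)                # scalar: in-place path
route_update(42, "title", "red running shoes")  # full-text: segment path
consume("price"); consume("title")
assert forward_array["price"][42] == 19.99
print(f"{len(segment_buffer)} full-text update(s) await segment build")
```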

If this is right

  • Compute-storage separation lets read and write workloads scale independently without shared resource contention.
  • In-place partial updates for scalar fields deliver immediate visibility without triggering full segment rebuilds.
  • Log-structured write paths reduce random I/O costs during high-frequency updates (see the sketch after this list).
  • Full in-memory indexing removes disk contention entirely for both reads and writes.
  • The synthesis of these patterns supports hybrid vector and full-text retrieval with reduced latency.
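
The log-structured bullet above deserves a concrete shape. Below is a toy Python sketch loosely in the spirit of the Milvus-style design in Figure 3: every write appends to a WAL, a growing segment answers queries by brute-force scan, and an asynchronous step seals it into an indexed segment. All identifiers are assumptions for illustration, not any surveyed system's API.

```python
# Toy log-structured write path: durable sequential WAL append, an
# immediately searchable growing segment, and asynchronous sealing.
import json, os, tempfile

wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
growing = []   # searchable right away, by brute-force scan
sealed = []    # immutable term -> doc_ids indexes built asynchronously

def write(doc):
    with open(wal_path, "a") as wal:   # durability first: sequential I/O only
        wal.write(json.dumps(doc) + "\n")
    growing.append(doc)                # visible before any index exists

def seal():
    """Asynchronous 'index node' step: convert the growing segment
    into a sealed, indexed, immutable segment."""
    index = {}
    for doc in growing:
        for term in doc["text"].split():
            index.setdefault(term, []).append(doc["id"])
    sealed.append(index)
    growing.clear()

def search(term):
    hits = [d["id"] for d in growing if term in d["text"].split()]  # brute force
    for index in sealed:                                            # indexed lookup
        hits.extend(index.get(term, []))
    return hits

write({"id": 1, "text": "fresh sneakers"})
print(search("sneakers"))   # [1]; visible before sealing
seal()
print(search("sneakers"))   # [1]; now served from the sealed index
```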

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The patterns could be applied to real-time analytics databases facing similar update-query trade-offs.
  • Serverless deployments mentioned as an open challenge might benefit from further decoupling of update routing from query serving.
  • AI-integrated search could require extensions of per-field routing to handle model parameter updates separately.

Load-bearing premise

That per-field update routing can be implemented at scale without introducing new contention, consistency overhead, or visibility delays across heterogeneous field types.

What would settle it

A controlled benchmark on a production-scale cluster that measures query latency, update throughput, and visibility delay for mixed scalar and full-text workloads with and without per-field routing enabled.
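
One hedged way to operationalize the visibility-delay part of that benchmark is sketched below; apply_update and query are stand-ins for a real cluster's client calls, and the simulated flush delay is arbitrary.

```python
# Sketch of measuring visibility delay: the time from issuing an update
# until a query first observes it. The store and delay are simulated.
import time

store = {}

def apply_update(doc_id, value, flush_delay=0.05):
    """Stand-in for an asynchronous update path; flush_delay simulates
    segment-flush latency before the value becomes queryable."""
    store[doc_id] = (value, time.monotonic() + flush_delay)

def query(doc_id):
    entry = store.get(doc_id)
    if entry and time.monotonic() >= entry[1]:
        return entry[0]
    return None

def visibility_delay(doc_id, value, poll_interval=0.001):
    t0 = time.monotonic()
    apply_update(doc_id, value)
    while query(doc_id) != value:   # poll until the update is observable
        time.sleep(poll_interval)
    return time.monotonic() - t0

print(f"visibility delay: {visibility_delay(7, 'in stock') * 1e3:.1f} ms")
```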

Figures

Figures reproduced from arXiv: 2605.01260 by Minghui Zhu, Nan Wang, Qing Yang, Tianyu Ma, Wenjie Mao, Wenru Qiu, Xin Liang.

Figure 1. Illustrates the three main interference pathways on a shared Elasticsearch node. CPU contention: segment merging is CPU-intensive; it re-encodes posting lists, sorts docID arrays, and rebuilds FST-based term dictionaries. Merge threads compete with query-processing threads for CPU time, inflating tail latency. I/O contention and cache eviction: merges generate large sequential I/O that displaces hot search…
Figure 2. Compute-storage separation. Indexing nodes write Lucene segments to shared object storage; stateless search nodes fetch and cache data on demand. The two tiers share no hardware resources and scale independently. …roles [4]: dedicated ingest nodes pre-process documents; hot-tier nodes host write-intensive primaries; shard allocation filters pin read replicas to read-optimized nodes. Trade-offs: propagation…
Figure 3. Log-structured write path (Milvus-style [27]). All writes are durably committed to the WAL. Growing Segments provide immediate brute-force searchability; Index Nodes asynchronously build ANN-optimized Sealed Segments persisted to object storage. …Havenask / HA3 [28] (Alibaba’s production search engine for Taobao/Tmall) uses a Build Service operating in three modes: (i) full build produces a complete index p…
Figure 4. ScaleSearch architecture. Write Nodes build Lucene segments and upload them to object storage; Search Nodes load all segments into full in-memory indexes and poll for new segments. The two paths share no hardware resources and scale independently via the Master (etcd-backed). …Subsequently, they poll object storage for new segment files; when new files appear, they are downloaded and merged in…
Figure 5. Per-field update routing in ScaleSearch. Fields are grouped into column families by freshness SLA. CF-realtime fields (left) are consumed directly by Search Nodes and written into a forward array in O(1) with instant visibility. CF-text fields (right) are consumed by a Write Node, flushed as Lucene segments to object storage, then merged into the Search Node’s in-memory inverted index. ScaleSearch introd…
Original abstract

Large-scale search engines face a fundamental tension: the index must be updated frequently to maintain freshness, yet updates create resource contention that inflates query latency. In the dominant Lucene-based architecture, segment merges triggered by writes compete with concurrent queries for CPU cycles, disk I/O bandwidth, and operating-system page cache -- a problem we term \emph{write-read contention}. This survey systematically examines the architectural solutions that industry and academia have developed to decouple write pressure from read latency. We identify five principal patterns: (i)~node-level read-write separation; (ii)~compute-storage separation; (iii)~full in-memory indexing; (iv)~log-structured write paths; and (v)~in-place partial updates. We survey representative systems including Elasticsearch, LinkedIn Galene, Uber Sia, Quickwit, Alibaba Havenask, Algolia, Milvus, and Vespa, and discuss an emerging synthesis -- the ScaleSearch architecture -- that combines compute-storage separation with full in-memory indexing and dedicated write nodes. A key contribution of ScaleSearch is \emph{per-field update routing}: each field is assigned its own Kafka topic and update path, allowing scalar fields (price, stock, tags) to be updated in-place in $O(1)$ RAM with immediate visibility while full-text fields follow the segment-based compute-storage path. We conclude with open challenges in hybrid vector-and-full-text retrieval, serverless deployments, and AI-integrated search.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript surveys architectural solutions to write-read contention in large-scale search engines, classifying them into five patterns (node-level read-write separation, compute-storage separation, full in-memory indexing, log-structured write paths, and in-place partial updates). It reviews representative systems including Elasticsearch, LinkedIn Galene, Uber Sia, Quickwit, Alibaba Havenask, Algolia, Milvus, and Vespa, and proposes the ScaleSearch architecture as an emerging synthesis that combines compute-storage separation with per-field update routing via dedicated Kafka topics, enabling O(1) in-place scalar updates with immediate visibility while full-text fields follow segment-based paths.

Significance. If the proposed mechanisms can be realized, the systematic classification of patterns offers a valuable framework for analyzing trade-offs in search engine design, and the ScaleSearch synthesis highlights a promising direction for balancing freshness and latency. The paper appropriately credits prior systems and identifies open challenges in hybrid vector-text retrieval and serverless deployments.

major comments (2)
  1. [Abstract] Abstract (ScaleSearch architecture description): the per-field update routing claim assigns each field its own Kafka topic to allow O(1) in-place scalar updates with immediate visibility, but provides no mechanism (e.g., global timestamps, cross-topic atomic commits, or query-time reconciliation) to ensure a query observes a single consistent document version when scalar and full-text fields are updated independently. This is load-bearing for the central synthesis claim, as it risks inconsistent visibility across heterogeneous field types in mixed documents.
  2. [Abstract] Abstract (ScaleSearch architecture description): the asserted O(1) RAM complexity and immediate-visibility guarantee for scalar fields lacks any supporting analysis, pseudocode, or reference to contention avoidance in the in-memory structures, which is central to distinguishing the proposal from the surveyed in-place partial update pattern.
minor comments (1)
  1. The five patterns would be clearer with a summary table comparing their impacts on latency, update freshness, resource contention, and scalability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our survey of write-read decoupling in search engines. The feedback identifies key areas where the ScaleSearch proposal requires additional detail to fully support its claims. We provide point-by-point responses below and commit to revisions that address these concerns.

Point-by-point responses
  1. Referee: [Abstract] Abstract (ScaleSearch architecture description): the per-field update routing claim assigns each field its own Kafka topic to allow O(1) in-place scalar updates with immediate visibility, but provides no mechanism (e.g., global timestamps, cross-topic atomic commits, or query-time reconciliation) to ensure a query observes a single consistent document version when scalar and full-text fields are updated independently. This is load-bearing for the central synthesis claim, as it risks inconsistent visibility across heterogeneous field types in mixed documents.

    Authors: We agree that the abstract's description of per-field update routing is concise and omits explicit consistency mechanisms. This is a valid observation. In the revised version, we will expand the discussion of the ScaleSearch architecture to include a consistency model based on per-document versioning. Each update, regardless of field type, will be associated with a global timestamp generated at the ingestion point. Scalar field updates will be applied in-place to an in-memory store with the new timestamp, while full-text updates proceed through the segment path with the same timestamp. At query time, the system will perform a lightweight reconciliation by consulting a version metadata service to select the most recent version for each document across the different storage paths. This approach uses query-time reconciliation rather than atomic commits across topics, trading a small amount of query overhead for consistency. We will add pseudocode and a diagram to illustrate this process, along with a discussion of its implications for latency and correctness (a toy version of this reconciliation appears in the first sketch after these responses). revision: yes

  2. Referee: [Abstract] Abstract (ScaleSearch architecture description): the asserted O(1) RAM complexity and immediate-visibility guarantee for scalar fields lacks any supporting analysis, pseudocode, or reference to contention avoidance in the in-memory structures, which is central to distinguishing the proposal from the surveyed in-place partial update pattern.

    Authors: We acknowledge that the abstract does not provide the requested analysis or pseudocode for the O(1) RAM claim. This point is well-taken, as it is important for differentiating ScaleSearch from existing in-place update approaches. In the revision, we will augment the ScaleSearch section with a short analysis showing that scalar fields are stored in dedicated in-memory hash tables on the write nodes, supporting constant-time updates and lookups with minimal contention due to the separation from the compute-intensive full-text indexing path. We will include pseudocode for the scalar update operation and reference standard techniques for contention avoidance, such as lock-free data structures or sharded maps (the second sketch after these responses illustrates one such structure). This will clarify how the architecture achieves immediate visibility without the merge-related contention described in the in-place partial updates pattern. revision: yes
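
To make response 1 concrete, here is a minimal Python sketch of per-document versioning with query-time reconciliation, assuming a monotonic counter stands in for the global timestamp service and two plain dicts stand in for the scalar and segment stores; none of these names come from the paper or the rebuttal.

```python
# Query-time reconciliation across two update paths: the freshest
# field values win, keyed by a global per-update timestamp.
import itertools

_clock = itertools.count()   # stand-in for a global timestamp service

scalar_store = {}    # doc_id -> (version, {field: value}), in-place path
segment_store = {}   # doc_id -> (version, {field: value}), segment path

def update_scalar(doc_id, fields):
    scalar_store[doc_id] = (next(_clock), fields)    # visible immediately

def update_fulltext(doc_id, fields):
    segment_store[doc_id] = (next(_clock), fields)   # visible after segment flush

def read(doc_id):
    """Merge both paths; where fields overlap, the newer version wins."""
    versions = sorted(
        (v for v in (scalar_store.get(doc_id), segment_store.get(doc_id)) if v),
        key=lambda v: v[0],
    )
    doc = {}
    for _, fields in versions:   # apply oldest first so the newest overwrites
        doc.update(fields)
    return doc

update_fulltext(1, {"title": "red shoes", "price": 20.0})
update_scalar(1, {"price": 17.5})   # later in-place price update
print(read(1))                      # {'title': 'red shoes', 'price': 17.5}
```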
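For response 2, a minimal sketch of the sharded (striped-lock) map the authors gesture at for contention avoidance: updates on different shards never block each other, keeping each put constant-time. Shard count and key scheme are illustrative assumptions.

```python
# Striped-lock sharded map: O(1) puts/gets, contention limited to one shard.
import threading

class ShardedMap:
    def __init__(self, num_shards=16):
        self._shards = [dict() for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def _index(self, key):
        return hash(key) % len(self._shards)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:   # only this shard is locked
            self._shards[i][key] = value

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key)

prices = ShardedMap()
threads = [
    threading.Thread(target=prices.put, args=((doc_id, "price"), doc_id * 1.5))
    for doc_id in range(8)
]
for t in threads: t.start()
for t in threads: t.join()
print(prices.get((3, "price")))   # 4.5, visible as soon as put returns
```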

Circularity Check

0 steps flagged

No circularity: descriptive survey plus high-level proposal with no derivations or self-referential reductions

full rationale

The manuscript is a survey identifying five architectural patterns from externally cited systems (Elasticsearch, Galene, Sia, Quickwit, Havenask, Algolia, Milvus, Vespa) and sketching an emerging synthesis called ScaleSearch. The per-field update routing description is presented as an architectural proposal, not as a derived prediction, fitted parameter, or result obtained by self-citation. No equations, uniqueness theorems, ansatzes, or load-bearing self-citations appear; the text contains no reduction of any claim to its own inputs by construction. This is the normal non-circular outcome for a survey-style architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claims rest on the classification of prior systems into five patterns and the feasibility of combining them with per-field routing; no free parameters, mathematical axioms, or independently evidenced new entities are introduced.

invented entities (1)
  • ScaleSearch architecture (no independent evidence)
    purpose: Hybrid combining compute-storage separation, full in-memory indexing, dedicated write nodes, and per-field update routing
    Presented as an emerging synthesis without shipped implementation, benchmarks, or external validation.

pith-pipeline@v0.9.0 · 5578 in / 1229 out tokens · 60037 ms · 2026-05-10T14:48:57.325297+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 1 canonical work page

  1. A. Białecki, R. Muir, and G. Ingersoll. Apache Lucene 4. In Proc. SIGIR Workshop on Open Source IR (OSIR), 2012.
  2. P. O’Neil, E. Cheng, D. Gawlick, and E. O’Neil. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica, 33(4):351–385, 1996.
  3. M. A. Qader, C. Vogel, B. Dees, and J. Heiss. An Evaluation of LSM-tree for Full-Text Search. In Proc. ACM SIGMOD, 2018.
  4. Elastic. Elasticsearch Reference: Node Roles and Shard Allocation. https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
  5. C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
  6. D. R. Cutting and J. O. Pedersen. Optimizations for Dynamic Inverted Index Maintenance. In Proc. ACM SIGIR, 1990.
  7. N. Lester, A. Moffat, and J. Zobel. Fast Online Index Construction by Geometric Partitioning. In Proc. ACM CIKM, 2005.
  8. S. Büttcher and C. L. A. Clarke. Indexing Time vs. Query Time: Trade-offs in Dynamic IR Systems. In Proc. ACM CIKM, 2005.
  9. Apache Lucene Project. Near Real-Time Search. https://cwiki.apache.org/confluence/display/LUCENE/NearRealtimeSearch
  10. M. McCandless. Near-real-time latency during large merges. Blog post, 2011.
  11. Elastic. How Many Shards Should I Have in My Elasticsearch Cluster? Blog post, 2018.
  12. S. Agrawal et al. Galene: Search at LinkedIn. LinkedIn Engineering Blog, 2016.
  13. Uber Engineering. Uber’s Search Platform (Sia). Blog post, 2019.
  14. Quickwit, Inc. Quickwit 101: Architecture of a Distributed Search Engine on Object Storage. https://quickwit.io/blog/quickwit-101, 2023.
  15. D. Durner, V. Leis, and T. Neumann. Exploiting Cloud Object Storage for High-Performance Analytics. Proc. VLDB Endowment, 16(11):2769–2782, 2023.
  16. Elastic. Serverless Elasticsearch / Search AI Lake. Blog post, 2022–2024.
  17. I. Psaroudakis et al. Serverless Elasticsearch: the Architecture Transformation from Stateful to Stateless. arXiv preprint, 2024.
  18. Alibaba Cloud. OpenStore: Alibaba Cloud Elasticsearch Intelligent Hybrid Storage. https://www.alibabacloud.com/help/doc-detail/284534.htm
  19. Algolia Engineering. Inside the Algolia Engine Part 1: Indexing vs. Search. https://www.algolia.com/blog/engineering/inside-the-algolia-engine-part-1-indexing-vs-search/
  20. Algolia Engineering. Scaling Indexing and Search—Algolia New Search Architecture Part 2. High Scalability, 2024.
  21. Typesense Project. Typesense Documentation. https://typesense.org/docs/
  22. Ximalaya Engineering Team. Query Performance Improved 10×: Ximalaya Ad Inverted Index Design Practice. InfoQ, 2022.
  23. S. Chambi, D. Lemire, O. Kaser, and R. Godin. Better bitmap performance with Roaring bitmaps. Software: Practice and Experience, 46(5):709–719, 2016.
  24. D. Lemire, G. Ssi-Yan-Kai, and O. Kaser. Roaring bitmaps: Implementation of an optimized software library. Software: Practice and Experience, 48(4):867–895, 2018.
  25. G. E. Pibiri and R. Venturini. Techniques for Inverted Index Compression. ACM Computing Surveys, 53(6):1–36, 2021.
  26. F. Chang et al. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst., 26(2):1–26, 2008.
  27. J. Wang et al. Milvus: A Purpose-Built Vector Data Management System. In Proc. ACM SIGMOD, 2021.
  28. Alibaba Havenask Team. Havenask: Open-Source Search Engine. https://github.com/alibaba/havenask
  29. Vespa Engineering. Vespa Documentation: Attributes. https://docs.vespa.ai/en/attributes.html
  30. Vespa Engineering. Approximate Nearest Neighbor Search in Vespa. https://blog.vespa.ai/approximate-nearest-neighbor-search-in-vespa-part-1/
  31. WanderingScorpion. Advertising Retrieval Core Design. CSDN Blog, 2021.
  32. J. Pan et al. A Survey of Vector Database Management Systems. arXiv:2310.14021, 2023.
  33. J. Lin, R. Nogueira, and A. Yates. Pretrained Transformers for Text Ranking: BERT and Beyond. Synthesis Lectures on Human Language Technologies, 2021.
  34. Z. Zhang et al. Vexless: A Serverless Vector Data Management System Using Cloud Functions. In Proc. ACM SIGMOD, 2024.
  35. C. Ma et al. Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory. In Proc. ACM SIGMOD, 2022.
  36. Y. Tay et al. Transformer Memory as a Differentiable Search Index. In Proc. NeurIPS, 2022.