RISE: A Rust Library for Inverted Index Search Engines
Pith reviewed 2026-06-27 20:49 UTC · model grok-4.3
The pith
RISE is a Rust library for inverted indexes that matches or exceeds existing tools with up to 2x query speedups.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present RISE, a novel inverted index library implemented in Rust, designed to deliver high performance and efficiency for information retrieval tasks. RISE leverages Rust's safety and performance to provide a robust solution for building and querying inverted indexes, while offering accessible extensibility through its expressive trait system. While developing RISE, we revisited the inverted-index literature, thereby reproducing numerous prior works using this new test bench. We evaluated RISE against existing libraries, demonstrating competitive query performance across various datasets and workloads, with speedups of up to 2x over the current state of the art.
What carries the argument
The RISE library, which implements inverted-index data structures and query algorithms in Rust and exposes them through traits for extensibility and customization.
If this is right
- Prior inverted-index algorithms can be reproduced and compared inside a single safe, high-performance code base.
- Search-engine implementers gain a new option that can reduce query latency on standard hardware.
- The trait-based design allows new index variants or query operators to be added without modifying core components.
Where Pith is reading between the lines
- A common Rust library could become a shared test bed that makes it easier for researchers to compare new index techniques against one another.
- Production systems that adopt RISE might see lower server costs or higher query throughput if the reported speedups hold under their workloads.
- The library's emphasis on extensibility opens a route for adding support for specialized data types such as numeric fields or geo-locations in future extensions.
Load-bearing premise
The benchmarks and datasets used fairly represent real-world workloads and the other libraries were configured and measured under equivalent conditions.
What would settle it
A head-to-head test on the same datasets and query workloads where RISE is consistently slower than the fastest existing library by more than a small constant factor.
Figures
read the original abstract
Inverted indexes are a crucial data structure for efficient information retrieval in large text corpora. They enable fast full-text search by mapping each term to the documents in which it appears, on top of which efficient algorithms quickly retrieve the documents relevant to a user query. We present RISE, a novel inverted index library implemented in Rust, designed to deliver high performance and efficiency for information retrieval tasks. RISE leverages Rust's safety and performance to provide a robust solution for building and querying inverted indexes, while offering accessible extensibility through its expressive trait system. While developing RISE, we revisited the inverted-index literature, thereby reproducing numerous prior works using this new test bench. We evaluated RISE against existing libraries, demonstrating competitive query performance across various datasets and workloads, with speedups of up to 2x over the current state of the art. Our results indicate that RISE is a promising tool for researchers and practitioners in the field of information retrieval.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents RISE, a Rust library for inverted indexes that leverages Rust's safety guarantees and trait system for extensibility. It reports reproducing multiple prior inverted-index algorithms as a test bench and claims competitive query performance with speedups of up to 2x over existing libraries across various datasets and workloads.
Significance. A well-documented, reproducible Rust implementation of inverted-index primitives could serve as a useful reference implementation for the IR community, particularly if the performance claims are supported by transparent benchmarks. The reproduction of prior work is a positive contribution, but the absence of experimental details prevents any assessment of whether the reported speedups represent a genuine advance.
major comments (2)
- [Abstract] Abstract (final paragraph): the central performance claim ('speedups of up to 2x over the current state of the art') is stated without any accompanying description of datasets, query workloads, baseline library versions and configurations, hardware, measurement methodology, or statistical reporting. This renders the claim unverifiable from the manuscript.
- [Evaluation] Evaluation section (implied by the abstract's reference to 'various datasets and workloads'): no tables, figures, or text provide the concrete numbers, error bars, or configuration details needed to evaluate the 'competitive query performance' assertion or to reproduce the experiments.
Simulated Author's Rebuttal
We thank the referee for highlighting the need for greater experimental transparency. We agree that the performance claims require supporting details on datasets, workloads, baselines, hardware, and methodology to be verifiable and reproducible. We will revise the manuscript to address these points fully.
read point-by-point responses
-
Referee: [Abstract] Abstract (final paragraph): the central performance claim ('speedups of up to 2x over the current state of the art') is stated without any accompanying description of datasets, query workloads, baseline library versions and configurations, hardware, measurement methodology, or statistical reporting. This renders the claim unverifiable from the manuscript.
Authors: We agree that the abstract's performance claim is presented without sufficient context. In the revised manuscript we will expand the abstract to include a brief but explicit description of the evaluation setup (standard datasets, query workloads, baseline library versions and configurations, hardware platform, measurement methodology, and statistical reporting). Full details will remain in the evaluation section. revision: yes
-
Referee: [Evaluation] Evaluation section (implied by the abstract's reference to 'various datasets and workloads'): no tables, figures, or text provide the concrete numbers, error bars, or configuration details needed to evaluate the 'competitive query performance' assertion or to reproduce the experiments.
Authors: We concur that the evaluation section currently lacks the required tables, figures, concrete performance numbers, error bars, and configuration details. The revised version will add comprehensive tables and figures reporting query times with appropriate statistical measures, together with exhaustive configuration information for all baselines, hardware, and experimental methodology to support both assessment and reproduction. revision: yes
Circularity Check
No significant circularity; purely empirical library evaluation
full rationale
The paper presents an implementation of an inverted-index library in Rust together with direct empirical benchmarks against other libraries. No mathematical derivations, first-principles predictions, fitted parameters, or uniqueness theorems appear in the abstract or the described content. The central claim (competitive query performance with up to 2x speedups) is a measured outcome on stated datasets and workloads, not a quantity derived from prior results by the paper's own equations or self-citations. Consequently no load-bearing step reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Apache Software Foundation. 2026. Apache Lucene. https://github.com/apache/ lucene. Accessed: May 2026
2026
-
[2]
Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien
Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien. 2003. Efficient query evaluation using a two-level retrieval process. In Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM). 426–434. doi:10.1145/956863.956944
-
[3]
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini
-
[4]
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations. InProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 152–162. doi:10. 1145/3626772.3657769
-
[5]
Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, and Alon Shalita. 2016. Compressing Graphs and Indexes with Re- cursive Graph Bisection. InProceedings of the 22nd ACM SIGKDD Interna- tional Conference on Knowledge Discovery and Data Mining (KDD). 1535–1544. doi:10.1145/2939672.2939862
-
[6]
Shuai Ding and Torsten Suel. 2011. Faster top-k document retrieval using block- max indexes. InProceedings of the 34th International ACM SIGIR Conference on RISE : A Rust Library for Inverted Index Search Engines Research and Development in Information Retrieval (SIGIR). 993–1002. doi:10. 1145/2009916.2010048
arXiv 2011
-
[7]
Peter Elias. 1974. Efficient Storage and Retrieval by Content and Address of Static Files.J. ACM21, 2 (April 1974), 246–260. doi:10.1145/321812.321820
-
[8]
Robert M. Fano. 1971.On the Number of Bits Required to Implement an Associative Memory. https://books.google.it/books?id=07DeGwAACAAJ
1971
-
[9]
Daniel Lemire, Nathan Kurz, and Christoph Rupp. 2018. Stream VByte: Faster Byte-Oriented Integer Compression.Inform. Process. Lett.130 (Feb. 2018), 1–6. doi:10.1016/j.ipl.2017.09.011
-
[10]
Daniel Lemire, Gregory Ssi-Yan-Kai, and Owen Kaser. 2016. Consistently faster and smaller compressed bitmaps with Roaring.Software: Practice and Experience 46, 11 (2016), 1547–1569. doi:10.1002/spe.2402
-
[11]
Joel Mackenzie, Sean MacAvaney, Antonio Mallia, and Michał Siedlaczek. 2026. Practical, Efficient, In-Memory Inverted Indexes. InAdvances in Information Retrieval. 3–10. doi:10.1007/978-3-032-21321-1_1
-
[12]
Joel Mackenzie, Matthias Petri, and Luke Gallagher. 2022. IOQP: A simple Impact- Ordered Query Processor written in Rust. InProceedings of the International Conference on Design of Experimental Search and Information REtrieval Systems (DESIRES). 22–34
2022
-
[13]
Joel Mackenzie, Matthias Petri, and Alistair Moffat. 2021. Faster Index Reordering with Bipartite Graph Partitioning. InProceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1910– 1914
2021
-
[14]
Antonio Mallia, Giuseppe Ottaviano, Elia Porciani, Nicola Tonellotto, and Rossano Venturini. 2017. Faster BlockMax WAND with Variable-sized Blocks. In Proceedings of the 40th International ACM SIGIR Conference on Research and Devel- opment in Information Retrieval (SIGIR). 625–634. doi:10.1145/3077136.3080780
-
[15]
Antonio Mallia, Michal Siedlaczek, Joel Mackenzie, and Torsten Suel. 2019. PISA: Performant Indexes and Search for Academia. InProceedings of the Open-Source IR Replicability Challenge co-located with 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 50–56. http://ceur- ws.org/Vol-2409/docker08.pdf
2019
-
[16]
Alistair Moffat and Lang Stuiver. 2000. Binary Interpolative Coding for Effective Index Compression.Information Retrieval3, 1 (July 2000), 25–47. doi:10.1023/A: 1013002601898
work page doi:10.1023/a: 2000
-
[17]
Alistair Moffat and Justin Zobel. 2006. Inverted Files for Text Search Engines. Comput. Surveys38, 2 (2006). doi:10.1145/1132956.1132959
-
[18]
Giuseppe Ottaviano and Rossano Venturini. 2014. Partitioned Elias-Fano indexes. InProceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 273–282. doi:10.1145/2600428. 2609615
-
[19]
Giulio Ermanno Pibiri and Rossano Venturini. 2020. Techniques for Inverted Index Compression.Comput. Surveys53, 6 (Dec. 2020), 125:1–125:36. doi:10. 1145/3415148
2020
-
[20]
Quickwit OSS. 2026. Tantivy: A full-text search engine library written in Rust. https://github.com/quickwit-oss/tantivy. Accessed: May 2026
2026
-
[21]
Stephen Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. 1994. Okapi at TREC-3
1994
-
[22]
Fabrizio Silvestri. 2007. Sorting Out the Document Identifier Assignment Problem. InAdvances in Information Retrieval. 101–112. doi:10.1007/978-3-540-71496-5_12
-
[23]
Alexander A. Stepanov, Anil R. Gangolli, Daniel E. Rose, Ryan J. Ernst, and Paramjit S. Oberoi. 2011. SIMD-based decoding of posting lists. InProceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM). 317–326. doi:10.1145/2063576.2063627
-
[24]
Nicola Tonellotto, Craig Macdonald, and Iadh Ounis. 2018. Efficient Query Processing for Scalable Web Search.Foundations and Trends in Information Retrieval12, 4–5 (Dec. 2018), 319–500. doi:10.1561/1500000057
-
[25]
Andrew Trotman. 2014. Compression, SIMD, and Postings Lists. InProceed- ings of the 19th Australasian Document Computing Symposium(Melbourne, VIC, Australia)(ADCS). 50–57. doi:10.1145/2682862.2682870
-
[26]
Howard Turtle and James Flood. 1995. Query evaluation: Strategies and opti- mizations.Information Processing & Management31, 6 (Nov. 1995), 831–850. doi:10.1016/0306-4573(95)00020-H
-
[27]
Vespa AI. 2026. Vespa. https://github.com/vespa-engine/vespa. Accessed: May 2026
2026
-
[28]
Marcin Zukowski, Sándor Heman, Niels Nes, and Peter Boncz. 2006. Super- Scalar RAM-CPU Cache Compression. InProceedings of the 22nd International Conference on Data Engineering (ICDE). 59–59. doi:10.1109/ICDE.2006.150
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.