RISE: A Rust Library for Inverted Index Search Engines

Angelo Savino; Rossano Venturini

arxiv: 2606.07187 · v1 · pith:YFHCTULRnew · submitted 2026-06-05 · 💻 cs.IR


Pith reviewed 2026-06-27 20:49 UTC · model grok-4.3

classification 💻 cs.IR

keywords inverted index · information retrieval · Rust library · full-text search · query performance · search engine · index construction

The pith

RISE is a Rust library for inverted indexes that matches or exceeds existing tools with up to 2x query speedups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RISE as a library written in Rust for building and querying inverted indexes that support fast full-text search over large text collections. Inverted indexes map each term to the list of documents containing it so that algorithms can quickly find documents matching a query. The authors used Rust's performance and memory safety features along with its trait system to create an implementation they describe as robust and easy to extend. They reproduced many earlier inverted-index techniques inside this new library and then benchmarked RISE against other libraries on several datasets, reporting competitive results that include speedups reaching 2x. The work matters because faster or safer index implementations can directly improve the responsiveness of search systems that handle real user queries.
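The term-to-documents mapping described above can be sketched in a few lines of Rust. This is a toy illustration only: `ToyIndex`, `add_document`, `postings`, and `and_query` are hypothetical names chosen for this example and are not taken from RISE, whose actual API and data layout are not described on this page. The sketch assumes whitespace tokenization and documents added in increasing ID order.

```rust
use std::collections::BTreeMap;

/// Toy inverted index: each term maps to a sorted list of document IDs.
/// Illustrative only; this is not RISE's data layout or API.
#[derive(Default)]
pub struct ToyIndex {
    postings: BTreeMap<String, Vec<u32>>,
}

impl ToyIndex {
    /// Tokenizes on whitespace, lowercases, and records `doc_id` for each term.
    /// Assumes documents are added in increasing `doc_id` order, which keeps
    /// every posting list sorted.
    pub fn add_document(&mut self, doc_id: u32, text: &str) {
        for token in text.split_whitespace() {
            let list = self.postings.entry(token.to_lowercase()).or_default();
            // Skip duplicates when a term occurs several times in one document.
            if list.last() != Some(&doc_id) {
                list.push(doc_id);
            }
        }
    }

    /// Returns the posting list for `term`, or an empty slice if absent.
    pub fn postings(&self, term: &str) -> &[u32] {
        self.postings.get(term).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Conjunctive (AND) query: documents containing both terms,
    /// computed by a linear merge of the two sorted lists.
    pub fn and_query(&self, a: &str, b: &str) -> Vec<u32> {
        let (xs, ys) = (self.postings(a), self.postings(b));
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < xs.len() && j < ys.len() {
            match xs[i].cmp(&ys[j]) {
                std::cmp::Ordering::Less => i += 1,
                std::cmp::Ordering::Greater => j += 1,
                std::cmp::Ordering::Equal => {
                    out.push(xs[i]);
                    i += 1;
                    j += 1;
                }
            }
        }
        out
    }
}
```

Keeping each list sorted is what makes the merge-style intersection possible; production libraries typically add compression and skipping on top of the same idea.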

Core claim

We present RISE, a novel inverted index library implemented in Rust, designed to deliver high performance and efficiency for information retrieval tasks. RISE leverages Rust's safety and performance to provide a robust solution for building and querying inverted indexes, while offering accessible extensibility through its expressive trait system. While developing RISE, we revisited the inverted-index literature, thereby reproducing numerous prior works using this new test bench. We evaluated RISE against existing libraries, demonstrating competitive query performance across various datasets and workloads, with speedups of up to 2x over the current state of the art.

What carries the argument

The RISE library, which implements inverted-index data structures and query algorithms in Rust and exposes them through traits for extensibility and customization.

If this is right

Prior inverted-index algorithms can be reproduced and compared inside a single safe, high-performance code base.
Search-engine implementers gain a new option that can reduce query latency on standard hardware.
The trait-based design allows new index variants or query operators to be added without modifying core components.
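To make the extensibility point concrete, here is a minimal sketch of how a trait can act as an extension point for posting-list encodings. Everything in it is an assumption made for illustration: `PostingCodec`, `Raw`, `DeltaVByte`, and `CompressedList` are hypothetical names, not RISE's traits, and the variable-byte scheme shown is a generic textbook variant rather than the one any cited library uses. Input is assumed to be sorted and the encoded bytes well-formed.

```rust
/// Hypothetical extension point: any posting-list codec implements this trait.
pub trait PostingCodec {
    fn encode(sorted_ids: &[u32]) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Vec<u32>;
}

/// Baseline codec: each ID stored as four little-endian bytes.
pub struct Raw;

impl PostingCodec for Raw {
    fn encode(sorted_ids: &[u32]) -> Vec<u8> {
        sorted_ids.iter().flat_map(|id| id.to_le_bytes()).collect()
    }
    fn decode(bytes: &[u8]) -> Vec<u32> {
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

/// A codec added later without touching `Raw` or `CompressedList`:
/// gaps between consecutive IDs, each written as a variable-byte integer.
pub struct DeltaVByte;

impl PostingCodec for DeltaVByte {
    fn encode(sorted_ids: &[u32]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut prev = 0u32;
        for &id in sorted_ids {
            let mut gap = id - prev; // assumes ascending input
            prev = id;
            while gap >= 0x80 {
                out.push((gap as u8 & 0x7f) | 0x80); // continuation bit set
                gap >>= 7;
            }
            out.push(gap as u8); // final byte, continuation bit clear
        }
        out
    }
    fn decode(bytes: &[u8]) -> Vec<u32> {
        // Assumes `bytes` was produced by `encode` (at most 5 bytes per gap).
        let (mut out, mut prev, mut gap, mut shift) = (Vec::new(), 0u32, 0u32, 0u32);
        for &b in bytes {
            gap |= u32::from(b & 0x7f) << shift;
            if b & 0x80 != 0 {
                shift += 7;
            } else {
                prev += gap;
                out.push(prev);
                gap = 0;
                shift = 0;
            }
        }
        out
    }
}

/// Core component, generic over the codec; it never names a concrete codec.
pub struct CompressedList<C: PostingCodec> {
    bytes: Vec<u8>,
    _codec: std::marker::PhantomData<C>,
}

impl<C: PostingCodec> CompressedList<C> {
    pub fn new(sorted_ids: &[u32]) -> Self {
        Self { bytes: C::encode(sorted_ids), _codec: std::marker::PhantomData }
    }
    pub fn ids(&self) -> Vec<u32> {
        C::decode(&self.bytes)
    }
    pub fn size_in_bytes(&self) -> usize {
        self.bytes.len()
    }
}
```

With this shape, `CompressedList::<Raw>::new(&ids)` and `CompressedList::<DeltaVByte>::new(&ids)` share all core code, and because the codec is a type parameter the compiler can monomorphize each variant with no dynamic dispatch.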

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A common Rust library could become a shared test bed that makes it easier for researchers to compare new index techniques against one another.
Production systems that adopt RISE might see lower server costs or higher query throughput if the reported speedups hold under their workloads.
The library's emphasis on extensibility opens a route for adding support for specialized data types such as numeric fields or geo-locations in future extensions.

Load-bearing premise

The benchmarks and datasets used fairly represent real-world workloads and the other libraries were configured and measured under equivalent conditions.

What would settle it

A head-to-head test on the same datasets and query workloads where RISE is consistently slower than the fastest existing library by more than a small constant factor.

Figures

Figures reproduced from arXiv: 2606.07187 by Angelo Savino, Rossano Venturini.

Original abstract

Inverted indexes are a crucial data structure for efficient information retrieval in large text corpora. They enable fast full-text search by mapping each term to the documents in which it appears, on top of which efficient algorithms quickly retrieve the documents relevant to a user query. We present RISE, a novel inverted index library implemented in Rust, designed to deliver high performance and efficiency for information retrieval tasks. RISE leverages Rust's safety and performance to provide a robust solution for building and querying inverted indexes, while offering accessible extensibility through its expressive trait system. While developing RISE, we revisited the inverted-index literature, thereby reproducing numerous prior works using this new test bench. We evaluated RISE against existing libraries, demonstrating competitive query performance across various datasets and workloads, with speedups of up to 2x over the current state of the art. Our results indicate that RISE is a promising tool for researchers and practitioners in the field of information retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RISE is a new Rust inverted-index library that reproduces some prior algorithms and claims up to 2x speedups, but the abstract supplies no experimental details at all.

The paper ships a Rust library called RISE for building and querying inverted indexes. It uses the language's safety guarantees and trait system for extensibility, and the authors say they reproduced several classic algorithms while developing it.

That reproduction step and the choice of Rust are the concrete contributions. Rust can be a reasonable fit for index structures where memory safety matters, and an open implementation might help people who already work in that ecosystem.

The performance claim is the weak point. The abstract states competitive results with speedups up to 2x over the state of the art, yet it gives no datasets, no baseline configurations, no hardware details, and no description of the workloads. Without those, the central empirical result cannot be checked.

This is mainly for IR practitioners or researchers who need a Rust-based index and are willing to evaluate the code themselves. Readers looking for new algorithmic ideas or rigorously documented speedups will find little to use.

If the full paper contains public code, data, and properly described experiments, it could be worth sending out for review as an engineering artifact. On the abstract alone the evidence is too thin to justify referee time.

Referee Report

2 major / 0 minor

Summary. The manuscript presents RISE, a Rust library for inverted indexes that leverages Rust's safety guarantees and trait system for extensibility. It reports reproducing multiple prior inverted-index algorithms as a test bench and claims competitive query performance with speedups of up to 2x over existing libraries across various datasets and workloads.

Significance. A well-documented, reproducible Rust implementation of inverted-index primitives could serve as a useful reference implementation for the IR community, particularly if the performance claims are supported by transparent benchmarks. The reproduction of prior work is a positive contribution, but the absence of experimental details prevents any assessment of whether the reported speedups represent a genuine advance.

major comments (2)

[Abstract] Abstract (final paragraph): the central performance claim ('speedups of up to 2x over the current state of the art') is stated without any accompanying description of datasets, query workloads, baseline library versions and configurations, hardware, measurement methodology, or statistical reporting. This renders the claim unverifiable from the manuscript.
[Evaluation] Evaluation section (implied by the abstract's reference to 'various datasets and workloads'): no tables, figures, or text provide the concrete numbers, error bars, or configuration details needed to evaluate the 'competitive query performance' assertion or to reproduce the experiments.
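As an illustration of the kind of measurement methodology and statistical reporting the comments above ask for, the following sketch times a query batch and summarizes the latencies. It is not the paper's benchmark harness, which this page does not describe; `LatencyStats`, `summarize`, `time_queries`, and `speedup` are hypothetical names, and the choices made here (warm-up passes, sample standard deviation, nearest-rank 99th percentile, speedup as a ratio of mean latencies) are assumptions.

```rust
use std::time::Instant;

/// Summary statistics over per-query latencies, in microseconds.
pub struct LatencyStats {
    pub mean: f64,
    pub std_dev: f64,
    pub p99: f64,
}

/// Computes the mean, the sample standard deviation (n - 1 denominator),
/// and the 99th percentile using the nearest-rank method.
pub fn summarize(samples_us: &[f64]) -> LatencyStats {
    let n = samples_us.len();
    assert!(n >= 2, "need at least two samples");
    let mean = samples_us.iter().sum::<f64>() / n as f64;
    let var = samples_us.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let mut sorted = samples_us.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = ((0.99 * n as f64).ceil() as usize).clamp(1, n);
    LatencyStats { mean, std_dev: var.sqrt(), p99: sorted[rank - 1] }
}

/// Runs `warmup` untimed passes over the queries, then times each query once
/// and returns the per-query latencies in microseconds.
pub fn time_queries<Q>(queries: &[Q], warmup: usize, mut run_query: impl FnMut(&Q)) -> Vec<f64> {
    for _ in 0..warmup {
        for q in queries {
            run_query(q);
        }
    }
    queries
        .iter()
        .map(|q| {
            let start = Instant::now();
            run_query(q);
            start.elapsed().as_secs_f64() * 1e6
        })
        .collect()
}

/// Speedup of `candidate` over `baseline` as a ratio of mean latencies;
/// a value of 2.0 means the candidate is twice as fast on average.
pub fn speedup(baseline: &LatencyStats, candidate: &LatencyStats) -> f64 {
    baseline.mean / candidate.mean
}
```

A claim such as "up to 2x" would then correspond to the largest `speedup` value over all dataset and workload pairs, and could be reported next to the standard deviation and tail latency so that readers can judge its variability.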

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the need for greater experimental transparency. We agree that the performance claims require supporting details on datasets, workloads, baselines, hardware, and methodology to be verifiable and reproducible. We will revise the manuscript to address these points fully.

Referee: [Abstract] Abstract (final paragraph): the central performance claim ('speedups of up to 2x over the current state of the art') is stated without any accompanying description of datasets, query workloads, baseline library versions and configurations, hardware, measurement methodology, or statistical reporting. This renders the claim unverifiable from the manuscript.

Authors: We agree that the abstract's performance claim is presented without sufficient context. In the revised manuscript we will expand the abstract to include a brief but explicit description of the evaluation setup (standard datasets, query workloads, baseline library versions and configurations, hardware platform, measurement methodology, and statistical reporting). Full details will remain in the evaluation section.
revision: yes

Referee: [Evaluation] Evaluation section (implied by the abstract's reference to 'various datasets and workloads'): no tables, figures, or text provide the concrete numbers, error bars, or configuration details needed to evaluate the 'competitive query performance' assertion or to reproduce the experiments.

Authors: We concur that the evaluation section currently lacks the required tables, figures, concrete performance numbers, error bars, and configuration details. The revised version will add comprehensive tables and figures reporting query times with appropriate statistical measures, together with exhaustive configuration information for all baselines, hardware, and experimental methodology to support both assessment and reproduction.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical library evaluation

The paper presents an implementation of an inverted-index library in Rust together with direct empirical benchmarks against other libraries. No mathematical derivations, first-principles predictions, fitted parameters, or uniqueness theorems appear in the abstract or the described content. The central claim (competitive query performance with up to 2x speedups) is a measured outcome on the authors' datasets and workloads, not a quantity derived from prior results by the paper's own equations or self-citations. Consequently no load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an implementation and benchmarking effort; its central claim rests on the correctness of the Rust code and the fairness of the (undescribed) experimental setup rather than on new mathematical axioms or invented entities.

pith-pipeline@v0.9.1-grok · 5686 in / 1008 out tokens · 16471 ms · 2026-06-27T20:49:58.941650+00:00 · methodology

Reference graph

Works this paper leans on

28 extracted references · 16 canonical work pages

[1] Apache Software Foundation. 2026. Apache Lucene. https://github.com/apache/lucene. Accessed: May 2026

[2] Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien. 2003. Efficient query evaluation using a two-level retrieval process. In Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM). 426–434. doi:10.1145/956863.956944

[3, 4] Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini. Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 152–162. doi:10.1145/3626772.3657769

[5] Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, and Alon Shalita. 2016. Compressing Graphs and Indexes with Recursive Graph Bisection. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 1535–1544. doi:10.1145/2939672.2939862

[6] Shuai Ding and Torsten Suel. 2011. Faster top-k document retrieval using block-max indexes. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 993–1002. doi:10.1145/2009916.2010048

[7] Peter Elias. 1974. Efficient Storage and Retrieval by Content and Address of Static Files. J. ACM 21, 2 (April 1974), 246–260. doi:10.1145/321812.321820

[8] Robert M. Fano. 1971. On the Number of Bits Required to Implement an Associative Memory. https://books.google.it/books?id=07DeGwAACAAJ

[9] Daniel Lemire, Nathan Kurz, and Christoph Rupp. 2018. Stream VByte: Faster Byte-Oriented Integer Compression. Inform. Process. Lett. 130 (Feb. 2018), 1–6. doi:10.1016/j.ipl.2017.09.011

[10] Daniel Lemire, Gregory Ssi-Yan-Kai, and Owen Kaser. 2016. Consistently faster and smaller compressed bitmaps with Roaring. Software: Practice and Experience 46, 11 (2016), 1547–1569. doi:10.1002/spe.2402

[11] Joel Mackenzie, Sean MacAvaney, Antonio Mallia, and Michał Siedlaczek. 2026. Practical, Efficient, In-Memory Inverted Indexes. In Advances in Information Retrieval. 3–10. doi:10.1007/978-3-032-21321-1_1

[12] Joel Mackenzie, Matthias Petri, and Luke Gallagher. 2022. IOQP: A simple Impact-Ordered Query Processor written in Rust. In Proceedings of the International Conference on Design of Experimental Search and Information REtrieval Systems (DESIRES). 22–34

[13] Joel Mackenzie, Matthias Petri, and Alistair Moffat. 2021. Faster Index Reordering with Bipartite Graph Partitioning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1910–1914

[14] Antonio Mallia, Giuseppe Ottaviano, Elia Porciani, Nicola Tonellotto, and Rossano Venturini. 2017. Faster BlockMax WAND with Variable-sized Blocks. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 625–634. doi:10.1145/3077136.3080780

[15] Antonio Mallia, Michal Siedlaczek, Joel Mackenzie, and Torsten Suel. 2019. PISA: Performant Indexes and Search for Academia. In Proceedings of the Open-Source IR Replicability Challenge co-located with 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 50–56. http://ceur-ws.org/Vol-2409/docker08.pdf

[16] Alistair Moffat and Lang Stuiver. 2000. Binary Interpolative Coding for Effective Index Compression. Information Retrieval 3, 1 (July 2000), 25–47. doi:10.1023/A:1013002601898

[17] Alistair Moffat and Justin Zobel. 2006. Inverted Files for Text Search Engines. Comput. Surveys 38, 2 (2006). doi:10.1145/1132956.1132959

[18] Giuseppe Ottaviano and Rossano Venturini. 2014. Partitioned Elias-Fano indexes. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 273–282. doi:10.1145/2600428.2609615

[19] Giulio Ermanno Pibiri and Rossano Venturini. 2020. Techniques for Inverted Index Compression. Comput. Surveys 53, 6 (Dec. 2020), 125:1–125:36. doi:10.1145/3415148

[20] Quickwit OSS. 2026. Tantivy: A full-text search engine library written in Rust. https://github.com/quickwit-oss/tantivy. Accessed: May 2026

[21] Stephen Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. 1994. Okapi at TREC-3

[22] Fabrizio Silvestri. 2007. Sorting Out the Document Identifier Assignment Problem. In Advances in Information Retrieval. 101–112. doi:10.1007/978-3-540-71496-5_12

[23] Alexander A. Stepanov, Anil R. Gangolli, Daniel E. Rose, Ryan J. Ernst, and Paramjit S. Oberoi. 2011. SIMD-based decoding of posting lists. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM). 317–326. doi:10.1145/2063576.2063627

[24] Nicola Tonellotto, Craig Macdonald, and Iadh Ounis. 2018. Efficient Query Processing for Scalable Web Search. Foundations and Trends in Information Retrieval 12, 4–5 (Dec. 2018), 319–500. doi:10.1561/1500000057

[25] Andrew Trotman. 2014. Compression, SIMD, and Postings Lists. In Proceedings of the 19th Australasian Document Computing Symposium (Melbourne, VIC, Australia) (ADCS). 50–57. doi:10.1145/2682862.2682870

[26] Howard Turtle and James Flood. 1995. Query evaluation: Strategies and optimizations. Information Processing & Management 31, 6 (Nov. 1995), 831–850. doi:10.1016/0306-4573(95)00020-H

[27] Vespa AI. 2026. Vespa. https://github.com/vespa-engine/vespa. Accessed: May 2026

[28] Marcin Zukowski, Sándor Heman, Niels Nes, and Peter Boncz. 2006. Super-Scalar RAM-CPU Cache Compression. In Proceedings of the 22nd International Conference on Data Engineering (ICDE). 59–59. doi:10.1109/ICDE.2006.150
