
arxiv: 2606.22778 · v1 · pith:YCQKR7EWnew · submitted 2026-06-22 · 💻 cs.IR · cs.CL

HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

Pith reviewed 2026-06-26 07:19 UTC · model grok-4.3

classification 💻 cs.IR cs.CL
keywords retrieval benchmark · lightweight evaluation · model comparison · efficiency settings · semantic search · ranking correlation · unified format · compact datasets

The pith

HAKARI-Bench turns large retrieval benchmarks into small Nano-sets that reproduce model rankings with Spearman correlation above 0.97.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HAKARI-Bench as a way to create compact versions of existing retrieval evaluation sets. These versions, called Nano-sets, cover hundreds of tasks across many languages in one format. Testing 55 models on the small sets produces rankings that closely match those from the full original benchmarks. This setup makes it possible to compare different retrieval methods and their efficiency options like quantization under the same conditions. Developers can use it for quick checks during model selection instead of running full heavy evaluations.

Core claim

HAKARI-Bench reconstructs existing retrieval suites into small datasets called Nano-sets, which support unified comparisons of five retrieval families and their efficiency variants while reproducing the rankings from official full benchmarks at Spearman correlation greater than 0.97 across 55 models.
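
The reproduction claim rests on Spearman rank correlation between two orderings of the same models. As a rough illustration only, the sketch below computes it in plain Python. The function names `average_ranks`, `pearson`, and `spearman` and the five score pairs are hypothetical and are not taken from the paper or its released code.

```python
def average_ranks(scores):
    """Rank scores so the highest gets rank 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(full_scores, nano_scores):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(full_scores), average_ranks(nano_scores))


# Hypothetical scores for five models (same model order in both lists).
full = [0.62, 0.58, 0.55, 0.49, 0.41]   # full-benchmark scores
nano = [0.66, 0.60, 0.61, 0.50, 0.43]   # Nano-set scores; models 2 and 3 swap places
rho = spearman(full, nano)
```

With only five models a single adjacent swap already lowers the coefficient noticeably, which is one reason a value above 0.97 is more informative over 55 models than over a handful.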

What carries the argument

Nano-sets, small reconstructed datasets from full benchmarks that maintain relative model performance rankings in a unified format for comparing retrieval architectures.

If this is right

  • Developers can perform rapid model selection and regression detection during development.
  • Comparison of quality versus efficiency becomes feasible across many models and settings.
  • The benchmark supports evaluation of lexical, dense, sparse, late interaction, and reranker approaches under identical conditions.
  • It covers 35 benchmarks and 551 tasks in 43 languages, and is meant to complement full evaluations rather than replace them.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Teams could integrate Nano-set style reductions into other evaluation domains to speed up iteration cycles.
  • Production systems might adopt the unified format to monitor performance across different retrieval configurations more frequently.
  • Further work could test whether the approach holds for emerging retrieval methods not included in the original sets.

Load-bearing premise

The construction of the Nano-sets preserves relative performance rankings from the full benchmarks without selection bias or format artifacts.

What would settle it

Finding a retrieval model whose performance ranking on the Nano-sets differs substantially from its ranking on the corresponding full benchmarks would challenge the reproduction claim.

Figures

Figures reproduced from arXiv: 2606.22778 by Yuichi Tateno.

Figure 1: Correspondence between the official evaluation and the Nano-set overall rankings (MMTEB / MTEB-v2 / BEIR-en).
… factors behind the differences are gathered in Appendix D. From these results, Nano-sets are not a final evaluation replacing the official full retrieval, nor do they guarantee absolute-score agreement. However, for iterative ranking judgments such as model selection, separating the top from the mi…
Figure 2: Task-bootstrap 95% confidence intervals for the top 10 dense macro models.
… the step step𝑑 = (max𝑑 − min𝑑)/255, map each value 𝑥 to a bucket by (𝑥 − min𝑑)/step𝑑, shift by −128, clip to [−128, 127], and reduce to an 8-bit integer (256 levels; truncate the fractional part). Calibration is on the distribution-stable corpus side only; to avoid fitting buckets to evaluation queries, the query side uses the same…
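
The int8 scheme quoted alongside Figure 2 can be sketched in a few lines. This is a minimal reading of that excerpt, not the paper's released implementation: the function names `calibrate` and `quantize_int8`, the toy vectors, the handling of constant dimensions, and the choice of truncation toward zero are all assumptions. Reusing the corpus-calibrated ranges for a query vector is likewise only my reading of the truncated sentence.

```python
def calibrate(corpus_vectors):
    """Per-dimension minimum and step, computed from corpus vectors only."""
    dims = len(corpus_vectors[0])
    mins = [min(v[d] for v in corpus_vectors) for d in range(dims)]
    maxs = [max(v[d] for v in corpus_vectors) for d in range(dims)]
    steps = [(hi - lo) / 255 for lo, hi in zip(mins, maxs)]  # step_d = (max_d - min_d) / 255
    return mins, steps


def quantize_int8(vector, mins, steps):
    """Map each value to a bucket, shift by -128, clip to [-128, 127], truncate."""
    out = []
    for x, lo, step in zip(vector, mins, steps):
        # Assumption: a constant dimension (step == 0) falls into the lowest bucket.
        bucket = (x - lo) / step if step > 0 else 0.0
        shifted = bucket - 128
        clipped = max(-128.0, min(127.0, shifted))
        out.append(int(clipped))  # int() truncates toward zero (an assumption)
    return out


# Toy two-dimensional corpus; values are made up for illustration.
corpus = [[0.0, -1.0], [1.0, 1.0], [0.5, 0.0]]
mins, steps = calibrate(corpus)

low = quantize_int8([0.0, -1.0], mins, steps)       # per-dimension minima
mid = quantize_int8([0.5, 0.0], mins, steps)        # mid-range values
query = quantize_int8([2.0, -5.0], mins, steps)     # out-of-range query values get clipped
```

Clipping matters mostly for queries, whose values can fall outside the range seen during corpus calibration.
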
Figure 3: 𝑧-scores of dense, cross-encoder, and LLM-style rerankers (left: multilingual tasks; right: English tasks).
… By scope (…
Figure 4: Comparison of each model’s short-task 𝑧 (query/document both short) and long-task 𝑧 (query or document long) as two bars (green = short-task 𝑧, purple = long-task 𝑧). Sorted by descending short 𝑧 − long 𝑧; higher = short-favored, lower = long-favored. The type (dense / cross-encoder (CE) / LLM-reranker) is in parentheses after the model name.
… at long +1.62, standing out on scopes with reasoning, instructio…
Figure 5: Per-scope ranks of 38 first-stage retrieval systems (1 = best; saturated display at rank 20).
… rank 15) rank 3; this is exactly the setting these models are tuned for, and within this range they are at least on par with the latest top general models. (ii) On the two long-document series, BM25 is rank 1 on both (overall rank 24): NanoMLDR has ≈ 5K–28K-char and NanoLongEmbed ≈ 28K–326K-char documents, and man…
Figure 6: Top of the English NanoBEIR (13 tasks, NanoBEIR-en) micro leaderboard.
… than hidden in the aggregate. F.3 Reranking: the reranker advantage concentrates in the semantic-search scope. As in §4.2, the benchmark scores all models as rerankers over the same fixed hybrid candidate set, so embedding models and rerankers can be compared directly. On the overall (54 models excluding BM25; macro, with safeguard), on…
Figure 7: Top composition of overall reranking (left) and NanoMIRACL (right).
… F.4 Dimensionality reduction and quantization: mild, uniform, and model-specific costs. Applying the same efficiency settings (Matryoshka dimensionality reduction, int8/binary quantization) to 33 dense models, we measured the macro delta vs. base (…
Figure 8: Matryoshka dimensionality-reduction retention (left, native-dimension ratio) and int8/binary quantization degradation (right).
… ported independently (Thoresen, 2026: “E5-base-v2 drops to 92%, E5-small-v2 to 87%”). Conversely, models trained for quantization robustness (jinaai/jina-embeddings-v5 family, google/embeddinggemma-300m, Snowflake/snowflake-arctic-embed-l-v2.0, Qwen/Qwen3-Embedding-0.6B) stay wit…
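
For readers unfamiliar with the two efficiency settings named around Figures 7 and 8, here is a hedged sketch of what Matryoshka-style truncation and binary quantization usually mean. The excerpts above do not show the paper's exact procedure, so everything here is an assumption: keeping the leading dimensions and re-normalizing, thresholding at zero, and the helper names `truncate_and_renormalize`, `binarize`, and `hamming_similarity`.

```python
import math


def truncate_and_renormalize(vec, dims):
    """Keep the first `dims` components and rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def binarize(vec):
    """One bit per dimension: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_similarity(bits_a, bits_b):
    """Fraction of positions where two bit vectors agree."""
    return sum(1 for a, b in zip(bits_a, bits_b) if a == b) / len(bits_a)


# Made-up vectors for illustration.
short = truncate_and_renormalize([3.0, 4.0, 12.0, 0.5], 2)
doc_bits = binarize([0.2, -0.1, 0.0, 0.7])
query_bits = binarize([0.3, 0.4, -0.2, 0.1])
agreement = hamming_similarity(doc_bits, query_bits)
```

Truncation only preserves quality when the model was trained so that leading dimensions carry most of the signal, which is consistent with the excerpt's observation that costs are model-specific.
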
Original abstract

With the rapid spread of retrieval-augmented generation and semantic search, choosing the right embedding and retrieval configuration is increasingly hard. Large retrieval benchmarks are comprehensive but too heavy to rerun during development, and there is little infrastructure for comparing production settings--dimensionality reduction, quantization, reranking--across many models under identical conditions. We present HAKARI-Bench, a lightweight benchmark that reconstructs existing retrieval suites into small datasets (Nano-sets): 35 benchmarks and 551 tasks across 43 languages in a unified format, enabling same-condition, model-agnostic comparison of five retrieval families (BM25, dense, sparse, late interaction, rerankers) and their efficiency variants. Across 55 models, its overall ranking reproduces the official MTEB retrieval v2, MMTEB v2 retrieval, and English BEIR (full) at Spearman >0.97. HAKARI-Bench does not replace full evaluation; it enables rapid model selection, regression detection, and reading the quality-efficiency Pareto frontier. Code, data, and leaderboard are released under the MIT license.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents HAKARI-Bench, a lightweight benchmark that reconstructs existing retrieval evaluation suites into small 'Nano-sets' comprising 35 benchmarks and 551 tasks across 43 languages in a unified format. It enables same-condition comparison of five retrieval families (BM25, dense, sparse, late interaction, rerankers) and their efficiency variants (dimensionality reduction, quantization, reranking). The central empirical claim is that, across 55 models, the overall ranking from HAKARI-Bench reproduces the official MTEB retrieval v2, MMTEB v2 retrieval, and English BEIR (full) rankings at Spearman correlation >0.97.

Significance. If the Nano-set construction is shown to be model-agnostic and free of selection bias, the benchmark would provide substantial practical value for rapid model selection, regression detection during development, and reading the quality-efficiency Pareto frontier without the cost of full-scale evaluation. The release of code, data, and leaderboard under the MIT license is a clear strength supporting reproducibility and adoption.

major comments (2)
  1. [Abstract] The headline claim of Spearman >0.97 reproduction across 55 models is load-bearing for the contribution, yet the manuscript supplies no explicit construction algorithm for the 35 Nano-benchmarks and 551 tasks, no independence proof that selection avoided performance signals from the evaluated models, and no ablation on held-out models. This leaves open the possibility that query sampling, language balancing, or task filtering introduced bias that artificially inflates the reported correlations.
  2. [Results] Correlation tables: Without an ablation demonstrating that the high Spearman correlation holds for models not involved in any Nano-set tuning or filtering decisions, it remains unclear whether HAKARI-Bench will reliably rank new retrieval architectures or efficiency variants as claimed.
minor comments (1)
  1. [Introduction] Ensure the main text explicitly lists the five retrieval families and their efficiency variants when first introduced, for consistency with the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the importance of explicit construction details and robustness checks for HAKARI-Bench. We address each major comment below and commit to revisions that strengthen the presentation of the Nano-set construction and its model-agnostic nature.

Point-by-point responses
  1. Referee: [Abstract] The headline claim of Spearman >0.97 reproduction across 55 models is load-bearing for the contribution, yet the manuscript supplies no explicit construction algorithm for the 35 Nano-benchmarks and 551 tasks, no independence proof that selection avoided performance signals from the evaluated models, and no ablation on held-out models. This leaves open the possibility that query sampling, language balancing, or task filtering introduced bias that artificially inflates the reported correlations.

    Authors: We acknowledge that the construction details merit greater explicitness. Section 3 describes the Nano-set procedure as fixed sampling from the original benchmarks using only dataset metadata (query length stratification, language distribution balancing, and task-type quotas), with all decisions finalized before any of the 55 models were evaluated. No retrieval scores or model outputs informed the selection. In the revision we will add a dedicated subsection containing pseudocode for the full construction algorithm together with an explicit independence statement confirming that no performance signals were used. We also agree that a held-out ablation would further support the claim and will incorporate evaluations on additional models in the revised manuscript. revision: yes

  2. Referee: [Results] Correlation tables: Without an ablation demonstrating that the high Spearman correlation holds for models not involved in any Nano-set tuning or filtering decisions, it remains unclear whether HAKARI-Bench will reliably rank new retrieval architectures or efficiency variants as claimed.

    Authors: The Nano-set construction involved no tuning or model-dependent filtering; all sampling rules were derived solely from the original benchmark specifications and applied uniformly. Consequently the reported correlations already reflect performance on 55 models that played no role in benchmark design. Nevertheless, to directly address the request for an explicit held-out ablation, we will add results for a supplementary set of models drawn from families not represented in the current 55 in the revised version. revision: yes
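
The first response describes sampling that uses only dataset metadata and is fixed before any model is run. One way to picture that property is the sketch below. It illustrates the idea and is not the paper's construction algorithm, which the referee notes is not given. The record fields (`id`, `language`, `text`), the length boundaries, the seed, and the function names are all hypothetical, and task-type quotas are left out for brevity.

```python
import random
from collections import defaultdict


def length_bucket(text, boundaries=(32, 128)):
    """Index of the first boundary the character length falls below."""
    for i, bound in enumerate(boundaries):
        if len(text) < bound:
            return i
    return len(boundaries)


def sample_nano_queries(queries, per_stratum, seed=0):
    """Pick up to `per_stratum` queries per (language, length bucket) stratum.

    Only metadata is consulted; no model score enters the selection, and a
    fixed seed makes the subset reproducible.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for query in queries:
        strata[(query["language"], length_bucket(query["text"]))].append(query)
    picked = []
    for key in sorted(strata):
        group = sorted(strata[key], key=lambda q: q["id"])  # order independent of input order
        rng.shuffle(group)
        picked.extend(group[:per_stratum])
    return picked


# Twelve made-up queries alternating between two languages, with growing lengths.
queries = [
    {"id": f"q{i}", "language": "en" if i % 2 == 0 else "ja", "text": "x" * (10 + 20 * i)}
    for i in range(12)
]
subset = sample_nano_queries(queries, per_stratum=2)
```

Because the selection depends only on the seed and the metadata, rerunning it after new models appear yields the same subset, which is the independence property the referee asks the authors to state explicitly.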

Circularity Check

0 steps flagged

No significant circularity; empirical validation against external benchmarks

Full rationale

The paper's core claim is that HAKARI-Bench rankings reproduce official MTEB v2, MMTEB v2, and BEIR rankings at Spearman >0.97 across 55 models. This is presented as a post-hoc empirical check on subsampled Nano-sets reconstructed from existing suites, not a derivation from fitted parameters or internal definitions. No equations, self-citations, or construction steps are shown to reduce the reported correlation to a fit or self-referential input by construction. The subsampling is described as model-agnostic reconstruction, and the correlation serves as external validation rather than a load-bearing premise. This matches the default expectation of non-circularity for benchmark papers with independent external checks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper's central contribution rests on the empirical validation that Nano-sets maintain high rank correlation with full benchmarks, which is a domain assumption rather than a derived result.

axioms (1)
  • Domain assumption: the selected Nano-sets preserve ranking correlations with full benchmarks.
    This is the core assumption enabling the lightweight property.



Reference graph

Works this paper leans on

152 extracted references · 11 canonical work pages

  1. [1] Aarsen, Tom. 2024.
  2. [2] Akram, Mohammad Kalim and Sturua, Saba and Havriushenko, Nastia and Herreros, Quentin and G. jina-embeddings-v5-text: Task-Targeted Embedding Distillation.
  3. [3] Ayaou, Iliass and Cavallucci, Denis and Chibane, Hicham. 2026.
  4. [4] Bajaj, Payal and Campos, Daniel and Craswell, Nick and Deng, Li and Gao, Jianfeng and Liu, Xiaodong and Majumder, Rangan and McNamara, Andrew and Mitra, Bhaskar and Nguyen, Tri and Rosenberg, Mir and Song, Xia and Stoica, Alina and Tiwary, Saurabh and Wang, Tong. 2016.
  5. [5] Banar, Nikolay and Lotfi, Ehsan and Van Nooten, Jens and Arhiliuc, Cristina and Kliocaite, Marija and Daelemans, Walter. 2025.
  6. [6] Ben Abacha, Asma and Demner-Fushman, Dina. 2019.
  7. [7] Bhattacharya, Paheli and Ghosh, Kripabandhu and Ghosh, Saptarshi and Pal, Arindam and Mehta, Parth and Bhattacharya, Arnab and Majumder, Prasenjit. 2019.
  8. [8] Boteva, Vera and Gholipour Ghalandari, Demian and Sokolov, Artem and Riezler, Stefan. 2016.
  9. [9] C. Fine-tuning an. 2024.
  10. [10] Chen, Jianlv and Xiao, Shitao and Zhang, Peitian and Luo, Kun and Lian, Defu and Liu, Zheng. 2024.
  11. [11] Ciancone, Mathieu and Kerboua, Imene and Schaeffer, Marion and Siblini, Wissam. 2024.
  12. [12] Dao, Tri. 2023.
  13. [13] Doddapaneni, Sumanth and Aralikatte, Rahul and Ramesh, Gowtham and Goyal, Shreya and Khapra, Mitesh M. and Kunchukuttan, Anoop and Kumar, Pratyush. 2023.
  14. [14] Enevoldsen, Kenneth and Chung, Isaac and Kerboua, Imene and Kardos, M. 2025.
  15. [15] Enevoldsen, Kenneth and Kardos, M. The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding.
  16. [16] Formal, Thibault and Piwowarski, Benjamin and Clinchant, St. 2021.
  17. [17] Gao, Jianyang and Long, Cheng. 2024.
  18. [18] Ge, Tiezheng and He, Kaiming and Ke, Qifa and Sun, Jian. 2013.
  19. [19] Guha, Neel and Nyarko, Julian and Ho, Daniel E. and R. 2023.
  20. [20] Hoppe, Christoph and Pelkmann, David and Migenda, Nico and Hotte, Daniel and Schenck, Wolfram. 2021.
  21. [21] J. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  22. [22] Karpukhin, Vladimir and O. Dense Passage Retrieval for Open-Domain Question Answering.
  23. [23] Khattab, Omar and Zaharia, Matei. 2020.
  24. [24] Kusupati, Aditya and Bhatt, Gantavya and Rege, Aniket and Wallingford, Matthew and Sinha, Aditya and Ramanujan, Vivek and Howard-Snyder, William and Chen, Kaifeng and Kakade, Sham and Jain, Prateek and Farhadi, Ali. 2022.
  25. [25] Lassance, Carlos and D. 2024.
  26. [26] Li, Haitao and Shao, Yunqiu and Wu, Yueyue and Ai, Qingyao and Ma, Yixiao and Liu, Yiqun. 2023.
  27. [27] Li, Xiangyang and Dong, Kuicai and Lee, Yi Quan and Xia, Wei and Zhang, Hao and Dai, Xinyi and Wang, Yong and Tang, Ruiming. 2024.
  28. [28] 2025.
  29. [29] Liu, Friso and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and F. Introducing. 2025.
  30. [30] Lu, Xing Han. 2024.
  31. [31] Manor, Laura and Li, Junyi Jessy. 2019.
  32. [32] Muennighoff, Niklas and Tazi, Nouamane and Magne, Lo. 2023.
  33. [33] Nogueira, Rodrigo and Cho, Kyunghyun. 2019.
  34. [34] Nussbaum, Zach and Morris, John X. and Duderstadt, Brandon and Mulyar, Andriy. 2024.
  35. [35] Pham, Long and Luu, Tuan and Vo, Thang and Nguyen, Minh and Hoang, Vu. 2026.
  36. [36] Pijpelink, Arnaud. 2026.
  37. [37] Qiu, Yifu and Li, Hongyu and Qu, Yingqi and Chen, Ying and She, Qiaoqiao and Liu, Jing and Wu, Hua and Wang, Haifeng. 2022.
  38. [38] Reimers, Nils and Gurevych, Iryna. 2019.
  39. [39] Roberts, Kirk and Alam, Tasmeer and Bedrick, Steven and Demner-Fushman, Dina and Lo, Kyle and Soboroff, Ian and Voorhees, Ellen and Wang, Lucy Lu and Hersh, William R. 2021.
  40. [40] Robertson, Stephen and Zaragoza, Hugo. 2009.
  41. [41] Santhanam, Keshav and Khattab, Omar and Saad-Falcon, Jon and Potts, Christopher and Zaharia, Matei. 2021.
  42. [42] 2024.
  43. [43] Shahinmoghadam, Mehrzad and Motamedi, Ali. 2025.
  44. [44] Shakir, Aamir and Aarsen, Tom and. Binary and Scalar Embedding Quantization for Significantly Faster and Cheaper Retrieval.
  45. [45] Sheikh, Nadia Amin and Buades Marcos, David and Jousse, Anne-Laure and Oladipo, Akintunde and Rousseau, Olivier and Lin, Jimmy. 2025.
  46. [46] Shiraee Kasmaee, Ali and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila. 2024.
  47. [47] Nano-. 2025.
  48. [48] Snegirev, Artem and Tikhonova, Maria and Maksimova, Anna and Fenogenova, Alena and Abramov, Alexander. 2025.
  49. [49] Song, Tingyu and Gan, Guo and Shang, Mingsheng and Zhao, Yilun. 2025.
  50. [50] Sourty, Rapha. 2025.
  51. [51] Steinberger, Ralf and Ebrahim, Mohamed and Poulis, Alexandros and Carrasco-Benitez, Manuel and Schl. An overview of the European Union's highly multilingual parallel corpora. Language Resources and Evaluation.
  52. [52] Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S. and Tang, Michael and Sun, Ruoxi and Yoon, Jinsung and Arik, Sercan O. and Chen, Danqi and Yu, Tao. 2024.
  53. [53] Thakur, Nandan and Reimers, Nils and R. 2021.
  54. [54] Thoresen, Thomas Hjelde. 2026.
  55. [55] Trent, Benjamin. 2024.
  56. [56] Tsukagoshi, Hayato and Sasano, Ryohei. 2024.
  57. [57] Veasey, Thomas. 2026.
  58. [58] Voorhees, Ellen M. and Harman, Donna K. 2005.
  59. [59] Wadden, David and Lin, Shanchuan and Lo, Kyle and Wang, Lucy Lu and van Zuylen, Madeleine and Cohan, Arman and Hajishirzi, Hannaneh. 2020.
  60. [60] Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu. 2024.
  61. [61] Wang, Xiaoyue and Wang, Jianyou and Cao, Weili and Wang, Kaicheng and Paturi, Ramamohan and Bergen, Leon. 2024.
  62. [62] Wang, Zora Zhiruo and Asai, Akari and Yu, Xinyan Velocity and Xu, Frank F. and Xie, Yiqing and Neubig, Graham and Fried, Daniel. 2025.
  63. [63] Weller, Orion and Chang, Benjamin and MacAvaney, Sean and Lo, Kyle and Cohan, Arman and Van Durme, Benjamin and Lawrie, Dawn and Soldaini, Luca. 2024.
  64. [64] Weller, Orion and Ricci, Kathryn and Marone, Marc and Chaffin, Antoine and Lawrie, Dawn and Van Durme, Benjamin. 2025.
  65. [65] Wojtasik, Konrad and Wo. 2024.
  66. [66] Wrzalik, Marco and Krechel, Dirk. 2021.
  67. [67] Xiao, Chenghao and Hudson, G. Thomas and Al Moubayed, Noura. 2024.
  68. [68] Xiao, Shitao and Liu, Zheng and Zhang, Peitian and Muennighoff, Niklas and Lian, Defu and Nie, Jian-Yun. 2024.
  69. [69] Xu, Cheng and Guan, Shuhao and Greene, Derek and Kechadi, M-Tahar. 2024.
  70. [70] Yamada, Ikuya and Asai, Akari and Hajishirzi, Hannaneh. 2021.
  71. [71] Zhang, Sheng and Zhang, Xin and Wang, Hui and Guo, Lixiang and Liu, Shanshan. 2018.
  72. [72] Zhang, Xin and Li, Lei and Zhou, Xiaohan and Liu, Zheng. 2025.
  73. [73] Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy. 2023.
  74. [74] Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and Zhang, Meishan and Li, Wenjie and Zhang, Min. 2024.
  75. [75] Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren. 2025.
  76. [76] Zhu, Dawei and Wang, Liang and Yang, Nan and Song, Yifan and Wu, Wenhao and Wei, Furu and Li, Sujian. 2024.
  77. [77] Aarsen, Tom. NanoBEIR: Lightweight BEIR subsets for iterative retrieval evaluation. Hugging Face Hub Dataset Collection / Sentence Transformers, 2024. URL https://huggingface.co/collections/zeta-alpha-ai/nanobeir-66e1a0af21dfd93e620cd9f6
  78. [78] Akram, Mohammad Kalim and Sturua, Saba and Havriushenko, Nastia and Herreros, Quentin and Günther, Michael and Werk, Maximilian and Xiao, Han. jina-embeddings-v5-text: Task-targeted embedding distillation. arXiv preprint arXiv:2602.15547, 2026. URL https://arxiv.org/abs/2602.15547
  79. [79] Ayaou, Iliass and Cavallucci, Denis and Chibane, Hicham. DAPFAM: A domain-aware family-level dataset to benchmark cross domain patent retrieval. Array, page 100720, 2026. URL https://doi.org/10.1016/j.array.2026.100720
  80. [80] Bajaj, Payal and Campos, Daniel and Craswell, Nick and Deng, Li and Gao, Jianfeng and Liu, Xiaodong and Majumder, Rangan and McNamara, Andrew and Mitra, Bhaskar and Nguyen, Tri and Rosenberg, Mir and Song, Xia and Stoica, Alina and Tiwary, Saurabh and Wang, Tong. MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268, 2016. URL https://arxiv.org/abs/1611.09268

Showing first 80 references.