
arxiv: 2606.17664 · v1 · pith:RTCIEJILnew · submitted 2026-06-16 · 💻 cs.IR · cs.AI

Temporal Preference Optimization for Unsupervised Retrieval

Pith reviewed 2026-06-26 22:50 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords temporal information retrieval · unsupervised dense retrieval · preference optimization · contrastive learning · time embedding · temporal alignment

The pith

TPOUR lets unsupervised retrievers capture temporal relevance by reinterpreting preference optimization along the time axis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TPOUR, which adapts preference learning to the temporal dimension so that dense retrievers trained only on unlabeled documents can prefer documents from the correct time period. This addresses a common failure mode in which a retriever returns semantically related but temporally wrong documents, such as 2023 articles for a query about events in 2019. The method introduces Temporal Retrieval Preference Optimization (TRPO) and adds a learned time embedding whose interpolation allows the model to handle time periods never seen in training. Experiments show the resulting small models exceed both unsupervised and supervised baselines on temporal IR benchmarks: TPOUR Contriever, about 72.7 times smaller than Qwen-Embedding-8B, improves average nDCG@5 over that model by +4.04 (+12.15%) on explicit and +4.98 (+15.21%) on implicit temporal queries.

Core claim

TPOUR uses Temporal Retrieval Preference Optimization (TRPO) to reinterpret preference learning in the temporal dimension, guiding the retriever to favor temporally aligned documents, and further generalizes to unseen time periods via interpolation in a learned time embedding.

What carries the argument

Temporal Retrieval Preference Optimization (TRPO), which reinterprets preference learning along the temporal dimension to align retrieved documents with query time periods.
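One way to picture a temporal preference objective is as a DPO-style loss on retrieval scores, mixed with a standard contrastive loss. The sketch below is only an illustration under stated assumptions: the functional form of `trpo_loss`, the `beta` and `temperature` parameters, and all function names are hypothetical and are not taken from the paper. The only details drawn from the source are the use of a main and a reference encoder, the two losses L_CE and L_TRPO, and the λ convention from the Figure 10 caption (λ = 0.0 is L_TRPO only, λ = 1.0 is L_CE only).

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def trpo_loss(s_pol_aligned, s_pol_misaligned,
              s_ref_aligned, s_ref_misaligned, beta=1.0):
    # ASSUMPTION: DPO-style form. The trained encoder is rewarded for
    # widening the aligned-vs-misaligned score gap relative to a frozen
    # reference encoder. All inputs are per-query similarity scores.
    margin = ((s_pol_aligned - s_ref_aligned)
              - (s_pol_misaligned - s_ref_misaligned))
    return float(np.mean(-log_sigmoid(beta * margin)))

def contrastive_loss(sim_matrix, temperature=0.05):
    # InfoNCE with in-batch negatives: row i's positive is column i.
    logits = sim_matrix / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # stability shift
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def combined_loss(l_ce, l_trpo, lam):
    # lam = 1.0 -> contrastive only; lam = 0.0 -> TRPO only.
    return lam * l_ce + (1.0 - lam) * l_trpo
```

When the trained encoder scores both documents exactly as the reference does, the margin is zero and `trpo_loss` equals log 2; the loss falls below that value only when the aligned document gains score relative to the misaligned one.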

If this is right

  • TPOUR outperforms both unsupervised and supervised baselines on temporal information retrieval tasks.
  • TPOUR Contriever, roughly 72.7 times smaller than Qwen-Embedding-8B, improves average nDCG@5 over that model by +4.04 (+12.15%) on explicit and +4.98 (+15.21%) on implicit temporal queries.
  • The learned time embedding supports continuous temporal alignment for time periods absent from the training data.
  • The same training procedure works without any explicit timestamp labels on the documents.
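The interpolation idea in the list above can be sketched as a linear blend of two retrievers' parameters, following the figure captions' description of a retriever interpolated between one trained on 2018 (α = 0.0) and one trained on 2021 (α = 1.0). This is a minimal sketch, not the authors' implementation: `interpolate_weights` and `alpha_for_year` are hypothetical names, parameters are represented as plain NumPy arrays, and the linear year-to-α mapping is an assumption made for illustration.

```python
import numpy as np

def interpolate_weights(theta_start, theta_end, alpha):
    # Blend two checkpoints parameter by parameter:
    # alpha = 0.0 returns theta_start, alpha = 1.0 returns theta_end.
    if theta_start.keys() != theta_end.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: (1.0 - alpha) * theta_start[name] + alpha * theta_end[name]
        for name in theta_start
    }

def alpha_for_year(year, t_start=2018, t_end=2021):
    # ASSUMPTION: the target year maps linearly onto the interpolation weight.
    return (year - t_start) / (t_end - t_start)

# Toy checkpoints standing in for retrievers trained on 2018 and 2021 data.
theta_2018 = {"encoder.weight": np.zeros(4)}
theta_2021 = {"encoder.weight": np.ones(4)}
theta_2020 = interpolate_weights(theta_2018, theta_2021, alpha_for_year(2020))
```

Under this linear mapping a 2020 target gives α of about 0.67, which is close to the α = 0.7 peak that the Figure 5 caption reports for Climate-FEVER (2020).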

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same temporal reinterpretation of preferences could be applied to other contrastive objectives to improve time-aware retrieval in news archives or scientific literature.
  • Measuring performance degradation as the gap between training and test time periods increases would test the limits of the interpolation mechanism.
  • Extending the time embedding to handle queries that mention multiple distinct time periods would require combining several interpolated vectors.

Load-bearing premise

The assumption that reinterpreting preference learning along the temporal dimension can reliably guide the retriever toward temporally aligned documents and that interpolation in a learned time embedding will generalize to unseen time periods.

What would settle it

Run the trained model on a set of temporal queries whose target time periods lie outside the range of any timestamps seen during training and check whether nDCG drops sharply relative to in-distribution queries.
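The check described above can be sketched as a comparison of mean nDCG@k between queries whose target year falls inside the training range and those outside it. The data layout (a list of `(target_year, ranked_relevances)` pairs) and the function names are assumptions made for illustration; `ranked_relevances` is taken to hold the gains of all judged documents in ranked order, so the ideal ranking can be derived by sorting it.

```python
import numpy as np

def dcg_at_k(relevances, k):
    # Discounted cumulative gain over the top-k ranked gains.
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel * discounts))

def ndcg_at_k(relevances, k):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def mean_ndcg_by_split(runs, train_range, k=5):
    # runs: iterable of (target_year, ranked_relevances) pairs.
    # train_range: (first_year, last_year) seen during training, inclusive.
    first, last = train_range
    in_range, out_of_range = [], []
    for year, rels in runs:
        bucket = in_range if first <= year <= last else out_of_range
        bucket.append(ndcg_at_k(rels, k))
    mean = lambda xs: float(np.mean(xs)) if xs else float("nan")
    return {"in_range": mean(in_range), "out_of_range": mean(out_of_range)}
```

A large gap between the two means would suggest that interpolation does not carry over to unseen periods; a small gap would be consistent with the generalization claim.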

Figures

Figures reproduced from arXiv: 2606.17664 by HyunJin Kim, Jaejun Shim, JinYeong Bak, Young Jin Kim.

Figure 1: Comparison between TPOUR aligned at 2019 and a time-unaware retriever for queries with explicit (e.g., in 2019) or implicit (e.g., this year) temporal information. Left: A mixed-timestamp document collection containing (i) semantically and temporally aligned documents (green), (ii) semantically relevant but temporally misaligned documents (yellow), and (iii) irrelevant documents (red). Right: Ranked retrie…
Figure 2: Overview of TPOUR. Given a query Q_i and two documents D_i^t (temporally aligned) and D_i^{t'} (temporally misaligned), each input is encoded using both the main encoder π_θ and the reference encoder π_ref. (1) Similarity scores are computed between the query and each document using π_θ. (2) A contrastive loss L_CE, which calculates semantic similarity between Q_i and D_i^t, and a TRPO loss L_TRPO for preferring tempo…
Figure 3: Distribution of retrieved document timestamps with time vector interpolation. Heatmaps show the normalized distribution of retrieved document timestamps in years (x-axis) for each test year (y-axis) on SituatedQA. Each heatmap corresponds to a TPOUR Contriever interpolated between retrievers trained on t_start = 2018 and t_end = 2021, using weights α, where 0.0 represents the 2018 and 1.0 represents the 2021…
Figure 4: Temporal retrieval performance of interpolated TPOUR Contriever. nDCG@10 on Left: SituatedQA (Yearly) and Right: RealTimeQA (Monthly) using interpolated TPOUR Contriever between π_θ^{t_start} and π_θ^{t_end} (2018/2021 for SituatedQA, January/December 2023 for RealTimeQA), evaluated with explicit and implicit temporal information in queries. The x-axis indicates the interpolation weight α between 2018 and 2021. …
Figure 5: Best-performing interpolation α for each BEIR dataset relative to its creation year. Each point denotes a dataset, where α is the interpolation weight for 2021 between TPOUR Contriever (2018) and (2021). The red regression line indicates that datasets prefer retrievers temporally aligned with their publication year. For example, Climate-FEVER (2020) achieves peak performance at α = 0.7. Time-sensitive data…
Figure 6: An illustration of TPOUR inference. Like standard retrieval, we use the trained encoder π_θ to pre-compute representations for all documents at mixed timestamps t and t', which are then stored in the document index. At inference, a query Q_i is encoded as π_θ(Q_i), and retrieves the document from the index with the highest similarity to the query. The retrieved document is both semantically relevant and temp…
Figure 7: An illustration of the Baseline and the mixture-of-TPOUR Timestamp Predictor under a setup where the linear classifier has the same number of parameters. Given a document, the baseline model (upper) uses a single encoder to generate a representation, which is then passed to a linear classifier to predict the timestamp. In contrast, the mixture-of-TPOUR (lower) uses a set of frozen retrievers {π_θ^{t_1}, …
Figure 8: Normalized count of retrieved documents per year (x-axis) given the test set year (y-axis) on SituatedQA, with queries containing explicit (Explicit) or implicit (Implicit) temporal information, when interpolated between 2018 (α = 0.0) and 2021 (α = 1.0).
Figure 9: Normalized count of retrieved documents per year (x-axis) given the test set year (y-axis) on RealTimeQA, with queries containing explicit (Explicit) or implicit (Implicit) temporal information, when interpolated between January (α = 0.0) and December (α = 1.0).
Figure 10: Ablation of λ, the interpolation ratio between L_TRPO (λ = 0.0) and L_CE (λ = 1.0), for TPOUR Contriever 2018 and 2021, evaluated on SituatedQA 2018 and 2021 respectively. Performance improves significantly with moderate λ values, showing that combining semantic and temporal supervision is more effective than relying solely on either. Dashed lines at λ = 1.0 indicate performance using contrastive-only training.…
Original abstract

Unsupervised dense retrievers offer scalability by learning semantic similarity from unlabeled documents via contrastive learning, but they struggle to capture temporal relevance, retrieving semantically related but temporally misaligned documents. This matters when a document collection spans multiple time periods (e.g., retrieving documents from 2018-2025 for "Who is the president in 2019?" introduces temporal ambiguity). Existing methods rely on supervised training with explicit timestamps, which is not always feasible. We propose TPOUR (Temporal Preference Optimization for Unsupervised Retriever), which uses our novel training method Temporal Retrieval Preference Optimization (TRPO). TRPO reinterprets preference learning in the temporal dimension, guiding the retriever to favor temporally aligned documents. TPOUR further generalizes to unseen time periods via interpolation in a learned time embedding, enabling continuous temporal alignment. In experiments on temporal information retrieval (T-IR), TPOUR outperforms both unsupervised and supervised baselines. Compared to Qwen-Embedding-8B, despite being about 72.7x smaller, TPOUR Contriever improves average nDCG@5 by +4.04 (+12.15%) on explicit and +4.98 (+15.21%) on implicit queries. We provide our code at https://github.com/agwaBom/TPOUR.
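The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below backs out the baseline nDCG@5 implied by each absolute and relative gain, and the approximate size of the smaller model. It assumes that Qwen-Embedding-8B has a nominal 8 billion parameters (inferred from its name only) and that the percentages are relative to the Qwen-Embedding-8B score; `implied_baseline` is a hypothetical helper, and the derived values are inferences rather than numbers reported in the source.

```python
def implied_baseline(absolute_gain, relative_gain_percent):
    # If gain = baseline * (percent / 100), then baseline = gain / (percent / 100).
    return absolute_gain / (relative_gain_percent / 100.0)

# Implied Qwen-Embedding-8B average nDCG@5 on each query type.
explicit_base = implied_baseline(4.04, 12.15)   # about 33.25
implicit_base = implied_baseline(4.98, 15.21)   # about 32.74

# ASSUMPTION: a nominal 8e9 parameters for the larger model.
small_model_params = 8e9 / 72.7                 # about 110 million

print(f"implied explicit baseline: {explicit_base:.2f}")
print(f"implied implicit baseline: {implicit_base:.2f}")
print(f"implied small-model size:  {small_model_params / 1e6:.0f}M parameters")
```

The two implied baselines are close to each other, which suggests the absolute and relative gains are internally consistent, and the implied size of about 110 million parameters matches a BERT-base-scale encoder.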

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TPOUR, a method for unsupervised dense retrieval that employs Temporal Retrieval Preference Optimization (TRPO) to reinterpret preference learning along the temporal dimension, thereby guiding retrievers to favor temporally aligned documents from unlabeled data. TPOUR incorporates a learned time embedding to enable interpolation and generalization to unseen time periods. On temporal information retrieval (T-IR) tasks, the method is reported to outperform both unsupervised and supervised baselines, with TPOUR Contriever achieving average nDCG@5 gains of +4.04 (+12.15%) on explicit queries and +4.98 (+15.21%) on implicit queries relative to the much larger Qwen-Embedding-8B model.

Significance. If the central claims hold after verification, the work would be significant for advancing unsupervised retrieval in temporally dynamic collections without requiring explicit timestamps or supervision, addressing a practical limitation of standard contrastive approaches. The code release at the provided GitHub link is a positive factor for reproducibility.

major comments (3)
  1. [Abstract] Abstract: the description of TRPO states that it 'reinterprets preference learning in the temporal dimension' and 'generalizes to unseen time periods via interpolation' but supplies no derivation, pair-construction procedure, or example demonstrating how contrastive pairs encode temporal order (rather than semantic similarity) from document semantics alone.
  2. [Abstract] Abstract / Experiments section: the headline outperformance claims (+4.04 and +4.98 nDCG@5) are presented without reported variance, number of runs, statistical tests, or controls for potential metadata correlations in the unlabeled data, which is load-bearing for the assertion that the method remains unsupervised while surpassing supervised baselines.
  3. [Abstract] Abstract: no ablation or held-out-period validation is described to test the assumption that interpolation in the learned time embedding (rather than semantic cues alone) drives the reported gains on implicit queries, leaving the generalization claim unverified.
minor comments (2)
  1. [Abstract] The abstract refers to 'T-IR' tasks and 'explicit' vs. 'implicit' queries without defining these terms or citing the specific datasets and query construction process used in the experiments.
  2. The manuscript states that code is provided at https://github.com/agwaBom/TPOUR but does not specify the contents (e.g., training scripts, hyper-parameters, or evaluation code) in the text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and note the revisions we will incorporate.

  1. Referee: [Abstract] Abstract: the description of TRPO states that it 'reinterprets preference learning in the temporal dimension' and 'generalizes to unseen time periods via interpolation' but supplies no derivation, pair-construction procedure, or example demonstrating how contrastive pairs encode temporal order (rather than semantic similarity) from document semantics alone.

    Authors: The abstract summarizes the approach at a high level. Section 3 of the manuscript provides the full derivation of TRPO, the pair-construction procedure that infers temporal order from document semantics in unlabeled data, and concrete examples. We will add a brief illustrative example to the abstract to clarify how contrastive pairs capture temporal alignment rather than pure semantic similarity. revision: yes

  2. Referee: [Abstract] Abstract / Experiments section: the headline outperformance claims (+4.04 and +4.98 nDCG@5) are presented without reported variance, number of runs, statistical tests, or controls for potential metadata correlations in the unlabeled data, which is load-bearing for the assertion that the method remains unsupervised while surpassing supervised baselines.

    Authors: We agree these details strengthen the claims. We will revise the experiments section to report results across multiple runs with different seeds, include standard deviations and statistical significance tests. We will also add an analysis checking for potential metadata correlations in the unlabeled data to support the unsupervised nature of the gains. revision: yes

  3. Referee: [Abstract] Abstract: no ablation or held-out-period validation is described to test the assumption that interpolation in the learned time embedding (rather than semantic cues alone) drives the reported gains on implicit queries, leaving the generalization claim unverified.

    Authors: Section 4 contains ablations on the time embedding component. We acknowledge that explicit held-out-period validation would better isolate the contribution of interpolation. We will add such an experiment in the revision to verify that the learned time embedding drives gains on implicit queries beyond semantic cues. revision: yes
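The variance reporting and significance testing discussed in the second response could take the form of a paired bootstrap over per-query scores, plus a mean and standard deviation across seeds. The sketch below is one common way to do this and is not described in the source: the function names, the one-sided p-value definition, and the default of 10,000 resamples are all assumptions.

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    # scores_a, scores_b: per-query nDCG for systems A and B on the same queries.
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both systems must be scored on the same queries")
    rng = np.random.default_rng(seed)
    diffs = a - b
    # Resample query indices with replacement, one row per bootstrap sample.
    idx = rng.integers(0, diffs.size, size=(n_resamples, diffs.size))
    resampled_means = diffs[idx].mean(axis=1)
    # One-sided p-value: share of resamples in which A fails to beat B.
    p_value = float(np.mean(resampled_means <= 0.0))
    return float(diffs.mean()), p_value

def mean_and_std_over_seeds(per_seed_scores):
    # Sample standard deviation (ddof=1) across independent training runs.
    arr = np.asarray(per_seed_scores, dtype=float)
    return float(arr.mean()), float(arr.std(ddof=1))
```

Pairing by query matters here because both systems are evaluated on the same queries, so resampling the per-query differences removes query difficulty as a source of noise.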

Circularity Check

0 steps flagged

No circularity: new training procedure presented without reduction to fitted inputs or self-citations


The paper introduces TPOUR and its TRPO training method as a novel reinterpretation of preference learning along the temporal dimension for unsupervised retrievers, with generalization via interpolation in a learned time embedding. The provided abstract and description contain no equations, no fitted parameters renamed as predictions, and no load-bearing self-citations or uniqueness theorems. The central claims rest on the proposed method's construction rather than any self-referential reduction, rendering the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5760 in / 1024 out tokens · 31685 ms · 2026-06-26T22:50:24.691309+00:00 · methodology

