pith. machine review for the scientific record.

arxiv: 2004.04906 · v3 · submitted 2020-04-10 · 💻 cs.CL

Recognition: 1 theorem link

Dense Passage Retrieval for Open-Domain Question Answering

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:21 UTC · model grok-4.3

classification 💻 cs.CL
keywords dense retrieval · open-domain QA · dual encoder · passage retrieval · BM25 · question answering · neural embeddings · retrieval accuracy

The pith

Dense vector embeddings from a dual-encoder model outperform BM25 by 9-19 points absolute in top-20 passage retrieval accuracy for open-domain question answering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that dense representations can handle passage retrieval for open-domain question answering without relying on sparse vector models like TF-IDF or BM25. Embeddings are trained using a dual-encoder setup on a limited number of question-passage pairs, yet they deliver 9 to 19 points absolute higher top-20 retrieval accuracy across datasets. The improved retrieval directly boosts end-to-end QA performance, achieving new state-of-the-art results on several benchmarks. Readers should care because this simplifies and strengthens the retrieval step that underpins scalable question answering over large text collections.

Core claim

Open-domain question answering relies on efficient passage retrieval, traditionally done with sparse models such as TF-IDF or BM25. We demonstrate that retrieval can instead be implemented using dense representations alone. These embeddings are learned from a small number of questions and passages using a simple dual-encoder framework. When tested on multiple open-domain QA datasets, the dense retriever outperforms a strong Lucene-BM25 system by 9 to 19 percent absolute in top-20 passage retrieval accuracy. This retrieval improvement allows our end-to-end QA system to reach new state-of-the-art performance on the benchmarks.

What carries the argument

A dual-encoder model that independently embeds questions and passages into a shared dense vector space for similarity-based retrieval.
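Online retrieval in this setup reduces to maximum inner product search: every passage is embedded once offline, and a question embedding is scored against the whole passage matrix in a single product. A minimal numpy sketch of the mechanics (the embeddings below are hand-picked for illustration; the paper learns them with two BERT encoders):

```python
import numpy as np

# Stand-in embeddings: one row of P per passage, hand-picked so that the
# relevant passage lies near the question in the shared space.
passage_texts = ["paris is the capital of france",
                 "bm25 ranks by term overlap",
                 "dual encoders embed questions and passages"]
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.2],
              [0.0, 0.2, 0.9]], dtype=float)   # E_P(p) for each passage

q = np.array([0.1, 0.1, 0.95])                 # E_Q(question)

# sim(q, p) = <E_Q(q), E_P(p)>: score all passages with one matrix product,
# then take the top-k -- this is the entire online retrieval step.
scores = P @ q
top2 = np.argsort(-scores)[:2]
print([passage_texts[i] for i in top2])
# -> ['dual encoders embed questions and passages', 'bm25 ranks by term overlap']
```

Because passage embeddings are independent of the question, the passage matrix can be precomputed and served from an approximate-nearest-neighbor index at scale.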

If this is right

  • Higher top-20 retrieval accuracy leads to more relevant contexts being available for the reader module in QA systems.
  • The method can be integrated into existing QA pipelines to boost overall accuracy.
  • It establishes new performance records on multiple standard open-domain QA benchmarks.
  • Dense retrieval becomes a viable practical alternative to sparse indexing methods.
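The "drop-in" integration implied above amounts to swapping the retriever behind an unchanged reader interface. A schematic sketch, with stub components standing in for BM25/dense retrieval and for the reader model (none of these stubs are from the paper):

```python
from typing import Callable, List

def answer(question: str,
           retrieve: Callable[[str, int], List[str]],
           read: Callable[[str, List[str]], str],
           k: int = 20) -> str:
    """Generic retriever-reader pipeline: only `retrieve` changes when
    moving from a sparse to a dense retriever; `read` is untouched."""
    contexts = retrieve(question, k)
    return read(question, contexts)

corpus = ["paris is the capital of france", "berlin is the capital of germany"]

def stub_retrieve(q: str, k: int) -> List[str]:
    # Toy lexical-overlap ranking, standing in for either BM25 or DPR.
    return sorted(corpus, key=lambda p: -len(set(q.split()) & set(p.split())))[:k]

def stub_read(q: str, ctxs: List[str]) -> str:
    # Pretend the reader extracts its answer from the top-ranked passage.
    return ctxs[0]

print(answer("what is the capital of france", stub_retrieve, stub_read))
```

The point of the factoring is that the reader never sees which retriever produced its contexts, so a dense retriever slots in wherever a sparse one did.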

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Neural dense retrieval may reduce dependence on exact term overlap, capturing semantic matches instead.
  • The approach could be extended by combining dense and sparse signals for hybrid retrieval.
  • Generalization from small training sets implies that the model learns robust semantic features applicable to unseen queries.

Load-bearing premise

Embeddings trained on a limited set of questions and passages will generalize well to the broader range of queries and documents seen during testing.

What would settle it

Observing no improvement or a decrease in top-20 passage retrieval accuracy for the dense model compared to BM25 on a standard open-domain QA test set would falsify the performance claim.
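The falsification test hinges on top-k retrieval accuracy, which is simply the fraction of questions for which at least one answer-bearing passage appears in the top k results. A minimal implementation of the metric (passage IDs here are illustrative):

```python
def top_k_accuracy(ranked_lists, gold_sets, k=20):
    """Fraction of questions with at least one gold passage in the top k."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_sets)
               if any(pid in gold for pid in ranked[:k]))
    return hits / len(ranked_lists)

# Two toy questions: the first has its gold passage at rank 3, the second misses.
ranked = [[7, 2, 9, 1], [4, 5, 6, 8]]
gold = [{9}, {0}]
print(top_k_accuracy(ranked, gold, k=20))  # -> 0.5
```

Running this metric for both the dense retriever and BM25 on the same held-out question set is exactly the comparison that would settle the claim.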

read the original abstract

Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a dense passage retrieval method for open-domain QA based on a dual-encoder framework that learns embeddings from a modest number of question-passage pairs. It reports that the resulting retriever outperforms a strong Lucene-BM25 baseline by 9-19% absolute top-20 accuracy across several QA datasets and, when plugged into an end-to-end reader, yields new state-of-the-art results on multiple open-domain QA benchmarks.

Significance. If the empirical gains hold, the work provides a practical demonstration that supervised dense retrieval can substantially surpass classical sparse methods without requiring hand-crafted features or inverted indexes, thereby shifting the default retrieval component in open-domain QA pipelines toward learned embeddings.

major comments (1)
  1. [Section 3 (Training) and experimental setup] The training procedure (negative sampling strategy and construction of the training set) is not ablated; without these controls it remains possible that the reported 9-19% gains partly reflect dataset-specific selection effects rather than the dual-encoder architecture itself.
minor comments (2)
  1. [Abstract] The abstract states gains of '9%-19%' but does not report per-dataset numbers, standard deviations, or confidence intervals; a table with these statistics would make the strength of the improvement clearer.
  2. [Section 2] Notation for the dual-encoder scoring function and the contrastive loss should be introduced once in a single equation block rather than scattered across prose.
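Following the dual-encoder formulation the paper describes, the consolidated block the referee asks for would read: the scoring function is an inner product of the two encoders' outputs, and training minimizes the negative log-likelihood of the positive passage against its negatives,

```latex
\mathrm{sim}(q, p) = E_Q(q)^{\top} E_P(p),
\qquad
L\bigl(q_i, p_i^{+}, p_{i,1}^{-}, \dots, p_{i,n}^{-}\bigr)
  = -\log \frac{e^{\mathrm{sim}(q_i, p_i^{+})}}
               {e^{\mathrm{sim}(q_i, p_i^{+})} + \sum_{j=1}^{n} e^{\mathrm{sim}(q_i, p_{i,j}^{-})}}
```

where $E_Q$ and $E_P$ are the question and passage encoders, and $p_i^{+}$, $p_{i,j}^{-}$ denote the positive and negative passages paired with question $q_i$.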

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive summary and recommendation for minor revision. We address the major comment below.

read point-by-point responses
  1. Referee: [Section 3 (Training) and experimental setup] The training procedure (negative sampling strategy and construction of the training set) is not ablated; without these controls it remains possible that the reported 9-19% gains partly reflect dataset-specific selection effects rather than the dual-encoder architecture itself.

Authors: We agree that an explicit ablation of negative sampling and training-set construction would strengthen the claims. Our main experiments compare the trained dual-encoder against a strong unsupervised BM25 baseline on the same corpora, which already isolates the benefit of learned dense representations. Nevertheless, to directly address the concern, we will add a new ablation subsection in the revised manuscript that reports retrieval accuracy when training with (i) random negatives, (ii) BM25-retrieved hard negatives, and (iii) varying numbers of negatives per question. These additional controls will clarify how much of the observed 9-19% improvement is attributable to the dual-encoder architecture versus the particular negative-sampling procedure.

    revision: yes
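The contrastive objective these ablations would vary can be sketched with in-batch negatives, where each question's positive passage doubles as a negative for every other question in the batch. A toy numpy illustration (not the paper's implementation; the embeddings are synthetic):

```python
import numpy as np

def in_batch_nll(Q, P):
    """Contrastive NLL with in-batch negatives.

    Q, P: (B, d) question and positive-passage embeddings; row i of P is
    the positive for row i of Q, and every other row acts as a negative.
    """
    S = Q @ P.T                                # (B, B) similarity matrix
    S = S - S.max(axis=1, keepdims=True)       # stabilize the softmax
    log_probs = S - np.log(np.exp(S).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))        # NLL of the diagonal positives

rng = np.random.default_rng(0)
B, d = 4, 8
Q = rng.normal(size=(B, d))
P_pos = 2.0 * Q + 0.05 * rng.normal(size=(B, d))  # positives near their questions
loss = in_batch_nll(Q, P_pos)
print(float(loss))
```

Swapping the rows of `P_pos` for random or BM25-mined hard negatives, and varying how many there are per question, is the kind of control the promised ablation would report.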

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper trains a dual-encoder model via standard contrastive loss on QA pairs to produce dense passage embeddings, then measures top-k retrieval accuracy on held-out test portions of standard benchmarks (Natural Questions, TriviaQA, etc.). No equation or claim reduces the reported 9-19% gains to a fitted parameter by construction, nor does any load-bearing step rely on a self-citation chain that is itself unverified. The evaluation is ordinary supervised held-out testing; the derivation chain (indexing, retrieval, end-to-end QA) remains externally falsifiable and does not collapse into its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the standard assumption that cosine similarity in a learned embedding space captures relevance, plus the usual supervised contrastive training assumptions. No new entities are postulated.

axioms (1)
  • domain assumption Embeddings learned via contrastive loss on QA pairs will place relevant passages near their questions in vector space
    Invoked in the dual-encoder training description in the abstract

pith-pipeline@v0.9.0 · 5432 in / 1185 out tokens · 32529 ms · 2026-05-15T21:21:33.779649+00:00 · methodology

discussion (0)


Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BOOKMARKS: Efficient Active Storyline Memory for Role-playing

    cs.CL 2026-05 unverdicted novelty 7.0

    BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

  2. Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems

    cs.IR 2026-04 unverdicted novelty 7.0

    Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.

  3. Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

    cs.CL 2024-04 conditional novelty 7.0

    A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.

  4. C-Pack: Packed Resources For General Chinese Embeddings

    cs.CL 2023-09 accept novelty 7.0

    C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.

  5. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    cs.CL 2020-05 accept novelty 7.0

    RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.

  6. Task-Adaptive Embedding Refinement via Test-time LLM Guidance

    cs.CL 2026-05 unverdicted novelty 6.0

    Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.

  7. From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

    cs.AI 2026-04 unverdicted novelty 6.0

    Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.

  8. EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval

    cs.AI 2026-04 unverdicted novelty 6.0

    EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming ...

  9. Knowledge Is Not Static: Order-Aware Hypergraph RAG for Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    OKH-RAG represents knowledge as ordered hyperedges and retrieves coherent interaction sequences via a learned transition model, outperforming permutation-invariant RAG baselines on order-sensitive QA tasks.

  10. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

    cs.CL 2024-05 accept novelty 6.0

    NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.

  11. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    cs.CL 2023-10 unverdicted novelty 6.0

    Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.

  12. MemGPT: Towards LLMs as Operating Systems

    cs.AI 2023-10 unverdicted novelty 6.0

    MemGPT uses OS-inspired virtual context management to extend LLM context windows for large document analysis and long-term multi-session chat.

  13. LaMDA: Language Models for Dialog Applications

    cs.CL 2022-01 unverdicted novelty 6.0

    LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.

  14. Unsupervised Dense Information Retrieval with Contrastive Learning

    cs.IR 2021-12 unverdicted novelty 6.0

    Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.

  15. How Much Knowledge Can You Pack Into the Parameters of a Language Model?

    cs.CL 2020-02 accept novelty 6.0

    Fine-tuned language models store knowledge in parameters to answer questions competitively with retrieval-based open-domain QA systems.

  16. Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks

    cs.SE 2026-05 unverdicted novelty 5.0

    Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.

  17. Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research

    cs.HC 2026-04 unverdicted novelty 5.0

    AVA is a specialized GenAI platform for development policy research that provides verifiable syntheses from World Bank reports and is associated with 2.4-3.9 hours of weekly time savings in a large-scale user evaluation.

  18. Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering

    cs.CL 2026-04 unverdicted novelty 4.0

    Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.

  19. Unified Supervision for Walmart's Sponsored Search Retrieval via Joint Semantic Relevance and Behavioral Engagement Modeling

    cs.IR 2026-04 unverdicted novelty 4.0

    A hybrid supervision method for bi-encoder retrievers combines graded relevance from teacher models, production retrieval priors, and selective engagement to improve relevance and NDCG over Walmart's current sponsored...

Reference graph

Works this paper leans on

113 extracted references · 113 canonical work pages · cited by 19 Pith papers · 7 internal anchors
