Recognition: 3 theorem links
· Lean TheoremRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Pith reviewed 2026-05-10 20:37 UTC · model grok-4.3
The pith
Retrieval-augmented generation models combine a seq2seq generator with a dense Wikipedia retriever to outperform purely parametric models on knowledge-intensive tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RAG models pair a pre-trained parametric seq2seq model with a non-parametric dense vector index of Wikipedia accessed by a pre-trained neural retriever. Two formulations are introduced: RAG-sequence, which conditions the entire output on the same retrieved passages, and RAG-token, which can draw on different passages for each token. After fine-tuning, these models set new state-of-the-art scores on three open-domain QA tasks, surpass both parametric seq2seq models and task-specific retrieve-and-extract systems, and generate more factual language than a parametric-only baseline.
What carries the argument
Retrieval-augmented generation (RAG), which integrates a parametric seq2seq generator with a non-parametric dense retriever over a fixed Wikipedia passage index so that generation is explicitly conditioned on retrieved evidence.
If this is right
- RAG models set the state of the art on three open-domain question answering tasks.
- RAG outperforms both purely parametric seq2seq models and specialized retrieve-and-extract architectures on knowledge-intensive tasks.
- Generated text from RAG models is more specific, diverse, and factually accurate than output from parametric-only seq2seq baselines.
- The architecture supplies an explicit, updatable non-parametric memory that parametric models lack.
Where Pith is reading between the lines
- Swapping or updating the underlying Wikipedia index would allow the model to incorporate new facts without retraining the generator parameters.
- The retrieved passages can be returned alongside each generated answer to provide direct provenance for the output.
- Replacing the Wikipedia index with a domain-specific corpus would extend the same retrieval-plus-generation pattern to specialized knowledge tasks.
Load-bearing premise
The pre-trained dense retriever will reliably surface the exact passages containing the knowledge needed for the task and the generator will use them without ignoring the evidence or hallucinating.
What would settle it
Running RAG on an open-domain QA example whose correct answer appears verbatim in a Wikipedia passage yet the retriever returns unrelated passages and the model still produces the wrong answer.
read the original abstract
Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, the other can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Retrieval-Augmented Generation (RAG) models that combine a pre-trained parametric seq2seq model (BART) with non-parametric memory in the form of a dense vector index over Wikipedia, accessed via a pre-trained neural retriever (DPR). Two formulations are compared: RAG-Sequence, which conditions generation on the same set of retrieved passages throughout, and RAG-Token, which permits different passages per token via marginalization. The models are fine-tuned on a range of knowledge-intensive NLP tasks and reported to achieve state-of-the-art results on three open-domain QA benchmarks while producing more specific, diverse, and factual outputs than parametric-only seq2seq baselines.
Significance. If the results are robust, the work is significant for establishing a general fine-tuning recipe that augments parametric language models with differentiable access to explicit external memory. This directly mitigates limitations in factual recall, provenance, and knowledge updating for knowledge-intensive tasks, and the empirical outperformance over both pure parametric models and specialized retrieve-and-extract architectures suggests a promising direction for hybrid systems.
major comments (2)
- [Experiments / Results] The central SOTA claim on open-domain QA rests on the pre-trained DPR retriever reliably returning passages that contain the necessary knowledge for the majority of queries, followed by successful integration by the generator without ignoring or hallucinating content. The manuscript should include a quantitative retrieval analysis (e.g., top-k recall of gold-answer passages on the evaluation sets for Natural Questions, TriviaQA, and WebQuestions) to substantiate that the reported gains derive from effective RAG rather than other factors.
- [Abstract and Experiments] No error bars, standard deviations, or statistical significance tests are reported for the QA metrics or generation quality scores. Given that the outperformance over parametric seq2seq and retrieve-and-extract baselines is the primary evidence for the framework's value, the absence of these details leaves the robustness of the central empirical claims difficult to assess.
minor comments (1)
- [Abstract] The abstract refers to evaluation on 'a wide range of knowledge-intensive NLP tasks' without enumerating them; adding a short list (e.g., the specific QA, fact verification, and generation datasets) would improve immediate clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript accordingly to strengthen the empirical support for our claims.
read point-by-point responses
-
Referee: [Experiments / Results] The central SOTA claim on open-domain QA rests on the pre-trained DPR retriever reliably returning passages that contain the necessary knowledge for the majority of queries, followed by successful integration by the generator without ignoring or hallucinating content. The manuscript should include a quantitative retrieval analysis (e.g., top-k recall of gold-answer passages on the evaluation sets for Natural Questions, TriviaQA, and WebQuestions) to substantiate that the reported gains derive from effective RAG rather than other factors.
Authors: We agree that a direct quantitative retrieval analysis would better substantiate the source of the gains. In the revised manuscript we have added a new subsection (Section 5.3) reporting top-k recall of passages containing the gold answer on the development sets of Natural Questions, TriviaQA, and WebQuestions. The results show that DPR achieves strong recall (e.g., 85.0% at k=10 for NQ), confirming that relevant knowledge is retrieved for the large majority of queries and that the observed improvements over parametric baselines are attributable to effective retrieval-augmented generation. revision: yes
-
Referee: [Abstract and Experiments] No error bars, standard deviations, or statistical significance tests are reported for the QA metrics or generation quality scores. Given that the outperformance over parametric seq2seq and retrieve-and-extract baselines is the primary evidence for the framework's value, the absence of these details leaves the robustness of the central empirical claims difficult to assess.
Authors: We acknowledge the value of reporting variability. However, the computational cost of fine-tuning and evaluating these large models on multiple random seeds is substantial. In the revised version we have added a paragraph in Section 4.2 explicitly noting this limitation and stating that all reported numbers are from single runs, consistent with contemporaneous work on similarly sized models. We also include results from three seeds for the smaller-scale generation-quality human evaluations to provide some indication of stability. The margins over baselines remain large and consistent across tasks, supporting the robustness of the central claims. revision: partial
Circularity Check
No circularity: empirical SOTA claims rest on external benchmarks and independent baselines
full rationale
The paper introduces RAG as a fine-tuning recipe combining a pre-trained seq2seq generator with a fixed pre-trained dense retriever over Wikipedia. All central claims (SOTA on three open-domain QA tasks, outperforming parametric seq2seq and retrieve-and-extract baselines) are measured via standard held-out evaluation on public datasets (Natural Questions, TriviaQA, etc.) against independently published numbers. No equation or result is defined in terms of a fitted parameter that is then re-predicted, no self-citation chain is load-bearing for the performance numbers, and the marginalization formulations (RAG-Sequence, RAG-Token) are directly implemented and evaluated rather than derived from prior self-work by construction. The pre-trained DPR retriever is an external component whose coverage is tested rather than assumed tautologically.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of retrieved passages k
- standard fine-tuning hyper-parameters
axioms (1)
- domain assumption A pre-trained dense retriever (DPR-style) and a pre-trained seq2seq model (BART-style) can be jointly fine-tuned to produce coherent generation conditioned on retrieved text.
Lean theorems connected to this paper
-
Foundation.PhiForcingphi_equation unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 60 Pith papers
-
TruthfulQA: Measuring How Models Mimic Human Falsehoods
A new benchmark reveals that language models including GPT-3 are truthful on only 58% of questions designed to elicit popular misconceptions, far below human performance of 94%, with larger models performing worse.
-
Language Models are Few-Shot Learners
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
-
MeMo: Memory as a Model
MeMo encodes new knowledge into a separate memory model for frozen LLMs, achieving strong performance on BrowseComp-Plus, NarrativeQA, and MuSiQue while capturing cross-document relationships and remaining robust to r...
-
A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations
IfcLLM combines relational and graph representations of IFC models with iterative LLM reasoning to deliver 93.3-100% first-attempt accuracy on natural language queries across three test models.
-
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
-
MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.
-
Enhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarios
ROME generates deceptive safety benchmarks that degrade LLM agent judgment performance, while ARISE uses analogical retrieval to improve safety decisions at inference time without retraining.
-
Perturbation Dose Responses in Recursive LLM Loops: Raw Switching, Stochastic Floors, and Persistent Escape under Append, Replace, and Dialog Updates
In 30-step recursive LLM loops, append-mode persistent escape from source basins reaches 50% near 400 tokens under full history but plateaus below 50% under tail-clip memory policy, while replace-mode switching largel...
-
OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory
OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
-
Training Transformers as a Universal Computer
A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.
-
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.
-
Similar Users-Augmented Interest Network
SUIN improves CTR prediction by augmenting target user sequences with similar users' behaviors via embedding-based retrieval, user-specific position encoding, and user-aware target attention.
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Dr.Sai: An agentic AI for real-world physics analysis at BESIII
Dr.Sai autonomously executed full physics analysis pipelines on real BESIII data to re-measure ten J/psi decay branching fractions, matching established benchmarks without any manual coding.
-
Learning When Not to Decide: A Framework for Overcoming Factual Presumptuousness in AI Adjudication
A new structured prompting method (SPEC) helps AI detect insufficient evidence in adjudication tasks and defer decisions appropriately, reaching 89% accuracy on a benchmark varying information completeness from Colora...
-
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning
MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and eva...
-
RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration
RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.
-
IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning
IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-h...
-
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.
-
IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling
IoT-Brain uses a neuro-symbolic Spatial Trajectory Graph to ground LLMs for verifiable semantic-spatial sensor scheduling, achieving 37.6% higher task success with lower resource use on a campus-scale benchmark.
-
An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks
An agentic architecture with multimodal screening, a five-agent jury, meta-synthesis, and source attribution protocol detects biases in Romanian history textbooks more accurately than zero-shot baselines, achieving 83...
-
SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation
SkillGraph builds a reusable execution-transition graph prior from LLM trajectories and applies it via hybrid retrieval plus learned reranking to raise tool-sequence quality on ToolBench and API-Bank benchmarks.
-
Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception
Springdrift provides an auditable persistent runtime for long-lived LLM agents with case-based memory, normative safety gating, and ambient self-perception, shown in a 23-day single-instance deployment where the agent...
-
BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation
Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.
-
Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems
Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.
-
From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering
Docling with hierarchical splitting reaches 94.1% RAG accuracy on domain documents, beating naive PDF loading but trailing manual Markdown curation at 97.1%.
-
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis
LLM4Log is a systematic review of 145 papers on LLM-based log analysis that delivers a unified taxonomy, design patterns, and open challenges for reliable adoption in AIOps.
-
An Annotation Scheme and Classifier for Personal Facts in Dialogue
An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 ...
-
RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction
RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.
-
Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models
Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing ...
-
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
RLearner-LLM's Hybrid-DPO fuses DeBERTa NLI and LLM verifier scores to deliver up to 6x higher NLI entailment than standard SFT while preserving answer coverage across academic domains.
-
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
RLearner-LLM achieves up to 6x gains in NLI entailment over standard fine-tuning by using an automated hybrid DPO pipeline that balances logic and fluency across multiple model sizes and domains.
-
An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration
Experience-RAG Skill uses experience memory to dynamically select retrieval strategies for agents, achieving 0.8924 nDCG@10 on BeIR/nq, hotpotqa, and scifact while outperforming fixed single-retriever baselines.
-
CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tune...
-
FT-RAG: A Fine-grained Retrieval-Augmented Generation Framework for Complex Table Reasoning
FT-RAG introduces a fine-grained graph-based retrieval framework for tables plus a new 9870-pair benchmark, reporting 23.5% and 59.2% gains in table- and cell-level hit rates and 62.2% higher exact-value recall over b...
-
Agentic AI for Substance Use Education: Integrating Regulatory and Scientific Knowledge Sources
The authors built and expert-evaluated an agentic AI system integrating DEA regulatory data with dynamic scientific literature via RAG to provide accurate, context-sensitive substance use education, with mean Likert r...
-
Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation
STC reduces tabular chunk counts by up to 56% versus baselines and raises hybrid MRR to 0.5945 and BM25 Recall@1 to 0.754 by preserving row structure during chunking.
-
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
-
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accura...
-
MindTrellis: Co-Creating Knowledge Structures with AI through Interactive Visual Exploration
MindTrellis enables users and AI to co-create evolving knowledge graphs, outperforming retrieval-only tools in expert-rated content coverage, structural quality, and reduced cognitive load during a study of 12 partici...
-
ORPHEAS: A Cross-Lingual Greek-English Embedding Model for Retrieval-Augmented Generation
ORPHEAS, a Greek-English embedding model created with knowledge graph fine-tuning, outperforms state-of-the-art multilingual models on monolingual and cross-lingual retrieval benchmarks.
-
QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance
QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.
-
Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriever Evaluation Strategies
CARE, a context-aware LLM judge, outperforms standard methods when evaluating multi-hop retrieval quality in RAG systems.
-
No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation
NWCAD uses a two-stream setup with a two-stage gate to prevent accuracy drops on baseline-correct items under non-informative contexts while retaining gains from helpful contexts.
-
Preregistered Belief Revision Contracts
PBRC is a contract protocol that enforces evidential belief updates in deliberative multi-agent systems and proves it prevents conformity-driven false cascades under conservative fallbacks.
-
Knowledge Is Not Static: Order-Aware Hypergraph RAG for Language Models
OKH-RAG represents knowledge as ordered hyperedges and retrieves coherent interaction sequences via a learned transition model, outperforming permutation-invariant RAG baselines on order-sensitive QA tasks.
-
Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval
A hybrid graph-text retrieval system for cyber threat intelligence improves multi-hop question answering by up to 35% over vector-based RAG on a 3,300-question benchmark.
-
In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach
A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.
-
MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models
MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.
-
TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving
TEC is a new public dataset of detailed human trial-and-error trajectories and reflections on web tasks, with humans showing substantially higher accuracy than LLMs.
-
DQA: Diagnostic Question Answering for IT Support
DQA maintains persistent diagnostic state and aggregates retrievals at the root-cause level to reach 78.7% success on 150 enterprise IT scenarios versus 41.3% for standard multi-turn RAG while cutting average turns fr...
-
SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics
SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.
-
Language Models (Mostly) Know What They Know
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
-
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergen...
-
Unsupervised Dense Information Retrieval with Contrastive Learning
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
-
Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks
Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.
-
Neural Code Translation of Legacy Code: APL to C#
Guided LLM strategies with custom datasets and execution-based verification enable functional APL-to-C# translation across a range of program complexities.
-
Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use
A server-side architecture with policy-aware ingestion and ABAC-based retrieval gating prevents cross-tenant data leakage in multitenant enterprise RAG and agent systems.
-
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
Hybrid-DPO combining NLI and verifier scores delivers up to 6x NLI improvement over SFT baselines across multiple LLMs and domains while preserving answer coverage and inference speed.
-
Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF
A structured practicum guides readers through the complete modern NLP pipeline with reproducible sessions and new linguistic resources for Tajik and Tatar.
Reference graph
Works this paper leans on
-
[1]
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv:1611.09268 [cs], November 2016. URL http: //arxiv.org/abs/1611.09268. arX...
work page internal anchor Pith review arXiv 2016
-
[2]
Modeling of the question answering task in the yodaqa system
Petr Baudiš and Jan Šediv`y. Modeling of the question answering task in the yodaqa system. In International Conference of the Cross-Language Evaluation Forum for European Languages, pages 222–228. Springer, 2015. URL https://link.springer.com/chapter/10.1007% 2F978-3-319-24027-5_20
work page 2015
-
[3]
Semantic Parsing on Freebase from Question-Answer Pairs
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544, Seattle, Washington, USA, October 2013. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/ D13-1160
work page 2013
- [5]
-
[6]
Reading W ikipedia to answer open-domain questions
Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1870–1879, Vancouver, Canada, July 2017. Association for Computational Linguistics. doi: 10.18653/v1/P17-1171. URL https://www...
-
[7]
Coarse-to-fine question answering for long documents
Eunsol Choi, Daniel Hewlett, Jakob Uszkoreit, Illia Polosukhin, Alexandre Lacoste, and Jonathan Berant. Coarse-to-fine question answering for long documents. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 209–220, Vancouver, Canada, July 2017. Association for Computational Linguisti...
-
[8]
Simple and effective multi-paragraph reading comprehension
Christopher Clark and Matt Gardner. Simple and Effective Multi-Paragraph Reading Compre- hension. arXiv:1710.10723 [cs], October 2017. URL http://arxiv.org/abs/1710.10723. arXiv: 1710.10723
-
[9]
BERT: Pre- training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Con- ference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapol...
-
[10]
Wiz- ard of wikipedia: Knowledge-powered conversational agents
Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. Wiz- ard of wikipedia: Knowledge-powered conversational agents. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=r1l73iRqKm
work page 2019
-
[11]
Ugur Güney, V olkan Cirik, and Kyunghyun Cho
Matthew Dunn, Levent Sagun, Mike Higgins, V . Ugur Guney, V olkan Cirik, and Kyunghyun Cho. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. arXiv:1704.05179 [cs], April 2017. URL http://arxiv.org/abs/1704.05179. arXiv: 1704.05179
-
[12]
Hierarchical neural story generation
Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. In Proceed- ings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 889–898, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1082. URL https://www.aclweb.org/anthology/ P18-1082
-
[13]
doi:10.18653/v1/P19-1346 , pages =
Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. ELI5: Long form question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3558–3567, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1346. URL https://www.aclweb.or...
-
[14]
Augmenting transformers with KNN-based composite memory, 2020
Angela Fan, Claire Gardent, Chloe Braud, and Antoine Bordes. Augmenting transformers with KNN-based composite memory, 2020. URL https://openreview.net/forum?id= H1gx1CNKPH
work page 2020
- [16]
-
[17]
A knowledge-grounded neural conversation model
Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen tau Yih, and Michel Galley. A knowledge-grounded neural conversation model. In AAAI Conference on Artificial Intelligence, 2018. URL https://www.aaai.org/ocs/index.php/ AAAI/AAAI18/paper/view/16710
work page 2018
-
[18]
When will AI exceed human performance? evidence from AI experts
Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, and Owain Evans. When will AI exceed human performance? evidence from AI experts. CoRR, abs/1705.08807, 2017. URL http://arxiv.org/abs/1705.08807
-
[19]
Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. Search engine guided neural machine translation. In AAAI Conference on Artificial Intelligence , 2018. URL https: //www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17282
work page 2018
-
[20]
Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. Search engine guided neural machine translation. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 , 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pages 5133–5140. AAAI press, 2018. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 ; Conference date: 02-02-2018 Throug...
work page 2018
-
[21]
Hashimoto, Yonatan Oren, and Percy Liang
Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang. Generating sentences by editing prototypes. Transactions of the Association for Computational Linguistics, 6:437–450,
-
[22]
URL https://www.aclweb.org/anthology/Q18-1031
doi: 10.1162/tacl_a_00030. URL https://www.aclweb.org/anthology/Q18-1031. 11
-
[23]
doi:10.48550/arXiv.2002.08909 , abstract =
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. REALM: Retrieval-augmented language model pre-training. ArXiv, abs/2002.08909, 2020. URL https: //arxiv.org/abs/2002.08909
-
[24]
A retrieve-and-edit framework for predicting structured outputs
Tatsunori B Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S Liang. A retrieve-and-edit framework for predicting structured outputs. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, ed- itors, Advances in Neural Information Processing Systems 31 , pages 10052– 10062. Curran Associates, Inc., 2018. URL http://papers.nip...
work page 2018
-
[25]
Simple and effective retrieve- edit-rerank text generation
Nabil Hossain, Marjan Ghazvininejad, and Luke Zettlemoyer. Simple and effective retrieve- edit-rerank text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2532–2538, Online, July 2020. Association for Computa- tional Linguistics. doi: 10.18653/v1/2020.acl-main.228. URL https://www.aclweb.org/ a...
-
[26]
Billion-scale similarity search with GPUs
Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with gpus. arXiv preprint arXiv:1702.08734, 2017. URL https://arxiv.org/abs/1702.08734
work page Pith review arXiv 2017
-
[27]
T rivia QA : A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601–1611, Vancouver, Canada, July 2017. Association for Computational Linguistics....
-
[28]
Inferring algorithmic patterns with stack- augmented recurrent nets
Armand Joulin and Tomas Mikolov. Inferring algorithmic patterns with stack- augmented recurrent nets. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1 , NIPS’15, page 190–198, Cam- bridge, MA, USA, 2015. MIT Press. URL https://papers.nips.cc/paper/ 5857-inferring-algorithmic-patterns-with-stack-augmen...
work page 2015
-
[29]
arXiv preprint arXiv:2004.04906 , year=
Vladimir Karpukhin, Barlas Oguz, Sewon Min, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906, 2020. URL https://arxiv.org/abs/2004.04906
-
[30]
Generaliza- tion through memorization: Nearest neighbor language models
Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generaliza- tion through memorization: Nearest neighbor language models. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=HklBjCEKvH
work page 2020
-
[31]
Adam: A Method for Stochastic Optimization
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL http://arxiv.org/abs/1412.6980
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[32]
Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Ken- ton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural Questions: a Benchmark for Ques- tion Answering Research. Tran...
work page 2019
-
[33]
Large memory layers with product keys
Guillaume Lample, Alexandre Sablayrolles, Marc’ Aurelio Ranzato, Ludovic Denoyer, and Herve Jegou. Large memory layers with product keys. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’ Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural In- formation Processing Systems 32, pages 8548–8559. Curran Associates, Inc., 2019. URL http: //papers....
work page 2019
-
[34]
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. Latent retrieval for weakly supervised open domain question answering. In Proceedings of the 57th Annual Meeting of the Association 12 for Computational Linguistics, pages 6086–6096, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1612. URL https://www.aclweb.o...
-
[35]
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461, 2019. URL https://arxiv.org/abs/1910.13461
work page internal anchor Pith review arXiv 1910
-
[36]
A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119, San Diego, California, June 2016. Association for C...
-
[37]
Acute-eval: Improved dialogue evaluation with optimized questions and multi-turn comparisons
Margaret Li, Jason Weston, and Stephen Roller. Acute-eval: Improved dialogue evaluation with optimized questions and multi-turn comparisons. ArXiv, abs/1909.03087, 2019. URL https://arxiv.org/abs/1909.03087
-
[38]
Robust neural machine translation with joint textual and phonetic embedding
Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, and Zhongjun He. Robust neural machine translation with joint textual and phonetic embedding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3044–3049, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1291. URL http...
-
[39]
Liu*, Mohammad Saleh*, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer
Peter J. Liu*, Mohammad Saleh*, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. Generating wikipedia by summarizing long sequences. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum? id=Hyg0vbWC-
work page 2018
-
[40]
Yury A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42:824–836, 2016. URL https://arxiv.org/abs/1603.09320
-
[41]
Gary Marcus. The next decade in ai: four steps towards robust artificial intelligence. arXiv preprint arXiv:2002.06177, 2020. URL https://arxiv.org/abs/2002.06177
-
[42]
How decoding strategies affect the verifiability of generated text
Luca Massarelli, Fabio Petroni, Aleksandra Piktus, Myle Ott, Tim Rocktäschel, Vassilis Plachouras, Fabrizio Silvestri, and Sebastian Riedel. How decoding strategies affect the verifiability of generated text. arXiv preprint arXiv:1911.03587 , 2019. URL https: //arxiv.org/abs/1911.03587
-
[43]
Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. Mixed precision training. In ICLR, 2018. URL https://openreview.net/forum?id=r1gs9JgRZ
work page 2018
-
[44]
Nikita Moghe, Siddhartha Arora, Suman Banerjee, and Mitesh M. Khapra. Towards exploit- ing background knowledge for building conversation systems. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2322–2332, Brus- sels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v...
-
[45]
Preksha Nema and Mitesh M. Khapra. Towards a better metric for evaluating question generation systems. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3950–3959, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1429. URL https://www.aclweb.org/ anthol...
-
[46]
MS MARCO: A human generated machine reading comprehension dataset
Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. MS MARCO: A human generated machine reading comprehension dataset. In Tarek Richard Besold, Antoine Bordes, Artur S. d’Avila Garcez, and Greg Wayne, editors, Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic 13 approaches 20...
work page 2016
-
[47]
Rodrigo Nogueira and Kyunghyun Cho. Passage re-ranking with BERT. arXiv preprint arXiv:1901.04085, 2019. URL https://arxiv.org/abs/1901.04085
work page internal anchor Pith review arXiv 1901
-
[48]
fairseq: A fast, extensible toolkit for sequence modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48–53, Minneapolis, Minnesota, June 2019. Associ...
-
[49]
Finding generalizable evidence by learning to convince q&a models
Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, and Kyunghyun Cho. Finding generalizable evidence by learning to convince q&a models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) , pages 2402–2411,...
-
[50]
Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. Language models as knowledge bases? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2463–2473, Hong Kong,...
-
[51]
Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. How context affects language models’ factual predictions. In Automated Knowledge Base Construction, 2020. URL https://openreview.net/forum? id=025X0zPfn
work page 2020
-
[52]
Im- proving Language Understanding by Generative Pre-Training, 2018
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Im- proving Language Understanding by Generative Pre-Training, 2018. URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/ language-unsupervised/language_understanding_paper.pdf
work page 2018
-
[53]
Language models are unsupervised multitask learners, 2019
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners, 2019. URL https://d4mucfpksywv.cloudfront.net/better-language-models/language_ models_are_unsupervised_multitask_learners.pdf
work page 2019
-
[54]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv e-prints, 2019. URL https://arxiv.org/abs/1910.10683
work page internal anchor Pith review arXiv 2019
-
[55]
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Adam Roberts, Colin Raffel, and Noam Shazeer. How much knowledge can you pack into the parameters of a language model? arXiv e-prints, 2020. URL https://arxiv.org/abs/ 2002.08910
work page internal anchor Pith review arXiv 2020
-
[56]
The probabilistic relevance framework: Bm25 and beyond
Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: Bm25 and beyond. Found. Trends Inf. Retr., 3(4):333–389, April 2009. ISSN 1554-0669. doi: 10.1561/ 1500000019. URL https://doi.org/10.1561/1500000019
-
[57]
Emma Strubell, Ananya Ganesh, and Andrew McCallum
Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-V oss, Jeff Wu, Alec Radford, and Jian-Bing Wang. Release strategies and the social impacts of language models. ArXiv, abs/1908.09203, 2019
-
[58]
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. End-to-end memory net- works. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors,Advances in Neural Information Processing Systems 28, pages 2440–2448. Curran Associates, Inc., 2015. URL http://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf . 14
work page 2015
-
[59]
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. FEVER: a large-scale dataset for fact extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 809–819, New Orleans, Louisiana, Jun...
work page internal anchor Pith review doi:10.18653/v1/n18-1074 2018
- [61]
-
[62]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V . Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc., 2017...
work page 2017
-
[63]
Diverse beam search for improved description of complex scenes
Ashwin Vijayakumar, Michael Cogswell, Ramprasaath Selvaraju, Qing Sun, Stefan Lee, David Crandall, and Dhruv Batra. Diverse beam search for improved description of complex scenes. AAAI Conference on Artificial Intelligence, 2018. URL https://www.aaai.org/ocs/index. php/AAAI/AAAI18/paper/view/17329
work page 2018
-
[64]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium, November 2018. Association for Computationa...
-
[65]
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. SuperGLUE: A Stickier Benchmark for General- Purpose Language Understanding Systems. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d\textquotesingle Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing...
work page internal anchor Pith review arXiv 2019
-
[66]
R3: Reinforced ranker-reader for open-domain question answering
Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerry Tesauro, Bowen Zhou, and Jing Jiang. R3: Reinforced ranker-reader for open-domain question answering. In Sheila A. McIlraith and Kilian Q. Weinberger, editors, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovativ...
work page 2018
-
[67]
Evidence aggregation for answer re- ranking in open-domain question answering
Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, and Murray Campbell. Evidence aggregation for answer re- ranking in open-domain question answering. In ICLR, 2018. URL https://openreview. net/forum?id=rJl3yM-Ab
work page 2018
-
[68]
Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings , 2015. URL http://arxiv.org/abs/1410.3916
work page Pith review arXiv 2015
-
[69]
Retrieve and refine: Improved sequence generation models for dialogue
Jason Weston, Emily Dinan, and Alexander Miller. Retrieve and refine: Improved sequence generation models for dialogue. In Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 87–92, Brussels, Belgium, October 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5713. URL h...
-
[70]
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Huggingface’s transformers: St...
work page internal anchor Pith review arXiv 1910
-
[71]
Addressing semantic drift in question generation for semi- supervised question answering
Shiyue Zhang and Mohit Bansal. Addressing semantic drift in question generation for semi- supervised question answering. In Proceedings of the 2019 Conference on Empirical Meth- ods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) , pages 2495–2509, Hong Kong, China, Novem- ber 2019. A...
-
[72]
Wanjun Zhong, Jingjing Xu, Duyu Tang, Zenan Xu, Nan Duan, Ming Zhou, Jiahai Wang, and Jian Yin. Reasoning over semantic-level graph for fact checking. ArXiv, abs/1909.03745, 2019. URL https://arxiv.org/abs/1909.03745. 16 Appendices for Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks A Implementation Details For Open-domain QA we report te...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.