REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.
hub
Language Models as Knowledge Bases?
26 Pith papers cite this work. Polarity classification is still indexing.
abstract
Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as "fill-in-the-blank" cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at https://github.com/facebookresearch/LAMA.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
LaViD distills LLM conceptual knowledge to vision models via LLM-generated MCQ soft labels, outperforming vision-language distillation baselines on fine-grained benchmarks while improving robustness on spurious correlation datasets.
Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
PAS encodes locations via relative anchors and bins to deliver roughly 370-400m adversarial error in spatial RAG while retaining over half the baseline retrieval performance and keeping generation quality robust.
RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.
ToGRL learns high-quality graph structures from raw heterogeneous graphs via a two-stage topology extraction process and prompt tuning, outperforming prior methods on five datasets.
Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
Theoretical analysis of continual factual knowledge acquisition shows data replay stabilizes pretrained knowledge by shifting convergence dynamics while regularization only slows forgetting, leading to the STOC method for attention-based replay selection.
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
Fine-tuning on new knowledge induces propagating hallucinations in LLMs by weakening attention to key entities, with mitigation via reintroducing known knowledge during later training stages.
AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples into key-value representations handled by the model's attention.
Introduces CounterBench benchmark and CoIn iterative reasoning method showing LLMs perform near random on formal counterfactual tasks but improve substantially with guided backtracking.
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
HalluHunter is a knowledge-graph and rule-based NLP framework that iteratively generates single- and multi-hop questions to uncover factual errors in LLMs, triggering errors in up to 55% of cases on nine models while preserving coverage.
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
LLMs form an inner monologue from closed-loop language feedback to improve high-level instruction completion in simulated and real robotic rearrangement and kitchen manipulation tasks.
MRKL is a modular neuro-symbolic architecture that integrates LLMs with external knowledge and discrete reasoning to overcome limitations of pure neural language models.
CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.
Fine-tuned language models store knowledge in parameters to answer questions competitively with retrieval-based open-domain QA systems.
A formalized Minimal Cognitive Grid ranks computational models of analogy and metaphor by alignment with cognitive theories using Functional/Structural Ratio, Generality, and Performance Match dimensions.
CRVA-TGRAG combines parent-document segmentation, ensemble retrieval, and teacher-guided fine-tuning to mitigate knowledge conflicts and improve accuracy in LLM-based CVE vulnerability analysis.
ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.
citing papers explorer
-
ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education
ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.