Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov
33 Pith papers cite this work, alongside 920 external citations.
citing papers explorer
-
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
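A minimal sketch of what optimizing a continuous combination of latent editing directions could look like, assuming a precomputed bank of direction vectors and a differentiable hallucination proxy; the names here (directions, hallucination_score) are hypothetical, and the paper's actual objective, constraints, and decoding step will differ.

    import torch

    # Assumed setup: K valid editing directions in the model's hidden space and a
    # differentiable proxy score that rewards hallucinated continuations.
    hidden_dim, num_directions = 4096, 16
    directions = torch.randn(num_directions, hidden_dim)
    directions = directions / directions.norm(dim=-1, keepdim=True)

    weights = torch.zeros(num_directions, requires_grad=True)   # continuous combination to optimize
    optimizer = torch.optim.Adam([weights], lr=1e-2)

    def hallucination_score(hidden):
        # Placeholder for a differentiable "elicits a hallucination" signal
        # (e.g., probability mass on an unsupported answer); not the paper's objective.
        return hidden.sum()

    base_hidden = torch.randn(hidden_dim)                        # latent state of the clean prompt
    for _ in range(200):
        edit = weights @ directions                              # mix of valid editing directions
        loss = -hallucination_score(base_hidden + edit) + 0.1 * edit.norm()  # keep the edit small and realistic
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()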
-
Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers
SIOP enables turn-level credit assignment in LLM agents via semantic clustering of final answers as latent outcomes, improving performance on reasoning benchmarks without verifiers.
-
A Parametric Memory Head for Continual Generative Retrieval
A product-key parametric memory head with selective sparse updates mitigates catastrophic forgetting in generative retrieval models during sequential addition of new documents.
-
How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models
A fitted iso-depth scaling law finds that looping a block stack r times in a looped transformer is worth roughly r^0.46 unique blocks in terms of validation loss.
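One way to read the fitted law, stated as an assumed functional form rather than the paper's exact equation: a looped model with L unique blocks applied r times matches, in validation loss, an unrolled model with about L·r^0.46 unique blocks,

    \mathcal{L}_{\mathrm{val}}(L\ \text{unique blocks},\ r\ \text{loops}) \;\approx\; \mathcal{L}_{\mathrm{val}}(L \cdot r^{0.46}\ \text{unique blocks},\ 1\ \text{loop}).

Under this reading, looping four times buys roughly 4^0.46 ≈ 1.9x the effective unique depth rather than 4x.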
-
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
-
Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing
Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning benchmarks.
-
A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation
A single model unifies retrieval and context compression for on-device RAG via shared representations, matching traditional RAG performance at 1/10 context size with no extra storage.
-
Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents
The PTR framework profiles a workflow up front and then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
-
Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems
Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.
-
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
-
GAIA: a benchmark for General AI Assistants
The GAIA benchmark shows that humans (92% accuracy) far outperform current AI systems (15%) on conceptually simple real-world questions, and proposes closing this gap as a key milestone toward general AI.
-
A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
A²TGPO improves RL policy optimization for multi-turn agentic LLMs by normalizing information gain within same-depth turn groups, rescaling cumulative advantages by the square root of the number of summed terms, and modulating clipping ranges according to each turn's normalized information gain.
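A rough sketch of the three mechanisms named in the TLDR, under assumed shapes (info_gain[i][t] is the information gain of turn t, i.e., depth t, in rollout i); the cumulative-sum direction and the clip-range modulation are guesses based on the summary, not the paper's formulas.

    import numpy as np

    def turn_group_signals(info_gain, base_clip=0.2):
        max_depth = max(len(traj) for traj in info_gain)
        norm_ig = [[0.0] * len(traj) for traj in info_gain]

        # 1) Normalize information gain within same-depth turn groups.
        for d in range(max_depth):
            group = [traj[d] for traj in info_gain if len(traj) > d]
            mu, sd = np.mean(group), np.std(group) + 1e-8
            for i, traj in enumerate(info_gain):
                if len(traj) > d:
                    norm_ig[i][d] = (traj[d] - mu) / sd

        advantages, clip_ranges = [], []
        for traj in norm_ig:
            # 2) Cumulative turn-level advantage, rescaled by sqrt of the number of summed terms.
            advantages.append([sum(traj[t:]) / np.sqrt(len(traj) - t) for t in range(len(traj))])
            # 3) Per-turn clipping range modulated by that turn's normalized information gain.
            clip_ranges.append([base_clip * (1.0 + np.tanh(g)) for g in traj])
        return advantages, clip_ranges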
-
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
-
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments
LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.
-
Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting after post-training by up to 80% across 20M-150M-parameter models and by 31-40% at the 1B scale.
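For reference, a standard sharpness-aware minimization (SAM) step in the style of Foret et al. (2021), the kind of flat-minima intervention this TLDR refers to; the paper's exact pretraining recipe may differ.

    import torch

    def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
        # First pass: gradient at the current weights.
        loss_fn(model, batch).backward()
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm() for p in model.parameters() if p.grad is not None])) + 1e-12

        # Ascend to the nearby worst-case weights w + rho * g / ||g||.
        perturbations = []
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    perturbations.append(None)
                    continue
                e = rho * p.grad / grad_norm
                p.add_(e)
                perturbations.append(e)

        # Second pass: gradient at the perturbed weights drives the actual update.
        base_optimizer.zero_grad()
        loss_fn(model, batch).backward()
        with torch.no_grad():
            for p, e in zip(model.parameters(), perturbations):
                if e is not None:
                    p.sub_(e)          # restore the original weights before stepping
        base_optimizer.step()
        base_optimizer.zero_grad()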
-
Prescriptive Scaling Laws for Data Constrained Training
A one-parameter scaling law models the excess loss from data repetition as an additive overfitting penalty, recommending increasing model capacity rather than repeating data excessively and showing that strong weight decay reduces the penalty coefficient by ~70%.
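A plausible shape for the one-parameter penalty, written as an assumption since the TLDR does not give the formula: with N parameters, D total training tokens, and R repetitions over the unique data,

    \mathcal{L}(N, D, R) \;\approx\; \mathcal{L}_{\mathrm{fresh}}(N, D) \;+\; \lambda\, g(R),

where \mathcal{L}_{\mathrm{fresh}} is the loss a standard scaling law would predict for fully fresh data, g is an increasing function of the repetition count, and \lambda is the single fitted penalty coefficient. On this reading, the weight-decay result says strong regularization shrinks \lambda to roughly 30% of its baseline value.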
-
NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains
NeocorRAG uses Evidence Chains to achieve SOTA retrieval quality in RAG on HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ for 3B and 70B models while using under 20% of the tokens of comparable methods.
-
PageGuide: Browser extension to assist users in navigating a webpage and locating information
PageGuide grounds LLM answers in webpage DOM elements using visual overlays for find, guide, and hide modes, yielding measurable gains in a 94-user study.
-
Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks
A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.
-
Reasoning Structure Matters for Safety Alignment of Reasoning Models
Changing the internal reasoning structure of large reasoning models through simple supervised fine-tuning on 1K examples produces strong safety alignment that generalizes across tasks and languages.
-
Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search
CalibAdv calibrates advantages in GRPO by downscaling negative signals from incorrect final answers using intermediate step correctness and rebalancing answer-level advantages, yielding better performance and training stability on multiple models and benchmarks.
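A minimal sketch of how such a calibration could slot into a GRPO-style group advantage, assuming a per-rollout fraction of correct intermediate steps is available; the scaling rule and rebalancing here are guesses based on the TLDR, not the paper's formulas.

    import numpy as np

    def calibrate_advantages(final_rewards, step_correctness):
        # final_rewards: 0/1 correctness of each rollout's final answer within a group.
        # step_correctness: fraction of intermediate steps judged correct per rollout.
        r = np.asarray(final_rewards, dtype=float)
        adv = (r - r.mean()) / (r.std() + 1e-8)          # standard GRPO group-normalized advantage
        calibrated = adv.copy()
        for i, a in enumerate(adv):
            if a < 0:
                # Downscale the negative signal when most intermediate steps were sound,
                # so a wrong final answer does not fully penalize good partial reasoning.
                calibrated[i] = a * (1.0 - step_correctness[i])
        # Rebalance so the group's positive and negative advantage mass still roughly cancels.
        return calibrated - calibrated.mean()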
-
KnowledgeBerg: Evaluating Systematic Knowledge Coverage and Compositional Reasoning in Large Language Models
KnowledgeBerg benchmark shows open-source LLMs achieve only 5.26-36.88 F1 on universe enumeration and 16-44% accuracy on knowledge-grounded compositional reasoning, with persistent failures in completeness, awareness, and application.
-
GRC: Unifying Reasoning-Driven Generation, Retrieval and Compression
GRC unifies generation, retrieval, and compression in LLMs via meta latent tokens for single-pass execution with modular flexibility.
-
LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
LensVLM trains VLMs to scan compressed rendered-text images and selectively expand task-relevant regions, achieving 4.3x compression with near full-text accuracy and outperforming baselines at up to 10.1x compression on text QA benchmarks.
-
From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity
EPGS detects high-confidence factual errors in LLMs by using embedding perturbations to measure gradient sensitivity as a proxy for sharp versus flat minima.
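A rough sketch of a gradient-sensitivity probe under embedding perturbations, assuming a Hugging Face-style model that accepts inputs_embeds and returns a loss; the function name and aggregation are hypothetical, not the paper's EPGS procedure.

    import torch

    def embedding_gradient_sensitivity(model, input_embeds, labels, sigma=0.01, n_samples=8):
        grad_norms = []
        for _ in range(n_samples):
            noisy = (input_embeds + sigma * torch.randn_like(input_embeds)).detach().requires_grad_(True)
            loss = model(inputs_embeds=noisy, labels=labels).loss
            (grad,) = torch.autograd.grad(loss, noisy)
            grad_norms.append(grad.norm())
        grad_norms = torch.stack(grad_norms)
        # Large or highly variable gradients under small perturbations indicate a sharp
        # loss region, which the TLDR associates with stubborn high-confidence errors.
        return grad_norms.mean().item(), grad_norms.std().item()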
-
All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG
Multilingual RAG rerankers exhibit language bias that limits cross-lingual evidence use, and the proposed LAURA method aligns ranking with downstream generation utility to reduce the bias and improve performance.
-
Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
QREAM rewrites documents into a question-focused style using iterative in-context learning and distilled fine-tuned models, boosting RAG performance by up to 8% relative improvement.
-
RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement
RefineRAG achieves 90% attack success on NQ by generating toxic seeds then optimizing them via retriever-in-the-loop word refinement, outperforming prior methods on effectiveness and naturalness.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
Text Embeddings by Weakly-Supervised Contrastive Pre-training
E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.
-
Gemma: Open Models Based on Gemini Research and Technology
Gemma introduces open 2B and 7B LLMs derived from Gemini technology that beat comparable open models on 11 of 18 text tasks and come with safety assessments.
-
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval
A reproducibility study confirms that Hypencoder's non-linear, query-specific scoring improves retrieval over bi-encoders on standard benchmarks, but standard methods remain faster and hard-task results are mixed due to implementation issues.
-
Gemma 2: Improving Open Language Models at a Practical Size
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.