PaperQA: Retrieval-augmented generative agent for scientific research

Jakub Lála, Odhran O’Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G Rodriques, Andrew D White · 2023 · arXiv 2312.07559

Canonical reference. 83% of citing Pith papers cite this work as background.

20 Pith papers citing it

Background 83% of classified citations

citation-role summary

background 6

citation-polarity summary

background 5 · unclear 1

representative citing papers

Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

Re²Math is a new benchmark that evaluates AI models on retrieving and verifying the applicability of theorems from math literature to advance steps in partial proofs, accepting any sufficient theorem while controlling for leakage.

PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs

cs.IR · 2026-04-23 · unverdicted · novelty 7.0

PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families across seven domains.

FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

cs.AI · 2026-04-05 · conditional · novelty 7.0

FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.

FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.

XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration

cs.CL · 2025-05-16 · conditional · novelty 6.0

XtraGPT is a suite of 1.5B-14B parameter open-source LLMs fine-tuned on 140,000 revision pairs from 7,000 top-tier papers to support controllable, context-aware academic paper editing.

Supervising the search process produces reliable and generalizable information-seeking agents

cs.CL · 2025-02-19 · unverdicted · novelty 6.0

Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

cs.AI · 2026-05-30 · unverdicted · novelty 5.0

ForeSci is a temporally controlled benchmark with 500 tasks for assessing LLM agents on forward-looking AI research judgments in four domains using cutoff-aligned knowledge bases.

Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

cs.AI · 2026-05-28 · unverdicted · novelty 5.0

Compass is an expert-guided LLM agent framework that extracts 3,751 marine Pb records from 230k papers to build the largest integrated database, achieving 92% accuracy via multi-layered validation.

Evidence-Grounded Frontier Mapping and Agentic Hypothesis Generation in Nanomedicine

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

pArticleMap combines article embeddings, graph-based frontier extraction, and agentic LLMs to map nanomedicine literature and generate hypotheses, achieving 10.8% gold recovery and 61% future-neighborhood rate in retrospective benchmarks.

Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery

eess.SY · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

The paper introduces Experiment-as-Code Labs as a declarative stack synthesizing AI agents, systems orchestration, and physical lab control for AI-driven discovery.

Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations

physics.plasm-ph · 2026-04-07 · unverdicted · novelty 5.0

Plasma GraphRAG automates physics-grounded parameter selection for gyrokinetic simulations via a domain-specific knowledge graph and LLMs, reporting over 10% better quality and up to 25% fewer hallucinations than standard RAG.

Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question

cs.CL · 2025-07-25 · unverdicted · novelty 5.0

Question interpretation diversity outperforms model diversity for LLM ensembling on binary QA tasks using majority voting.

In Context Learning and Reasoning for Symbolic Regression with Large Language Models

cs.CL · 2024-10-22 · unverdicted · novelty 5.0

GPT-4 models rediscover Langmuir isotherms and produce fits on Nikuradse pipe-flow data via iterative chain-of-thought prompting with scientific context and external code feedback.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval

cs.CV · 2026-01-21 · unverdicted · novelty 4.0

Large-scale profiling of recent AI literature shows growth in safety, multimodal reasoning, and agent studies alongside stabilization in neural machine translation and graph methods.

Retrieval-Augmented Generation for Large Language Models: A Survey

cs.CL · 2023-12-18 · unverdicted · novelty 3.0

A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 2.0

Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

cs.LG · 2026-05-07 · 2 refs

RAG over Thinking Traces Can Improve Reasoning Tasks

cs.IR · 2026-05-05

citing papers explorer

Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics · cs.AI · 2026-05-09 · unverdicted · none · ref 13
PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs · cs.IR · 2026-04-23 · unverdicted · none · ref 7
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification · cs.AI · 2026-04-05 · conditional · none · ref 5
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation · cs.AI · 2026-05-11 · unverdicted · none · ref 23
FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution · cs.LG · 2026-05-08 · unverdicted · none · ref 14
XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration · cs.CL · 2025-05-16 · conditional · none · ref 54
Supervising the search process produces reliable and generalizable information-seeking agents · cs.CL · 2025-02-19 · unverdicted · none · ref 39
ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment · cs.AI · 2026-05-30 · unverdicted · none · ref 5
Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent · cs.AI · 2026-05-28 · unverdicted · none · ref 24
Evidence-Grounded Frontier Mapping and Agentic Hypothesis Generation in Nanomedicine · cs.AI · 2026-05-18 · unverdicted · none · ref 18
Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery · eess.SY · 2026-05-06 · unverdicted · none · ref 28 · 2 links
Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations · physics.plasm-ph · 2026-04-07 · unverdicted · none · ref 22
Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question · cs.CL · 2025-07-25 · unverdicted · none · ref 2
In Context Learning and Reasoning for Symbolic Regression with Large Language Models · cs.CL · 2024-10-22 · unverdicted · none · ref 97
AI for Auto-Research: Roadmap & User Guide · cs.AI · 2026-05-18 · unverdicted · none · ref 128
Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval · cs.CV · 2026-01-21 · unverdicted · none · ref 21
Retrieval-Augmented Generation for Large Language Models: A Survey · cs.CL · 2023-12-18 · unverdicted · none · ref 53
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning · cs.CL · 2025-02-05 · unverdicted · none · ref 83
Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale · cs.LG · 2026-05-07 · unreviewed · ref 39 · 2 links
RAG over Thinking Traces Can Improve Reasoning Tasks · cs.IR · 2026-05-05 · unreviewed · ref 60
