Re²Math is a new benchmark that evaluates AI models on retrieving and verifying the applicability of theorems from math literature to advance steps in partial proofs, accepting any sufficient theorem while controlling for leakage.
9 representative citing papers (2026)
Starling uses LLMs and agents to turn 22.5M PubMed papers into 6.3M nuanced structured records across six tasks with 0.6-7.7% frontier-model rejection rates, lower than error rates on existing curated databases.
PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families across seven domains.
FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.
NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.
FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.
RAG over structured thinking traces boosts LLM reasoning on AIME, LiveCodeBench, and GPQA, with relative gains up to 56% and little added cost.
Experiment-as-Code Labs encodes experiments as declarative configurations that AI agents generate, systems software analyzes and orchestrates, and device APIs execute on physical lab hardware.
Plasma GraphRAG automates physics-grounded parameter selection for gyrokinetic simulations via a domain-specific knowledge graph and LLMs, reporting over 10% better quality and up to 25% fewer hallucinations than standard RAG.