Weld and Doug Downey and Wen

Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D'arcy, David Wadden, Matt Latzke, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke · 2024 · arXiv 2411.14199

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · dataset 1

citation-polarity summary

background 1 · unclear 1 · use dataset 1

representative citing papers

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

cs.AI · 2025-09-30 · unverdicted · novelty 8.0

CritPt benchmark shows state-of-the-art LLMs reach only 5.7% average accuracy on full-scale unpublished physics research tasks, rising to about 10% with coding tools.

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.

Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

Re²Math is a new benchmark that evaluates AI models on retrieving and verifying the applicability of theorems from math literature to advance steps in partial proofs, accepting any sufficient theorem while controlling for leakage.

FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

cs.AI · 2026-04-05 · conditional · novelty 7.0

FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.

In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis

cs.DL · 2025-05-20 · unverdicted · novelty 7.0

A framework for nuanced, time-aware research impact summarization using fine-grained temporal citation intents shows moderate to strong correlation with human judgments on insightfulness.

Extending AI for Research to the Humanities: A Multi-Agent Framework for Evidence-Grounded Scholarship

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

SPIRE is a multi-agent framework drawing on scholarly primitives to perform evidence-grounded humanities scholarship, outperforming Naive LLM, Text RAG, and GraphRAG on a benchmark of classical Chinese and Greco-Roman Latin papers.

Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

cs.AI · 2026-05-29 · unverdicted · novelty 6.0

DecomposeR represents research plans as typed DAGs and uses two-stage planner-then-answerer RL to improve long-form research performance by 5.1-8.0 points over baselines.

Self-Optimizing Multi-Agent Systems for Deep Research

cs.IR · 2026-04-03 · unverdicted · novelty 6.0

Multi-agent deep research systems self-optimize prompts through self-play to match or outperform expert-crafted versions.

Stress Testing Factual Consistency Metrics for Long-Document Summarization

cs.CL · 2025-11-10 · unverdicted · novelty 6.0

Short-form factual consistency metrics produce inconsistent scores on semantically equivalent long-document summaries and lose reliability on information-dense claims.

Attribution Gradients: Incrementally Unfolding Citations for Critical Examination of Attributed AI Answers

cs.HC · 2025-10-01 · unverdicted · novelty 6.0

Attribution gradients consolidate citation evidence and enable incremental unfolding of secondary sources, leading to deeper engagement in a lab study of critical reading tasks for AI answers.

XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration

cs.CL · 2025-05-16 · conditional · novelty 6.0

XtraGPT is a suite of 1.5B-14B parameter open-source LLMs fine-tuned on 140,000 revision pairs from 7,000 top-tier papers to support controllable, context-aware academic paper editing.

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

cs.CL · 2026-06-10 · unverdicted · novelty 5.0

This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.

TusoAI: Agentic Optimization for Scientific Methods

cs.AI · 2025-09-28 · unverdicted · novelty 5.0

TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while reporting new genetic associations.

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

cs.AI · 2026-05-20 · unverdicted · novelty 4.0

SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.

Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

cs.CL · 2026-03-05

citing papers explorer

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark cs.AI · 2025-09-30 · unverdicted · none · ref 11
CritPt benchmark shows state-of-the-art LLMs reach only 5.7% average accuracy on full-scale unpublished physics research tasks, rising to about 10% with coding tools.
Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents cs.AI · 2026-05-22 · unverdicted · none · ref 25
Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.
Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics cs.AI · 2026-05-09 · unverdicted · none · ref 2
Re²Math is a new benchmark that evaluates AI models on retrieving and verifying the applicability of theorems from math literature to advance steps in partial proofs, accepting any sufficient theorem while controlling for leakage.
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification cs.AI · 2026-04-05 · conditional · none · ref 1
FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.
In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis cs.DL · 2025-05-20 · unverdicted · none · ref 1
A framework for nuanced, time-aware research impact summarization using fine-grained temporal citation intents shows moderate to strong correlation with human judgments on insightfulness.
Extending AI for Research to the Humanities: A Multi-Agent Framework for Evidence-Grounded Scholarship cs.CL · 2026-05-29 · unverdicted · none · ref 1
SPIRE is a multi-agent framework drawing on scholarly primitives to perform evidence-grounded humanities scholarship, outperforming Naive LLM, Text RAG, and GraphRAG on a benchmark of classical Chinese and Greco-Roman Latin papers.
Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward cs.AI · 2026-05-29 · unverdicted · none · ref 2
DecomposeR represents research plans as typed DAGs and uses two-stage planner-then-answerer RL to improve long-form research performance by 5.1-8.0 points over baselines.
Self-Optimizing Multi-Agent Systems for Deep Research cs.IR · 2026-04-03 · unverdicted · none · ref 2
Multi-agent deep research systems self-optimize prompts through self-play to match or outperform expert-crafted versions.
Stress Testing Factual Consistency Metrics for Long-Document Summarization cs.CL · 2025-11-10 · unverdicted · none · ref 2
Short-form factual consistency metrics produce inconsistent scores on semantically equivalent long-document summaries and lose reliability on information-dense claims.
Attribution Gradients: Incrementally Unfolding Citations for Critical Examination of Attributed AI Answers cs.HC · 2025-10-01 · unverdicted · none · ref 2
Attribution gradients consolidate citation evidence and enable incremental unfolding of secondary sources, leading to deeper engagement in a lab study of critical reading tasks for AI answers.
XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration cs.CL · 2025-05-16 · conditional · none · ref 3
XtraGPT is a suite of 1.5B-14B parameter open-source LLMs fine-tuned on 140,000 revision pairs from 7,000 top-tier papers to support controllable, context-aware academic paper editing.
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application cs.CL · 2026-06-10 · unverdicted · none · ref 81
This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.
TusoAI: Agentic Optimization for Scientific Methods cs.AI · 2025-09-28 · unverdicted · none · ref 2
TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while reporting new genetic associations.
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research cs.AI · 2026-05-20 · unverdicted · none · ref 10
SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.
Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution cs.CL · 2026-03-05 · unreviewed · ref 9
