org/abs/2004.07180
11 Pith papers cite this work, alongside 274 external citations. Polarity classification is still indexing.
11 representative citing papers (2026)
-
Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics
Re²Math is a new benchmark that evaluates AI models on retrieving and verifying the applicability of theorems from math literature to advance steps in partial proofs, accepting any sufficient theorem while controlling for leakage.
-
Beyond coauthorship: semantic structure and phantom collaborators in transportation research, 1967–2025
Phantom collaborators—topically similar authors distant in the coauthor graph—become actual coauthors 16-33 times more often than baselines, with a 68-fold similarity gradient.
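The phantom-collaborator idea reduces to a simple operational test: pairs of authors whose topic embeddings are close but who sit far apart in the coauthor graph. A minimal sketch of that test, with the similarity and distance thresholds, data layout, and function names all assumed for illustration (the paper's actual pipeline is more involved):

```python
import math
from collections import deque

def cosine(u, v):
    """Cosine similarity between two dense topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def graph_distance(graph, src, dst):
    """Hop count between two authors in the coauthor graph (BFS);
    math.inf if they are disconnected."""
    if src == dst:
        return 0
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nbr in graph.get(node, ()):
            if nbr == dst:
                return d + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return math.inf

def phantom_pairs(embeddings, coauthor_graph, sim_min=0.8, dist_min=4):
    """Author pairs that are topically close (cosine >= sim_min) yet
    distant in the coauthor graph (hops >= dist_min)."""
    authors = sorted(embeddings)
    out = []
    for i, a in enumerate(authors):
        for b in authors[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= sim_min and \
               graph_distance(coauthor_graph, a, b) >= dist_min:
                out.append((a, b))
    return out
```

Tracking which of these flagged pairs later become actual coauthors is what yields the reported 16-33x rate over baseline pairs.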
-
MasterSet: A Large-Scale Benchmark for Must-Cite Citation Recommendation in the AI/ML Literature
MasterSet is a new large-scale benchmark for must-cite citation recommendation in AI/ML, using LLM-annotated tiers on 150k papers and Recall@K evaluation.
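Recall@K, the benchmark's stated metric, is the fraction of a paper's must-cite references recovered in the recommender's top K. A minimal reference implementation (function name assumed):

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of must-cite papers appearing in the top-k recommendations.

    ranked:   recommended paper IDs, best first.
    relevant: set of ground-truth must-cite paper IDs.
    """
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)
```

For example, a system that ranks one of two must-cite papers into its top 2 scores Recall@2 = 0.5.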
-
Unlocking LLM Creativity in Science through Analogical Reasoning
Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.
-
CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation
CAR reranks documents in RAG by promoting those that increase generator confidence (via answer consistency sampling) and demoting those that decrease it, yielding NDCG@5 gains on BEIR datasets that correlate with F1 improvements.
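The core reranking signal is the change in generator confidence when a document is added to the context. A minimal sketch under assumed interfaces: `confidence(query, doc)` stands in for the answer-consistency score (agreement rate over sampled generations) with `doc` in context, and `confidence(query, None)` for the no-context baseline; neither name comes from the paper.

```python
def confidence_rerank(query, docs, confidence):
    """Rerank docs by how much each shifts generator answer consistency.

    Documents that raise confidence over the no-context baseline are
    promoted; documents that lower it are demoted.
    """
    baseline = confidence(query, None)
    deltas = {doc: confidence(query, doc) - baseline for doc in docs}
    return sorted(docs, key=lambda d: deltas[d], reverse=True)
```

In the real system each `confidence` call is itself several sampled generations, so the delta estimate is the expensive step, not the sort.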
-
Aspect-Aware Content-Based Recommendations for Mathematical Research Papers
The authors introduce aspect-aware datasets GoldRiM and SilverRiM for math papers and AchGNN, a heterogeneous GNN that outperforms prior methods by jointly modeling textual semantics, citations, and author lineage across aspects.
-
Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models
RouteHead trains a lightweight router to dynamically select optimal LLM attention heads per query for improved attention-based document re-ranking.
-
A Semantic Geometry for Uncovering Paradigm Dynamics via Scientific Publications
A semantic geometry based on the R-P-C framework classifies publications and links the semantic similarity between a paper's knowledge base and its diffusion to disruption, novelty, team size, and citation trajectories.
-
Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.
-
PeeriScope: A Multi-Faceted Framework for Evaluating Peer Review Quality
PeeriScope is an open modular framework that integrates structured features, LLM rubric assessments, and supervised prediction to evaluate peer review quality for self-assessment, editorial triage, and large-scale auditing.
-
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval
A reproducibility study confirms that Hypencoder's non-linear, query-specific scoring improves retrieval over bi-encoders on standard benchmarks; however, standard methods remain faster, and results on hard tasks are mixed due to implementation issues.
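What distinguishes Hypencoder-style scoring from a bi-encoder is that the query is turned into the weights of a small network applied to each document vector, rather than into a vector for a dot product. A toy sketch of just the scoring step (the hypernetwork that emits `q_net` from the query is assumed and omitted):

```python
def hyper_score(q_net, doc_vec):
    """Score a document with a query-specific two-layer network.

    q_net = (W1, b1, w2, b2): weights assumed to be produced from the
    query by a hypernetwork. The score is non-linear in the document
    representation, unlike a bi-encoder inner product.
    """
    W1, b1, w2, b2 = q_net
    hidden = [max(0.0, sum(w * x for w, x in zip(row, doc_vec)) + b)
              for row, b in zip(W1, b1)]               # ReLU layer
    return sum(w * h for w, h in zip(w2, hidden)) + b2  # linear head

def bi_encoder_score(q_vec, doc_vec):
    """Baseline bi-encoder: plain inner product."""
    return sum(q * d for q, d in zip(q_vec, doc_vec))
```

The speed gap the study reports follows directly from this shape: a dot product supports standard approximate nearest-neighbor indexes, while a per-query network must be evaluated against each candidate document.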