arxiv: 2605.07208 · v1 · submitted 2026-05-08 · 💻 cs.LG

Recognition: 3 theorem links

· Lean Theorem

FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution

Jianrong Ding, Jianyuan Zhong, Qiang Xu, Zhengyan Shi

Authors on Pith no claims yet

Pith reviewed 2026-05-11 02:43 UTC · model grok-4.3

classification 💻 cs.LG

keywords academic impact forecastingcontinuous-time manifoldtopic trajectory modelingLLM evaluation proxyknowledge flow graphspatiotemporal embeddingprospective prediction

0 comments

The pith

A continuous-time manifold model projects papers into evolving topic spaces to forecast their future academic impact more reliably than large language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the difficulty of judging new research ideas by treating the forecasting of already-published papers' impact as a measurable proxy task. It shows that frontier LLMs struggle to separate high-impact work from ordinary publications when relying on static text alone. In response, the authors introduce FAME, which embeds papers in a dynamic latent space shaped by both textual content and a knowledge-flow graph, then learns geometric rules that keep impactful papers aligned with their field's forward direction. Experiments across thousands of arXiv papers in fast-moving areas demonstrate that this trajectory-aware approach substantially beats LLM baselines at predicting multidimensional impact. The work also finds that feeding FAME's geometric signals back into LLMs measurably lifts their own forecasting accuracy.

Core claim

FAME projects papers into a dynamic latent space informed by textual features and a verified knowledge-flow graph, learning geometric constraints that align impactful manuscripts with the forward momentum of their fields. This continuous-time manifold evolution enables prospective forecasting of multidimensional impact that outperforms state-of-the-art LLM evaluators on 3,200 arXiv papers from three rapidly changing subfields, while integration of the same signals improves LLM performance.

What carries the argument

The dynamic latent space with learned geometric constraints that enforce alignment between a paper's trajectory and the field's forward momentum, built from text and a knowledge-flow graph.

If this is right

Manuscript impact forecasting becomes a verifiable benchmark for automated scientific evaluation.
Integrating trajectory signals from the manifold model improves LLM accuracy at the same forecasting task.
Static text-based judging by LLMs is shown to be insufficient for reliable prospective evaluation in evolving fields.
Models that respect continuous-time geometric evolution can distinguish high-impact papers from ordinary ones more consistently.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be extended to rank grant proposals or preprints by estimating how well they would align with emerging topic directions.
If the manifold constraints generalize, similar methods might apply to forecasting technological or policy outcomes beyond academia.
Testing the model on papers from slower-moving fields would clarify whether the gains depend on rapid topic evolution.

Load-bearing premise

That the measurable future impact of already-published papers serves as a reliable proxy for how well a model can judge entirely new and unpublished research ideas.

What would settle it

A direct comparison of FAME's predicted impact rankings against the actual long-term citation and recognition outcomes of the held-out papers, measured years after the forecast date.

Figures

Figures reproduced from arXiv: 2605.07208 by Jianrong Ding, Jianyuan Zhong, Qiang Xu, Zhengyan Shi.

**Figure 2.** Figure 2: Overall architecture of FAME. First, we construct a high-quality inspiration graph. Second, [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Visualization of the manifold training dynamics and the three applied geometric constraints. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: The heavy-tailed distributions of the four raw log-normalized ground-truth metrics used to [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative ablation of the spatiotemporal latent manifold. The panels demonstrate how [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: 2D visualizations comparing the latent spaces of standard text embeddings versus the [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Ternary heatmap illustrating the sensitivity of the FAME framework’s predictive perfor [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Additional examples comparing original text embedding distributions against FAME [PITH_FULL_IMAGE:figures/full_fig_p019_8.png] view at source ↗

**Figure 9.** Figure 9: Additional examples comparing original text embedding distributions against FAME [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗

**Figure 10.** Figure 10: Additional examples comparing original text embedding distributions against FAME [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗

**Figure 11.** Figure 11: Correlation of normalized impact score components with human ratings. The composite [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗

read the original abstract

Large Language Models (LLMs) are increasingly used to brainstorm and evaluate research ideas, yet assessing such judgments is fundamentally difficult because the true impact of a new idea may take years to emerge. We address this challenge by using the impact forecasting of human-authored manuscripts as a verifiable proxy task. In a prospective forecasting study, we find that frontier LLMs fail to reliably distinguish high-impact papers from ordinary publications, suggesting that static text-based judging is insufficient for scientific evaluation. To address this limitation, we propose $\textbf{FAME}$ ($\underline{\text{F}}$orecasting $\underline{\text{A}}$cademic Impact via Continuous-Time $\underline{\text{M}}$anifold $\underline{\text{E}}$volution), a spatiotemporal framework for modeling the dynamic trajectories of scientific topics. FAME projects papers into a dynamic latent space informed by textual features and a verified knowledge-flow graph, learning geometric constraints that align impactful manuscripts with the forward momentum of their fields. Experiments on 3,200 arXiv papers across three fast-evolving subfields show that FAME consistently and substantially outperforms state-of-the-art LLM evaluators in prospective multidimensional impact forecasting. Furthermore, integrating FAME's dynamic geometric signals into LLMs significantly improves their forecasting performance. These results support manuscript impact forecasting as a useful, measurable proxy benchmark and position FAME as a strong, trajectory-aware foundation for automated scientific evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FAME beats LLMs at forecasting impact for already-published papers using a manifold on a knowledge graph, but the proxy for judging brand-new ideas looks limited.

read the letter

FAME sets up forecasting of real published papers' impact as a proxy for testing how well LLMs can judge research ideas, and it reports that a continuous-time manifold model on a knowledge-flow graph does better than current LLM evaluators on 3200 arXiv papers across three subfields. Adding the model's signals back into the LLMs also lifts their performance. The core observation that static text-based LLM judging falls short on prospective impact is a useful data point. The new element is the spatiotemporal setup that projects papers into a dynamic latent space and learns geometric constraints to keep impactful work aligned with field trajectories. That combination of manifold evolution and the verified graph is not just a rehash of prior citation models. The scale of the prospective study gives it some weight as a benchmark task where outcomes can be checked later. The approach is honest about the time-lag problem in evaluating ideas and tries to create a measurable stand-in. The soft spot is the proxy itself. Published papers already sit inside peer-reviewed trajectories and carry acceptance signals that new, pre-publication ideas lack, so the manifold may mainly be picking up on existing momentum rather than generalizable signals of originality or potential. If that transfer fails, the reported gains do not strongly support claims about automated evaluation of unvetted research. The abstract gives no equations, no details on how the latent space or constraints are implemented, and no error bars or ablation numbers, which makes it hard to judge whether the outperformance is robust or sensitive to choices in data splits and baselines. This work is aimed at researchers in AI-assisted scientometrics and LLM evaluation for science. Readers who want concrete benchmarks for idea assessment will find the proxy task and the reported improvements worth looking at, even if they end up questioning how far the results generalize. It deserves a serious referee because the problem is practical and the method offers a fresh angle on trajectory modeling, though any review will need to press hard on the proxy validity and demand full experimental details and code.

Referee Report

3 major / 1 minor

Summary. The paper proposes FAME, a continuous-time manifold evolution model that embeds papers in a dynamic latent space derived from textual features and a verified knowledge-flow graph, learning geometric constraints to align high-impact papers with field trajectories. It reports that frontier LLMs fail to distinguish high- from low-impact papers in a prospective forecasting task and that FAME substantially outperforms LLM baselines on 3,200 arXiv papers across three subfields; integrating FAME signals also improves LLM performance. The work frames published-paper impact forecasting as a verifiable proxy benchmark for automated scientific evaluation of research ideas.

Significance. If the empirical results are robust, the paper supplies a falsifiable, trajectory-based benchmark that moves beyond static text evaluation and could guide development of AI systems for research assessment. The explicit use of future citation trajectories as ground truth and the demonstration that dynamic geometric signals help LLMs are concrete strengths that would be valuable to the community.

major comments (3)

[Abstract and Experiments] Abstract/Experiments: The central claim of consistent outperformance on 3,200 papers is presented without any description of data splits, baseline implementations, exact multidimensional impact metrics, statistical significance tests, or error bars. This absence prevents verification that reported gains are not attributable to post-hoc choices or data leakage, directly affecting the soundness of the empirical contribution.
[Introduction/Discussion] Introduction/Discussion: The paper relies on the assumption that forecasting impact trajectories of already-published, peer-reviewed manuscripts is a reliable proxy for judging entirely new, pre-publication research ideas. Published papers already embed field momentum and peer-review signals absent from novel ideas; without a concrete transfer experiment or discussion of how the learned manifold would generalize outside existing trajectories, the broader claim that FAME advances automated evaluation of research ideas rests on an untested extrapolation.
[Method] Method: The description of 'learning geometric constraints that align impactful manuscripts with the forward momentum of their fields' leaves open whether the knowledge-flow graph and latent-space construction are fully independent of the target impact labels. If any supervision from impact metrics enters the geometric alignment step, the reported superiority over LLM baselines could be circular; explicit equations or pseudocode clarifying independence are required.

minor comments (1)

[Abstract] The abstract states the number of papers and subfields but does not name the three subfields or the precise time horizon used for prospective forecasting; adding these details would improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which helps improve the clarity and rigor of our work. Below we respond to each major comment, indicating the revisions we will make to the manuscript.

read point-by-point responses

Referee: [Abstract and Experiments] The central claim of consistent outperformance on 3,200 papers is presented without any description of data splits, baseline implementations, exact multidimensional impact metrics, statistical significance tests, or error bars. This absence prevents verification that reported gains are not attributable to post-hoc choices or data leakage, directly affecting the soundness of the empirical contribution.

Authors: We agree with this assessment. The current presentation in the abstract and experiments lacks sufficient methodological details for independent verification. In the revised manuscript, we will provide a comprehensive description of the experimental setup, including: the temporal data splits used to ensure prospective forecasting without leakage (training on earlier papers, testing on later ones); full implementation details and hyperparameters for the LLM baselines; exact definitions and computation of the multidimensional impact metrics; results from statistical significance tests; and error bars or standard deviations for all performance figures. These additions will strengthen the empirical contribution and allow readers to assess the robustness of the reported gains. revision: yes
Referee: [Introduction/Discussion] The paper relies on the assumption that forecasting impact trajectories of already-published, peer-reviewed manuscripts is a reliable proxy for judging entirely new, pre-publication research ideas. Published papers already embed field momentum and peer-review signals absent from novel ideas; without a concrete transfer experiment or discussion of how the learned manifold would generalize outside existing trajectories, the broader claim that FAME advances automated evaluation of research ideas rests on an untested extrapolation.

Authors: This is a fair point regarding the scope of our claims. We will revise the Introduction and add a new subsection in the Discussion to explicitly address the proxy nature of the task. We will discuss the differences between published papers (which include peer-review signals) and novel ideas, and explain how the dynamic manifold learned by FAME can be used to evaluate new research by projecting their embeddings and assessing trajectory alignment. While we cannot perform a direct transfer experiment here due to the absence of immediate ground-truth impact for unpublished ideas, we will outline this as an important direction for future work and moderate our claims about automated evaluation of research ideas accordingly. revision: partial
Referee: [Method] The description of 'learning geometric constraints that align impactful manuscripts with the forward momentum of their fields' leaves open whether the knowledge-flow graph and latent-space construction are fully independent of the target impact labels. If any supervision from impact metrics enters the geometric alignment step, the reported superiority over LLM baselines could be circular; explicit equations or pseudocode clarifying independence are required.

Authors: We thank the referee for highlighting this potential ambiguity. The knowledge-flow graph is constructed from citation data and external verification sources without reference to impact metrics. The latent space is initialized from textual embeddings and graph structure in an unsupervised manner. The geometric alignment uses the observed temporal dynamics of papers in the space but does not incorporate impact labels directly into the manifold evolution; impact prediction is handled by a separate module. To eliminate any doubt, we will include precise mathematical formulations and pseudocode in the revised Method section that demonstrate this separation. This will confirm that the superiority over LLM baselines, which rely solely on static text, is not due to circular supervision. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation remains independent of target labels

full rationale

The abstract presents FAME as a spatiotemporal model that projects papers into a latent space using textual features plus an external verified knowledge-flow graph, then learns geometric constraints to align with field momentum. No equations appear, no parameters are described as fitted to the impact labels and then renamed as predictions, and no self-citations or uniqueness theorems are invoked to justify the core construction. The reported outperformance on 3,200 papers is framed as an empirical result rather than a definitional identity. Absent any quoted reduction of the forecasting output to the input labels by construction, the derivation chain does not exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review prevents exhaustive enumeration; the framework relies on an unstated assumption that a dynamic latent space can be learned to capture forward momentum and on the existence of a verified knowledge-flow graph whose construction details are not provided.

axioms (1)

domain assumption The true impact of a new idea may take years to emerge, making direct evaluation of LLM judgments difficult.
Explicitly stated as motivation for using manuscript impact forecasting as proxy task.

invented entities (1)

dynamic latent space informed by textual features and knowledge-flow graph no independent evidence
purpose: To project papers and learn geometric constraints aligning impactful manuscripts with field momentum.
Core modeling construct introduced in the FAME description; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5545 in / 1498 out tokens · 46889 ms · 2026-05-11T02:43:59.665343+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost uniqueness, Aczél classification) washburn_uniqueness_aczel unclear
FAME projects papers into a dynamic latent space ... learning geometric constraints that align impactful manuscripts with the forward momentum of their fields ... S(p_new) = cos(z_new - μ_cnew(t_new), μ'_cnew(t_new))
IndisputableMonolith/Foundation/ArrowOfTime.lean (Berry-phase monotonicity and Z-complexity) forward_accumulates echoes
continuous-time spatiotemporal manifold ... topic spine model ... vanguard loss ... align with the topic’s forward momentum
IndisputableMonolith/Foundation/AlexanderDuality.lean (D=3 from circle linking) alexander_duality_circle_linking unclear
18-month sliding-window evaluation on 3,200 arXiv papers

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 10 internal anchors

[1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

On the surprising behavior of distance metrics in high dimensional space

Charu C Aggarwal, Alexander Hinneburg, and Daniel A Keim. On the surprising behavior of distance metrics in high dimensional space. InInternational conference on database theory, pages 420–434. Springer, 2001

work page 2001
[3]

The skewness of science in 219 sub-fields and a number of aggregates.Scientometrics, 88(2):385–397, 2011

Pedro Albarrán, Juan A Crespo, Ignacio Ortuño, and Javier Ruiz-Castillo. The skewness of science in 219 sub-fields and a number of aggregates.Scientometrics, 88(2):385–397, 2011

work page 2011
[4]

Researchagent: Iterative research idea generation over scientific literature with large language models

Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang. Researchagent: Iterative research idea generation over scientific literature with large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pa...

work page 2025
[5]

Autonomous chemical research with large language models.Nature, 624(7992):570–578, 2023

Daniil A Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models.Nature, 624(7992):570–578, 2023

work page 2023
[6]

arXiv preprint arXiv:2304.05376 , year=

Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools.arXiv preprint arXiv:2304.05376, 2023

work page arXiv 2023
[7]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments with gpt-4.arXiv preprint arXiv:2303.12712, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[8]

Beltrami flow and neural diffusion on graphs.Advances in Neural Information Processing Systems, 34:1594–1609, 2021

Benjamin Chamberlain, James Rowbottom, Davide Eynard, Francesco Di Giovanni, Xiaowen Dong, and Michael Bronstein. Beltrami flow and neural diffusion on graphs.Advances in Neural Information Processing Systems, 34:1594–1609, 2021

work page 2021
[9]

Xgboost: A scalable tree boosting system

Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016

work page 2016
[10]

Structural scaffolds for citation intent classification in scientific publications

Arman Cohan, Waleed Ammar, Madeleine Van Zuylen, and Field Cady. Structural scaffolds for citation intent classification in scientific publications. InProceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, volume 1 (long and short papers), pages 3586–3596, 2019

work page 2019
[11]

Support-vector networks.Machine learning, 20(3):273– 297, 1995

Corinna Cortes and Vladimir Vapnik. Support-vector networks.Machine learning, 20(3):273– 297, 1995

work page 1995
[12]

Towards an AI co-scientist

Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an ai co-scientist. arXiv preprint arXiv:2502.18864, 2025

work page internal anchor Pith review arXiv 2025
[13]

Temporal knowledge graph forecasting with neural ode.arXiv preprint arXiv:2101.05151, 2021

Zhen Han, Zifeng Ding, Yunpu Ma, Yujia Gu, and V olker Tresp. Temporal knowledge graph forecasting with neural ode.arXiv preprint arXiv:2101.05151, 2021

work page arXiv 2021
[14]

Rodriques, and Andrew D

Jakub Lála, Odhran O’Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G Rodriques, and Andrew D White. Paperqa: Retrieval-augmented generative agent for scientific research.arXiv preprint arXiv:2312.07559, 2023

work page arXiv 2023
[15]

Xiaona Li, Zhu Wang, Xindong Chen, Bin Guo, and Zhiwen Yu. A hybrid continuous- time dynamic graph representation learning model by exploring both temporal and repetitive information.ACM Transactions on Knowledge Discovery from Data, 17(9):1–22, 2023

work page 2023
[16]

Citation importance- aware document representation learning for large-scale science mapping.Information Process- ing & Management, 63(3):104557, 2026

Zhentao Liang, Nees Jan van Eck, Xuehua Wu, Jin Mao, and Gang Li. Citation importance- aware document representation learning for large-scale science mapping.Information Process- ing & Management, 63(3):104557, 2026. 10

work page 2026
[17]

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, et al. Deepseek-v3. 2: Pushing the frontier of open large language models.arXiv preprint arXiv:2512.02556, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[18]

Lost in the middle: How language models use long contexts.Transactions of the association for computational linguistics, 12:157–173, 2024

Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts.Transactions of the association for computational linguistics, 12:157–173, 2024

work page 2024
[19]

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The ai scien- tist: Towards fully automated open-ended scientific discovery.arXiv preprint arXiv:2408.06292, 2024

work page internal anchor Pith review arXiv 2024
[20]

AInstein: Can LLMs Solve Research Problems From Parametric Memory Alone?

Shambhavi Mishra, Gaurav Sahu, Marco Pedersoli, Laurent Charlin, Jose Dolz, and Christopher Pal. Ainstein: Assessing the feasibility of ai-generated approaches to research problems.arXiv preprint arXiv:2510.05432, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[21]

Continuous-time link prediction via temporal dependent graph neural network

Liang Qu, Huaisheng Zhu, Qiqi Duan, and Yuhui Shi. Continuous-time link prediction via temporal dependent graph neural network. InProceedings of the web conference 2020, pages 3026–3032, 2020

work page 2020
[22]

Universality of citation distributions: Toward an objective measure of scientific impact.Proceedings of the National Academy of Sciences, 105(45):17268–17272, 2008

Filippo Radicchi, Santo Fortunato, and Claudio Castellano. Universality of citation distributions: Toward an objective measure of scientific impact.Proceedings of the National Academy of Sciences, 105(45):17268–17272, 2008

work page 2008
[23]

Temporal graph networks for deep learning on dynamic graphs,

Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs.arXiv preprint arXiv:2006.10637, 2020

work page arXiv 2006
[24]

Kernel principal component analysis

Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Kernel principal component analysis. InInternational conference on artificial neural networks, pages 583–588. Springer, 1997

work page 1997
[25]

Si, C., Hashimoto, T., and Yang, D

Chenglei Si, Diyi Yang, and Tatsunori Hashimoto. Can llms generate novel research ideas? a large-scale human study with 100+ nlp researchers.arXiv preprint arXiv:2409.04109, 2024

work page arXiv 2024
[26]

OpenAI GPT-5 System Card

Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, et al. Openai gpt-5 system card.arXiv preprint arXiv:2601.03267, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[27]

A tutorial on support vector regression.Statistics and computing, 14(3):199–222, 2004

Alex J Smola and Bernhard Schölkopf. A tutorial on support vector regression.Statistics and computing, 14(3):199–222, 2004

work page 2004
[28]

2025 , doi =

Jiabin Tang, Lianghao Xia, Zhonghang Li, and Chao Huang. Ai-researcher: Autonomous scientific innovation.arXiv preprint arXiv:2505.18705, 2025

work page arXiv 2025
[29]

Galactica: A Large Language Model for Science

Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. Galactica: A large language model for science.arXiv preprint arXiv:2211.09085, 2022

work page internal anchor Pith review arXiv 2022
[30]

Ai can learn scientific taste.arXiv preprint arXiv:2603.14473, 2026

Jingqi Tong, Mingzhe Li, Hangcheng Li, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, et al. Ai can learn scientific taste.arXiv preprint arXiv:2603.14473, 2026

work page arXiv 2026
[31]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017
[32]

Quantifying long-term scientific impact.Science, 342(6154):127–132, 2013

Dashun Wang, Chaoming Song, and Albert-László Barabási. Quantifying long-term scientific impact.Science, 342(6154):127–132, 2013

work page 2013
[33]

Large language models are not fair evaluators

Peiyi Wang, Lei Li, Liang Chen, Zefan Cai, Dawei Zhu, Binghuai Lin, Yunbo Cao, Lingpeng Kong, Qi Liu, Tianyu Liu, et al. Large language models are not fair evaluators. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9440–9450, 2024. 11

work page 2024
[34]

Autogen: Enabling next-gen llm applications via multi-agent conversations

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. InFirst conference on language modeling, 2024

work page 2024
[35]

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist-v2: Workshop-level automated scientific discovery via agentic tree search.arXiv preprint arXiv:2504.08066, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[36]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

Large language models as optimizers

Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. InThe Twelfth International Conference on Learning Representations, 2023

work page 2023
[38]

Towards better dynamic graph learning: New architecture and unified library.Advances in Neural Information Processing Systems, 36:67686–67700, 2023

Le Yu, Leilei Sun, Bowen Du, and Weifeng Lv. Towards better dynamic graph learning: New architecture and unified library.Advances in Neural Information Processing Systems, 36:67686–67700, 2023

work page 2023
[39]

T-gcn: A temporal graph convolutional network for traffic prediction.IEEE transactions on intelligent transportation systems, 21(9):3848–3858, 2019

Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li. T-gcn: A temporal graph convolutional network for traffic prediction.IEEE transactions on intelligent transportation systems, 21(9):3848–3858, 2019

work page 2019
[40]

is_inspired

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023. 12 A Model Details A.1 Hyperparameters and Computation Resources The hyperparameters for con...

work page 2023