Recognition: unknown
Learning Chain Of Thoughts Prompts for Predicting Entities, Relations, and even Literals on Knowledge Graphs
Pith reviewed 2026-05-10 15:30 UTC · model grok-4.3
The pith
Chain-of-thought prompts optimized on fewer than 30 examples let language models score and complete knowledge graph triples, including literals and complex OWL expressions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RALP learns string-based chain-of-thought prompts that act as scoring functions for triples, applying Bayesian optimization via the MIPRO algorithm to fewer than 30 training examples and requiring no gradient access. At inference time the same prompts predict missing entities, relations, or literals, return whole triples, and produce confidence scores derived from the language model's output.
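To make the optimization loop concrete, here is a minimal sketch of how such a chain-of-thought triple scorer could be compiled with the open-source DSPy library, whose MIPROv2 optimizer implements MIPRO. The signature fields, metric, model name, and toy data are illustrative assumptions rather than RALP's released pipeline, and exact constructor and compile arguments vary across DSPy versions; the linked repository contains the authors' actual code.

```python
# Hedged sketch: optimizing a chain-of-thought prompt that scores
# knowledge-graph triples, assuming the DSPy library and its MIPROv2
# optimizer (which implements MIPRO). Field names, the metric, the LM,
# and the toy data are illustrative; RALP's real pipeline may differ.
import dspy
from dspy.teleprompt import MIPROv2

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any chat-capable LM

class ScoreTriple(dspy.Signature):
    """Decide whether a knowledge graph triple (head, relation, tail) holds."""
    head = dspy.InputField()
    relation = dspy.InputField()
    tail = dspy.InputField(desc="entity, literal, or OWL class expression")
    plausible = dspy.OutputField(desc="yes or no")

scorer = dspy.ChainOfThought(ScoreTriple)  # CoT reasoning before the verdict

# Fewer than 30 labelled triples, mirroring the paper's data-efficiency setting
# (a realistic run needs the full set so the optimizer can hold out a validation split).
trainset = [
    dspy.Example(head="Berlin", relation="capitalOf", tail="Germany",
                 plausible="yes").with_inputs("head", "relation", "tail"),
    dspy.Example(head="Berlin", relation="population", tail="12",
                 plausible="no").with_inputs("head", "relation", "tail"),
    # ... up to ~30 examples in total
]

def agreement(example, prediction, trace=None):
    # 1 if the model's yes/no verdict matches the gold label, else 0.
    return example.plausible.strip().lower() == prediction.plausible.strip().lower()

# Bayesian search over candidate instructions and demonstrations; no gradients.
optimizer = MIPROv2(metric=agreement, auto="light")
compiled_scorer = optimizer.compile(scorer, trainset=trainset)

# The compiled program can now score candidate completions for a missing slot.
print(compiled_scorer(head="Paris", relation="locatedIn", tail="France").plausible)
```

Ranking candidate heads, relations, or tails by such a scorer's confidence would then yield the link-prediction evaluation discussed below.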
What carries the argument
RALP, which treats learned chain-of-thought prompt strings as scoring functions for knowledge-graph triples and optimizes them via MIPRO Bayesian search on small example sets.
If this is right
- RALP raises the MRR of state-of-the-art KGE models by more than 5 percent across the evaluated datasets.
- On OWL reasoning benchmarks with expressions such as existential and cardinality restrictions, RALP reaches over 88 percent Jaccard similarity to the expected answers (both metrics are sketched in code after this list).
- The method produces high-quality inferred triples that improve generalization on transductive, numerical, and instance-retrieval tasks.
- Prompts can be used at inference to predict any part of a triple or an entire triple while returning a confidence value.
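For reference, the two headline metrics are straightforward to compute; the sketch below uses made-up ranked candidate lists and instance sets, not data from the paper.

```python
# Minimal sketch of the two reported metrics, on illustrative toy data.

def mean_reciprocal_rank(ranked_candidates, gold_answers):
    """MRR over queries: ranked_candidates[i] is a ranked list, gold_answers[i] the correct item."""
    total = 0.0
    for candidates, gold in zip(ranked_candidates, gold_answers):
        if gold in candidates:
            total += 1.0 / (candidates.index(gold) + 1)
    return total / len(gold_answers)

def jaccard(predicted, expected):
    """Set overlap between retrieved and expected instances of an OWL class expression."""
    predicted, expected = set(predicted), set(expected)
    if not predicted and not expected:
        return 1.0
    return len(predicted & expected) / len(predicted | expected)

# Toy usage: two link-prediction queries and one instance-retrieval query.
print(mean_reciprocal_rank([["Paris", "Lyon"], ["Berlin", "Bonn"]],
                           ["Paris", "Bonn"]))               # (1/1 + 1/2) / 2 = 0.75
print(jaccard({"anna", "maria"}, {"anna", "maria", "eva"}))  # 2 / 3 ≈ 0.67
```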
Where Pith is reading between the lines
- Dynamic knowledge graphs could add new literals without retraining embeddings by switching to prompt-based scoring.
- The same optimization loop might be tested on other structured prediction settings where labeled examples are scarce.
- Hybrid systems that combine the learned prompts with existing embedding models could cover both seen and unseen elements more robustly.
Load-bearing premise
Prompts found by optimizing on fewer than 30 examples will reliably generalize to unseen entities, relations, and literals without substantial hallucination or heavy dependence on the base language model's pretraining data.
What would settle it
Apply the learned prompts to a test collection containing only triples whose literals or complex OWL expressions are absent from the 30-example optimization set, and check whether the MRR or Jaccard scores remain above those of the strongest embedding baselines.
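One way to operationalize that test is to restrict the evaluation set to triples whose completed element never appeared during prompt optimization before scoring; a minimal sketch with assumed triple tuples, not the paper's datasets:

```python
# Hedged sketch: keep only test triples whose tail (entity, literal, or OWL
# class expression) is absent from the prompt-optimization examples, then
# compare the learned prompts against embedding baselines on this subset.
# The triple format and the example data are assumptions for illustration.

def novel_tail_only(test_triples, optimization_triples):
    seen = {element for triple in optimization_triples for element in triple}
    return [(h, r, t) for h, r, t in test_triples if t not in seen]

opt_set = [("Berlin", "capitalOf", "Germany"), ("Anna", "age", "34")]
test_set = [("Paris", "capitalOf", "France"),    # novel tail -> kept
            ("Bonn", "locatedIn", "Germany")]    # tail already seen -> dropped

print(novel_tail_only(test_set, opt_set))  # [('Paris', 'capitalOf', 'France')]
# Scoring this filtered subset with the learned prompts and comparing MRR or
# Jaccard against the strongest KGE baselines would address the premise above.
```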
Original abstract
Knowledge graph embedding (KGE) models perform well on link prediction but struggle with unseen entities, relations, and especially literals, limiting their use in dynamic, heterogeneous graphs. In contrast, pretrained large language models (LLMs) generalize effectively through prompting. We reformulate link prediction as a prompt learning problem and introduce RALP, which learns string-based chain-of-thought (CoT) prompts as scoring functions for triples. Using Bayesian Optimization through MIPRO algorithm, RALP identifies effective prompts from fewer than 30 training examples without gradient access. At inference, RALP predicts missing entities, relations or whole triples and assigns confidence scores based on the learned prompt. We evaluate on transductive, numerical, and OWL instance retrieval benchmarks. RALP improves state-of-the-art KGE models by over 5% MRR across datasets and enhances generalization via high-quality inferred triples. On OWL reasoning tasks with complex class expressions (e.g., $\exists hasChild.Female$, $\geq 5 \; hasChild.Female$), it achieves over 88% Jaccard similarity. These results highlight prompt-based LLM reasoning as a flexible alternative to embedding-based methods. We release our implementation, training, and evaluation pipeline as open source: https://github.com/dice-group/RALP .
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RALP, a method that reformulates link prediction on knowledge graphs as a prompt-learning task. It uses the MIPRO Bayesian optimization algorithm to discover effective chain-of-thought prompt strings from fewer than 30 training examples; these prompts then act as scoring functions to predict missing entities, relations, or literals and to assign confidence scores. The authors report that RALP improves state-of-the-art KGE models by more than 5% MRR on transductive, numerical, and OWL instance-retrieval benchmarks and reaches over 88% Jaccard similarity on OWL reasoning tasks involving complex class expressions.
Significance. If the performance claims are shown to be robust, the work offers a lightweight, gradient-free alternative to embedding-based link prediction that can handle unseen entities, relations, and literals. The open-source release of the full implementation, training, and evaluation pipeline is a concrete contribution that supports reproducibility.
major comments (2)
- [Evaluation and results] The central empirical claims (≥5% MRR improvement and ≥88% Jaccard on complex OWL expressions) rest on prompt optimization performed on <30 examples; the manuscript provides no controls for prompt sensitivity, variance across random seeds or base LLMs, or explicit tests that the learned prompts generalize to entities/relations whose surface forms are absent from the LLM's pre-training corpus.
- [Experimental setup] The manuscript reports no safeguards against data leakage between the MIPRO optimization set and the test triples, nor does it compare against strong prompt-based or retrieval-augmented baselines that would isolate whether the gains arise from genuine reasoning or from surface-pattern retrieval.
minor comments (2)
- [Abstract] The abstract states performance gains without naming the exact datasets, the precise KGE baselines, or the number of runs; these details should be supplied for immediate clarity.
- [Method] Notation for the learned CoT prompts and the scoring function derived from them should be introduced formally (e.g., as an equation) rather than only described in prose.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. We address each major comment below with clarifications from the manuscript and indicate the revisions we will make to strengthen the presentation and empirical support.
Point-by-point responses
Referee: [Evaluation and results] The central empirical claims (≥5% MRR improvement and ≥88% Jaccard on complex OWL expressions) rest on prompt optimization performed on <30 examples; the manuscript provides no controls for prompt sensitivity, variance across random seeds or base LLMs, or explicit tests that the learned prompts generalize to entities/relations whose surface forms are absent from the LLM's pre-training corpus.
Authors: The use of fewer than 30 examples is a deliberate design choice to demonstrate the data efficiency of MIPRO-based prompt optimization for link prediction. We agree that additional controls would increase confidence in the results. In the revised manuscript we will report performance variance across multiple random seeds for the MIPRO runs on the main benchmarks, which will quantify prompt stability (a minimal harness for such a check is sketched after these responses). While a full sweep across base LLMs is beyond the current scope, we will add a brief discussion of model sensitivity based on preliminary checks with an alternative LLM. For generalization to surface forms absent from pre-training, the OWL instance-retrieval tasks rely on compositional class expressions (e.g., existential and cardinality restrictions) that are not directly present in training corpora, and the numerical literal benchmarks explicitly test unseen numeric values; we will expand the discussion section to highlight these aspects and note the limitation of not having performed an explicit out-of-corpus surface-form ablation. revision: partial
Referee: [Experimental setup] No section reports safeguards against data leakage between the MIPRO optimization set and the test triples, nor does it compare against strong prompt-based or retrieval-augmented baselines that would isolate whether gains arise from genuine reasoning or from surface-pattern retrieval.
Authors: The optimization set for MIPRO is sampled exclusively from the training split of each benchmark, following the standard transductive and numerical data partitions described in the experimental setup; no test triples are used during prompt discovery. We acknowledge that an explicit statement of this partitioning and leakage safeguards is currently absent and will add a dedicated paragraph in the revised experimental setup section detailing the splits and confirming zero overlap. Regarding baselines, the manuscript already compares against state-of-the-art KGE models to establish improvement over embedding methods. To better isolate the contribution of the learned CoT prompts versus surface retrieval, we will include additional experiments against zero-shot and few-shot prompting baselines as well as a simple retrieval-augmented prompt baseline in the updated results tables. revision: yes
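The stability check promised in the first response could be as simple as repeating prompt optimization under several seeds and reporting the spread of the resulting scores. The sketch below is a generic harness under that assumption; optimize_fn and evaluate_fn are hypothetical placeholders for whichever optimizer and evaluation routine the released pipeline exposes.

```python
# Hedged sketch of a seed-variance study: rerun prompt optimization under
# several random seeds and summarize the spread of the evaluation metric.
# optimize_fn and evaluate_fn are hypothetical stand-ins for the optimizer
# (e.g. a MIPRO run) and the benchmark evaluation in the released pipeline.
from statistics import mean, stdev

def stability_report(optimize_fn, evaluate_fn, trainset, testset,
                     seeds=(0, 1, 2, 3, 4)):
    scores = []
    for seed in seeds:
        prompt = optimize_fn(trainset, seed=seed)    # one optimization run per seed
        scores.append(evaluate_fn(prompt, testset))  # e.g. MRR or Jaccard
    return {
        "mean": mean(scores),
        "std": stdev(scores) if len(scores) > 1 else 0.0,
        "per_seed": dict(zip(seeds, scores)),
    }
```

Reporting the per-seed spread alongside the headline numbers would directly address the referee's stability concern.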
Circularity Check
No circularity; empirical prompt optimization evaluated on held-out benchmarks
Full rationale
The paper presents RALP as an empirical method that optimizes CoT prompt strings via the MIPRO Bayesian optimizer on fewer than 30 examples and measures link-prediction MRR and OWL Jaccard scores on standard transductive/numerical/OWL benchmarks. No equations appear that define a target quantity in terms of itself, no fitted parameter is relabeled as a prediction, and the central performance claims rest on train/test splits rather than any self-referential derivation or load-bearing self-citation chain. The reported gains are therefore independent experimental outcomes, not tautological restatements of the method's inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can perform reliable step-by-step reasoning for knowledge-graph completion tasks when given appropriately optimized chain-of-thought prompts.