Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
Pith reviewed 2026-06-28 10:51 UTC · model grok-4.3
The pith
Skill-RM reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill, unifying heterogeneous evaluation criteria under one interface.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Skill-RM supplies a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. Treating reward computation as a structured agentic task provides a consistent interface for orchestrating heterogeneous resources and for dynamically selecting and aggregating the evidence each input requires. The authors argue that this yields consistent and transparent evaluation across diverse tasks.
What carries the argument
The Reward-Evaluation Skill, a reusable agentic module that dynamically selects and aggregates evidence from rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics.
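To make the idea concrete, here is a minimal Python sketch of what a single evaluation interface over the four evidence types might look like. It is not the authors' implementation (their code is at the repository linked in the abstract). The names `Sample`, `Judge`, and `evaluate`, the exact-match reference check, the substring checklist check, the pluggable `judge` callable for rubric criteria, and the equal-weight average are all hypothetical. In particular, Skill-RM uses an LLM agent to decide which evidence to consult, whereas this sketch simply uses every source the input provides.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rubric judge: (prompt, response, criterion) -> score in [0, 1].
# In practice this would be an LLM call; here it is any callable.
Judge = Callable[[str, str, str], float]


@dataclass
class Sample:
    """One input to score. Only prompt and response are required; the other
    fields are evidence sources that may or may not exist for a given task."""
    prompt: str
    response: str
    reference: Optional[str] = None                          # ground-truth answer
    checklist: List[str] = field(default_factory=list)       # procedural items
    rubric: Dict[str, float] = field(default_factory=dict)   # criterion -> weight
    rule: Optional[Callable[[str], bool]] = None             # rule-based verifier


def evaluate(s: Sample, judge: Optional[Judge] = None) -> Tuple[float, Dict[str, float]]:
    """Score a sample from whichever evidence is available.

    Returns the aggregate reward plus a per-source trace, so the final number
    can be audited (the transparency property the paper emphasizes)."""
    evidence: Dict[str, float] = {}
    if s.rule is not None:
        evidence["rule"] = 1.0 if s.rule(s.response) else 0.0
    if s.reference is not None:
        # Placeholder check: whitespace-insensitive exact match.
        evidence["reference"] = float(s.response.strip() == s.reference.strip())
    if s.checklist:
        # Placeholder check: an item counts as satisfied if it appears verbatim.
        hits = sum(item.lower() in s.response.lower() for item in s.checklist)
        evidence["checklist"] = hits / len(s.checklist)
    if s.rubric and judge is not None:
        # Rubric criteria are only scored when a judge is supplied.
        total = sum(s.rubric.values())
        evidence["rubric"] = sum(
            w * judge(s.prompt, s.response, c) for c, w in s.rubric.items()
        ) / total
    if not evidence:
        raise ValueError("no applicable evidence for this sample")
    # Equal-weight aggregation over the sources that actually fired.
    return sum(evidence.values()) / len(evidence), evidence
```

For example, a math answer with a reference and a digits-only rule produces a trace with `rule` and `reference` entries, while an open-ended essay carrying only a rubric is scored by the judge alone. Returning the trace alongside the scalar is what makes the result inspectable.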
If this is right
- Skill-RM delivers higher scores than traditional judge baselines on standard reward benchmarks.
- The same model improves best-of-N selection quality when used as the ranking signal.
- Reinforcement learning pipelines obtain stronger training signals from the dynamically orchestrated evidence.
- Evaluation becomes consistent and transparent across tasks that previously required separate verifiers.
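The best-of-N and RL claims both consume the reward as one scalar per response. The sketch below shows two standard ways such a scalar is used: picking the top-scoring candidate among N samples, and turning a group of scores into group-normalized advantages in the style of GRPO. The source does not say which RL algorithm or normalization the paper uses; `reward` stands in for any scorer (for instance a Skill-RM call), and both function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def best_of_n(candidates: Sequence[str], reward: Callable[[str], float]) -> Tuple[int, str]:
    """Return (index, text) of the highest-reward candidate; ties go to the earliest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [reward(c) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best, candidates[best]


def group_advantages(scores: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: (r - mean) / (std + eps) over responses to one prompt.

    A uniform group (all rewards equal) yields all-zero advantages, i.e. no
    learning signal, which is one reason a reward model's resolution matters."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in scores]
```

In both uses only the ranking or spread of scores within a group matters, so a reward model that separates good from bad responses more sharply helps selection and training alike.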
Where Pith is reading between the lines
- The agentic interface could be applied to other LLM evaluation settings that mix rules and rubrics, such as safety classifiers.
- A single skill might replace collections of task-specific verifiers in large-scale alignment pipelines.
- Dynamic evidence selection raises the possibility of measuring which evidence types contribute most to final scores on different domains.
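One way to carry out that measurement is a leave-one-out ablation over evidence sources, given per-response evidence traces and human preference labels. The sketch below is hypothetical: it assumes each response comes with a dictionary mapping evidence source to a score in [0, 1], aggregates by an unweighted mean, and counts ties as half-correct. None of these choices come from the paper.

```python
from typing import Dict, List, Optional, Tuple

Trace = Dict[str, float]          # evidence source -> score in [0, 1]
Pair = Tuple[Trace, Trace]        # (human-preferred trace, rejected trace)


def aggregate(trace: Trace, drop: Optional[str] = None) -> float:
    """Unweighted mean over sources, optionally ignoring one source."""
    kept = [v for k, v in trace.items() if k != drop]
    return sum(kept) / len(kept) if kept else 0.5  # no evidence left: neutral score


def pairwise_accuracy(pairs: List[Pair], drop: Optional[str] = None) -> float:
    """Fraction of pairs where the preferred response scores higher (ties = 0.5)."""
    total = 0.0
    for chosen, rejected in pairs:
        a, b = aggregate(chosen, drop), aggregate(rejected, drop)
        total += 1.0 if a > b else 0.5 if a == b else 0.0
    return total / len(pairs)


def leave_one_out(pairs: List[Pair]) -> Dict[str, float]:
    """Accuracy lost when each source is removed; larger means the source matters more."""
    full = pairwise_accuracy(pairs)
    sources = sorted({k for chosen, rejected in pairs for k in (*chosen, *rejected)})
    return {s: full - pairwise_accuracy(pairs, drop=s) for s in sources}
```

Running this separately per domain would show whether, say, rule-based verifiers dominate on math while rubrics dominate on open-ended writing. A negative value would indicate that a source lowers agreement on that slice of data.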
Load-bearing premise
Reformulating reward computation as the execution of a reusable Reward-Evaluation Skill will integrate heterogeneous evidence types while preserving or improving evaluation quality, without introducing new inconsistencies or selection biases.
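Part of that premise can be probed cheaply. A common check for selection bias in pairwise LLM judges is to present each pair in both orders and count how often the same response wins. A minimal sketch, assuming a hypothetical `judge(prompt, first, second)` that returns 0 when it prefers the first response and 1 when it prefers the second; it tests order effects only, not other biases.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical pairwise judge: returns 0 if `first` is preferred, 1 if `second` is.
PairJudge = Callable[[str, str, str], int]


def swap_consistency(items: Sequence[Tuple[str, str, str]], judge: PairJudge) -> float:
    """Fraction of (prompt, a, b) items whose verdict survives swapping the order."""
    if not items:
        raise ValueError("need at least one item")
    consistent = 0
    for prompt, a, b in items:
        forward = judge(prompt, a, b)    # 0 means a wins
        backward = judge(prompt, b, a)   # 1 means a wins
        # Same underlying winner in both orders <=> the two verdicts differ.
        consistent += int(forward != backward)
    return consistent / len(items)
```

A judge that always picks the first slot scores 0.0, and a fully order-invariant judge scores 1.0. Running the same check on Skill-RM and on a plain judge baseline would show whether the agentic wrapper adds or removes position bias.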
What would settle it
A controlled test set of new heterogeneous criteria where Skill-RM produces lower agreement with human labels or lower downstream task performance than the strongest single-criterion baseline.
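A minimal sketch of how that comparison might be scored, assuming both systems emit one discrete verdict per example (for instance, which of two responses is better) and human labels are available. It reports raw agreement and a paired bootstrap estimate of how often Skill-RM fails to beat the baseline on resampled test sets. The function names, `n_boot`, and the seed are illustrative choices, not anything specified in the source.

```python
import random
from typing import Hashable, Sequence


def agreement(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of examples where the system's verdict matches the human label."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def paired_bootstrap(preds_a: Sequence[Hashable], preds_b: Sequence[Hashable],
                     labels: Sequence[Hashable], n_boot: int = 2000, seed: int = 0) -> float:
    """Share of bootstrap resamples in which system A does NOT beat system B.

    Small values (e.g. below 0.05) suggest A's higher agreement is unlikely to
    be a resampling accident; this is an approximate one-sided test."""
    rng = random.Random(seed)
    n = len(labels)
    # Per-example +1 / 0 / -1 difference in correctness between A and B.
    diff = [int(a == y) - int(b == y) for a, b, y in zip(preds_a, preds_b, labels)]
    not_better = 0
    for _ in range(n_boot):
        total = sum(diff[rng.randrange(n)] for _ in range(n))
        not_better += int(total <= 0)
    return not_better / n_boot
```

Under this criterion, the claim would be undermined if Skill-RM's `agreement` came out below that of the strongest single-criterion baseline, or if the bootstrap share stayed large.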
Original abstract
Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, where a unified mechanism to integrate all types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is at https://github.com/Qwen-Applications/Skill-RM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Skill-RM, a framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. This agentic approach provides a consistent interface for dynamically selecting and aggregating heterogeneous evidence (rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics) to produce reward signals for LLM post-training. The paper claims that this yields superior performance over traditional judge baselines on reward benchmarks and in downstream applications including best-of-N selection and reinforcement learning, with the code released at a public repository.
Significance. If the reported gains are reproducible and not artifacts of post-hoc choices, the work could offer a practical unification of disparate reward evaluation methods, reducing the need for task-specific verifiers in RFT and RL pipelines. The agentic formulation is a conceptual contribution that may generalize beyond the evaluated settings.
major comments (1)
- [Abstract] The central claim of consistent outperformance on reward benchmarks and downstream tasks is asserted without any reported metrics, baseline names, dataset sizes, ablation results, or statistical significance tests. Without these, a reader cannot verify that the gains come from the proposed orchestration mechanism rather than from implementation details or evaluation choices.
Simulated Author's Rebuttal
We thank the referee for the review and the opportunity to clarify the presentation of our results. We address the single major comment below.
- Referee: [Abstract] The central claim of consistent outperformance on reward benchmarks and downstream tasks is asserted without any reported metrics, baseline names, dataset sizes, ablation results, or statistical significance tests. Without these, a reader cannot verify that the gains come from the proposed orchestration mechanism rather than from implementation details or evaluation choices.
Authors: The abstract is written as a high-level summary, consistent with standard practice for concise overviews. The full manuscript (Sections 4 and 5) reports the requested details: concrete metrics on multiple reward benchmarks, comparisons against named traditional judge baselines, dataset sizes and splits, ablation studies isolating the contribution of dynamic evidence orchestration, and statistical significance testing. These results support that the observed gains stem from the agentic formulation rather than implementation artifacts. We are willing to incorporate one or two key quantitative highlights into the abstract in a revision if the editor prefers a more results-oriented abstract. revision: partial
Circularity Check
No significant circularity
The paper proposes Skill-RM as a methodological reformulation of reward modeling into an agentic skill-execution task, and its performance claims rest on external benchmark experiments rather than on any internal derivation chain. Neither the abstract nor the described full text contains equations, fitted parameters, self-citations, or uniqueness theorems that would reduce outputs to inputs by construction. The framework description and the experimental results stand as independent content.
Axiom & Free-Parameter Ledger
invented entities (1)
- Reward-Evaluation Skill: no independent evidence