AI governance and accountability: An analysis of Anthropic's Claude. arXiv preprint arXiv:2407.01557, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- CodegenBench: Can LLMs Write Efficient Code Across Architectures?
  CodegenBench shows LLMs generate optimized code well for x86_64 but exhibit significant performance degradation on Sunway and Kunpeng due to limited documentation and training data.
- Chain-of-Procedure: Hierarchical Visual-Language Reasoning for Procedural QA
  Introduces ProcedureVQA benchmark and Chain-of-Procedure framework that improves VLM next-step prediction in procedures by up to 13% over baselines.