GTBench is a new curriculum-grounded benchmark showing GPT-5 performs strongly on basic graph theory tasks but all models, including it, struggle more on advanced proofs with notable evaluator disagreements.
Llm-hpc++: Evaluating llm-generated modern c++ and mpi+ openmp codes for scalable mandelbrot set computation.arXiv preprint arXiv:2512.17023, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory
GTBench is a new curriculum-grounded benchmark showing GPT-5 performs strongly on basic graph theory tasks but all models, including it, struggle more on advanced proofs with notable evaluator disagreements.