Mathematical Capabilities of ChatGPT

Simon Frieder, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Petersen, Julius Berner · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

cs.AI · 2025-05-29 · unverdicted · novelty 7.0

MathArena evaluates over 50 LLMs on 162 fresh competition problems across seven contests, detects contamination in AIME 2024, and reports top models scoring below 40 percent on IMO 2025 proof tasks.

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

cs.CL · 2026-05-01 · unverdicted · novelty 5.0 · 2 refs

MathArena is broadened into a maintained platform with new benchmarks for proofs, research questions, and formal verification, where GPT-5.5 scores 98% on 2026 USAMO and 74% on research-level tasks.

citing papers explorer

Showing 2 of 2 citing papers.

MathArena: Evaluating LLMs on Uncontaminated Math Competitions · cs.AI · 2025-05-29 · unverdicted · none · ref 13
Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs · cs.CL · 2026-05-01 · unverdicted · none · ref 30 · 2 links
