CHAMP: A competition-level dataset for fine-grained analyses of LLMs’ mathematical reasoning capabilities

Yujun Mao, Yoon Kim, Yilun Zhou · 2024 · DOI 10.18653/v1/2024.findings-acl.785

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

cs.AI · 2025-09-22 · unverdicted · novelty 7.0

EngiBench shows that LLM accuracy drops as task complexity increases, degrades under perturbations, and stays below human performance on open-ended engineering problems.

Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

SKIM is an adaptive multi-resolution soft-token framework that compresses procedural skills while aiming to preserve logical dependencies and task performance better than prior compression methods.

citing papers explorer

Showing 2 of 2 citing papers.

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

cs.AI · 2025-09-22 · unverdicted · none · ref 31

EngiBench shows that LLM accuracy drops as task complexity increases, degrades under perturbations, and stays below human performance on open-ended engineering problems.

Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

cs.CL · 2026-06-10 · unverdicted · none · ref 58

SKIM is an adaptive multi-resolution soft-token framework that compresses procedural skills while aiming to preserve logical dependencies and task performance better than prior compression methods.
