Example: The number of downloads of the program in the second month increased to 360 = <<360=180>>180

Redundant Output Error Definition: Randomly selects a sentence in the CoT, duplicates it, and inserts the copy at a random position.
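The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes the CoT has already been split into a list of sentences, and the function name `inject_redundant_sentence` and the optional `rng` parameter are hypothetical names introduced here.

```python
import random


def inject_redundant_sentence(cot_sentences, rng=None):
    """Return a copy of the CoT with one sentence duplicated at a random position.

    Assumption: `cot_sentences` is a list of strings, one per sentence.
    How the CoT is split into sentences is not specified in the source.
    """
    if rng is None:
        rng = random.Random()

    corrupted = list(cot_sentences)  # leave the caller's list untouched
    if not corrupted:
        return corrupted  # nothing to duplicate in an empty CoT

    # Pick the sentence to duplicate.
    duplicate = rng.choice(corrupted)

    # Pick an insertion index; len(corrupted) means "append at the end".
    position = rng.randint(0, len(corrupted))
    corrupted.insert(position, duplicate)
    return corrupted


if __name__ == "__main__":
    steps = [
        "Step one of the reasoning.",
        "Step two of the reasoning.",
        "Step three of the reasoning.",
    ]
    # A seeded generator makes the corruption reproducible.
    for sentence in inject_redundant_sentence(steps, random.Random(0)):
        print(sentence)
```

Passing a seeded `random.Random` instance keeps the injected error reproducible across runs, which may be convenient when the same corrupted CoT has to be regenerated.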

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving

cs.AI · 2025-05-22 · unverdicted · novelty 7.0

SMART is a new four-dimension benchmark for LLM mathematical problem solving that reveals large capability differences across models and introduces an All-Pass Score to measure genuine problem-solving ability.
