Cmath: Can your language model pass chinese elementary school math test?arXiv preprint arXiv:2306.16636

Wei, T · 2023 · arXiv 2306.16636

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH.

Validity-Calibrated Reasoning Distillation

cs.LG · 2026-04-14 · unverdicted · novelty 7.0 · 2 refs

Validity-calibrated reasoning distillation improves transfer of reasoning skills by modulating updates based on relative local validity of next steps instead of enforcing full trajectory imitation.

Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

cs.LG · 2026-03-09 · unverdicted · novelty 6.0

CAMEL is a scaling law capturing nonlinear model-size and mixture interactions to extrapolate optimal data mixtures for large LLMs from small-model experiments, reducing optimization cost by 50% and improving benchmarks by up to 3%.

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

cs.LG · 2025-12-10 · conditional · novelty 6.0

LLaDA2.0 scales discrete diffusion language models to 100B parameters via systematic conversion from autoregressive models using a 3-phase WSD training scheme and releases open-source 16B and 100B MoE variants.

AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

AGPO adaptively sets trust-region size and exploration temperature from group reward dispersion, entropy, and KL drift, yielding higher scores than PPO and GRPO on nine math benchmarks under fixed token budget.

Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow

cs.CL · 2026-01-22 · unverdicted · novelty 5.0

MDLMs lag autoregressive models in performance because parallel modeling weakens inter-token dependencies, yet they adapt generation order to task demands and show promise in a generate-then-edit paradigm.

Kimi K2: Open Agentic Intelligence

cs.LG · 2025-07-28 · unverdicted · novelty 5.0

Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.

citing papers explorer

Showing 7 of 7 citing papers.

Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 26
Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH.
Validity-Calibrated Reasoning Distillation cs.LG · 2026-04-14 · unverdicted · none · ref 25 · 2 links
Validity-calibrated reasoning distillation improves transfer of reasoning skills by modulating updates based on relative local validity of next steps instead of enforcing full trajectory imitation.
Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization cs.LG · 2026-03-09 · unverdicted · none · ref 29
CAMEL is a scaling law capturing nonlinear model-size and mixture interactions to extrapolate optimal data mixtures for large LLMs from small-model experiments, reducing optimization cost by 50% and improving benchmarks by up to 3%.
LLaDA2.0: Scaling Up Diffusion Language Models to 100B cs.LG · 2025-12-10 · conditional · none · ref 36
LLaDA2.0 scales discrete diffusion language models to 100B parameters via systematic conversion from autoregressive models using a 3-phase WSD training scheme and releases open-source 16B and 100B MoE variants.
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback cs.LG · 2026-05-20 · unverdicted · none · ref 18
AGPO adaptively sets trust-region size and exploration temperature from group reward dispersion, entropy, and KL drift, yielding higher scores than PPO and GRPO on nine math benchmarks under fixed token budget.
Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow cs.CL · 2026-01-22 · unverdicted · none · ref 8
MDLMs lag autoregressive models in performance because parallel modeling weakens inter-token dependencies, yet they adapt generation order to task demands and show promise in a generate-then-edit paradigm.
Kimi K2: Open Agentic Intelligence cs.LG · 2025-07-28 · unverdicted · none · ref 84
Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.

Cmath: Can your language model pass chinese elementary school math test?arXiv preprint arXiv:2306.16636

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer