GAIA: a benchmark for General AI Assistants
The GAIA benchmark shows that humans, at 92% accuracy on conceptually simple real-world questions, far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
11 Pith papers cite this work.
citing papers explorer
-
PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding
PipeSD achieves 1.16x-2.16x speedup and 14.3%-25.3% lower energy use in cloud-edge LLM inference via token-batch pipeline scheduling optimized by dynamic programming and a Bayesian-optimized dual-threshold NAV trigger.
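As a rough illustration of what the scheduler decides, here is a minimal sketch assuming invented edge/cloud latency models; brute-force enumeration of batch splits stands in for PipeSD's dynamic program, and the NAV trigger is omitted:

```python
# Toy two-stage (edge draft -> cloud verify) pipeline: pick token-batch
# sizes that minimize makespan. Latency models below are invented for
# illustration; PipeSD solves this with dynamic programming, while this
# sketch brute-forces all partitions.

def edge_time(b):   # hypothetical edge latency for a batch of b tokens
    return 1.0 + 0.3 * b

def cloud_time(b):  # hypothetical cloud latency for a batch of b tokens
    return 0.5 + 0.6 * b

def makespan(batches):
    """Completion time of a two-stage pipeline over the given batches."""
    f1 = f2 = 0.0
    for b in batches:
        f1 += edge_time(b)                 # edge runs batches back to back
        f2 = max(f2, f1) + cloud_time(b)   # cloud waits for the edge output
    return f2

def best_schedule(n_tokens, max_batch=4):
    best_t, best_split = float("inf"), None
    def rec(remaining, acc):
        nonlocal best_t, best_split
        if remaining == 0:
            t = makespan(acc)
            if t < best_t:
                best_t, best_split = t, list(acc)
            return
        for b in range(1, min(max_batch, remaining) + 1):
            rec(remaining - b, acc + [b])
    rec(n_tokens, [])
    return best_t, best_split

print(best_schedule(10))  # -> (makespan, batch sizes)
```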
-
Search Your Block Floating Point Scales!
ScaleSearch optimizes block floating point scales via fine-grained search, cutting quantization error by 27% for NVFP4 and improving post-training quantization (PTQ) accuracy by up to 15 points on MATH500 for Qwen3-8B and attention perplexity by 0.77 on Llama 3.1 70B.
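A minimal sketch of the core idea, searching a per-block scale rather than taking the usual max-abs scale; the integer grid, bit width, and search range here are illustrative assumptions, not the NVFP4 format details:

```python
import numpy as np

def quantize(x, scale, qmax=7):
    # Symmetric integer grid as a stand-in for a low-bit block format.
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def search_scale(block, n_candidates=64, qmax=7):
    base = np.abs(block).max() / qmax            # usual max-abs scale
    best_s = base
    best_err = np.mean((block - quantize(block, base, qmax)) ** 2)
    for s in np.linspace(0.5 * base, 1.2 * base, n_candidates):
        err = np.mean((block - quantize(block, s, qmax)) ** 2)
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err

block = np.random.default_rng(0).standard_normal(32)  # one block of weights
print(search_scale(block))  # searched scale usually beats the max-abs MSE
```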
-
Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
VIGOR assigns higher rewards to LLM completions whose teacher-forced negative log-likelihood gradients have smaller L2 norms, applying a sqrt(T) length correction and group ranking, yielding +3.31% math and +1.91% code gains over RLIF on Qwen2.5-7B.
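A minimal sketch of the reward shape, assuming a toy linear head in place of a full LM; only the pattern (negative teacher-forced NLL gradient norm, sqrt(T) length correction, group ranking) follows the summary above:

```python
import math
import torch
import torch.nn as nn

model = nn.Linear(16, 100)  # toy stand-in for an LM head (hidden=16, vocab=100)

def grad_norm_reward(hidden, targets):
    model.zero_grad()
    nll = nn.functional.cross_entropy(model(hidden), targets)
    nll.backward()                                  # teacher-forced NLL gradient
    g2 = sum(p.grad.pow(2).sum() for p in model.parameters())
    return -(g2.sqrt() / math.sqrt(len(targets))).item()  # sqrt(T) correction

# Score a "group" of three completions of different lengths for one prompt.
group = []
for T in (8, 12, 20):
    hidden = torch.randn(T, 16)                     # per-token hidden states
    targets = torch.randint(0, 100, (T,))           # completion token ids
    group.append(grad_norm_reward(hidden, targets))

# Group ranking: the RL update would consume ranks, not raw rewards.
ranks = sorted(range(len(group)), key=lambda i: group[i], reverse=True)
print(group, ranks)
```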
-
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
Pruning pretrained MoE models outperforms training from scratch, different compression methods converge after continued pretraining, and combining KD with language modeling loss plus progressive schedules yields a competitive 23A2B model from Qwen3-Next-80A3B.
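A minimal sketch of the loss combination the summary mentions, with an assumed linear mixing schedule and distillation temperature:

```python
import torch
import torch.nn.functional as F

def kd_plus_lm_loss(student_logits, teacher_logits, targets,
                    step, total_steps, tau=2.0):
    # Language modeling loss on the hard targets.
    lm = F.cross_entropy(student_logits, targets)
    # Distillation loss against the teacher's softened distribution.
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau**2
    # Assumed progressive schedule: shift weight from KD toward the LM loss.
    alpha = 1.0 - step / total_steps
    return alpha * kd + (1.0 - alpha) * lm

s, t = torch.randn(4, 50), torch.randn(4, 50)   # student / teacher logits
y = torch.randint(0, 50, (4,))
print(kd_plus_lm_loss(s, t, y, step=100, total_steps=1000))
```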
-
Geometry Guided Self-Consistency for Physical AI
KeyStone improves task success rates in diffusion-based physical AI models by up to 13.3% by sampling K trajectories in parallel, clustering them in action space, and returning the medoid of the largest cluster.
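A minimal sketch of the selection rule, assuming random trajectories in place of a diffusion policy and hand-rolled k-means for the clustering step:

```python
import numpy as np

def medoid_of_largest_cluster(trajs, k=3, iters=20, seed=0):
    X = trajs.reshape(len(trajs), -1)            # flatten each trajectory
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):                       # plain k-means in action space
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    big = np.bincount(labels, minlength=k).argmax()   # largest cluster
    members = np.flatnonzero(labels == big)
    intra = np.linalg.norm(X[members][:, None] - X[members][None], axis=-1)
    return members[intra.sum(axis=1).argmin()]   # medoid: min total distance

K, T, A = 16, 8, 4                               # samples, horizon, action dim
trajs = np.random.default_rng(1).standard_normal((K, T, A))
print("selected trajectory index:", medoid_of_largest_cluster(trajs))
```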
-
The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling
APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.
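A toy sketch of the particle mechanics, assuming a randomized stand-in for the base model; plain importance resampling at block boundaries replaces APPS's future-value-guided selection:

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, BLOCK, N, ALPHA = 10, 12, 4, 8, 2.0  # vocab, length, block, particles, power

def next_token_probs(prefix):
    # Randomized stand-in for a language model's p(. | prefix).
    logits = rng.standard_normal(V) * 0.5
    e = np.exp(logits - logits.max())
    return e / e.sum()

particles = [[] for _ in range(N)]
logw = np.zeros(N)
for t in range(T):
    for i in range(N):
        p = next_token_probs(particles[i])
        tok = rng.choice(V, p=p)                    # propose from the base model
        particles[i].append(int(tok))
        logw[i] += (ALPHA - 1.0) * np.log(p[tok])   # reweight toward p(x)^alpha
    if (t + 1) % BLOCK == 0:                        # block boundary: resample
        w = np.exp(logw - logw.max())
        idx = rng.choice(N, size=N, p=w / w.sum())
        particles = [list(particles[j]) for j in idx]
        logw[:] = 0.0

print(particles[0])  # one surviving sequence
```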
-
Muon is Scalable for LLM Training
The Muon optimizer, extended with weight decay and per-update scaling, achieves ~2x computational efficiency over AdamW for large LLMs, demonstrated via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
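A minimal sketch of a Muon-style step for one 2-D weight, assuming the commonly used quintic Newton-Schulz coefficients and an RMS-matching scale factor; the exact constants are assumptions, not the report's verbatim implementation:

```python
import torch

def newton_schulz(G, steps=5, eps=1e-7):
    # Approximately orthogonalize G via the quintic Newton-Schulz iteration.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    if G.size(0) > G.size(1):
        X = X.T                              # work with the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if G.size(0) > G.size(1) else X

def muon_step(W, grad, M, lr=0.02, momentum=0.95, wd=0.1):
    M.mul_(momentum).add_(grad)              # momentum buffer
    O = newton_schulz(M)                     # orthogonalized update direction
    scale = 0.2 * max(W.shape) ** 0.5        # assumed AdamW-RMS-matching factor
    W.mul_(1 - lr * wd).add_(O, alpha=-lr * scale)  # decoupled weight decay

W = torch.randn(64, 32)
M = torch.zeros_like(W)
muon_step(W, torch.randn_like(W), M)
```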
-
A General Language Assistant as a Laboratory for Alignment
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
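One simple way to train on rankings rather than imitation targets is a pairwise log-sigmoid loss over ordered scores; the specific loss form below is an assumption, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def ranked_preference_loss(scores):
    """scores: (K,) reward-model outputs, ordered best (index 0) to worst."""
    loss, n = 0.0, 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss = loss - F.logsigmoid(scores[i] - scores[j])  # prefer i over j
            n += 1
    return loss / n

scores = torch.tensor([1.3, 0.4, -0.2, -1.1], requires_grad=True)
ranked_preference_loss(scores).backward()
print(scores.grad)   # gradients push better-ranked scores up
```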
-
NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning
Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.
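A minimal sketch of the noise-inject / vote / abstain pattern, with a stand-in decoder in place of latent-space reasoning; the noise scale and agreement threshold are assumed:

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def decode_from_latent(z):
    # Stand-in for running the rest of the latent reasoning + decoding.
    return int(z.sum() > 0)            # toy binary "answer"

def answer_or_abstain(z0, n_paths=16, sigma=0.5, threshold=0.75):
    answers = [decode_from_latent(z0 + sigma * rng.standard_normal(z0.shape))
               for _ in range(n_paths)]
    answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n_paths        # consensus across noisy paths = confidence
    return (answer, agreement) if agreement >= threshold else (None, agreement)

z0 = rng.standard_normal(32)           # an assumed latent reasoning state
print(answer_or_abstain(z0))           # None in the first slot means "abstain"
```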
-
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching