American Invitational Mathematics Examination (AIME) 2025

Yifan Zhang, Team Math-AI · 2025

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: dataset (1)

citation-polarity summary: use dataset (1)

representative citing papers

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.

Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

A new RL objective adapts trust-region and off-policy handling automatically via normalized effective sample size of batch policy ratios, matching tuned baselines without new hyperparameters.

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

cs.CL · 2026-04-20 · unverdicted · novelty 6.0 · 2 refs

GSQ uses Gumbel-Softmax to optimize scalar quantization grids for LLMs, closing most of the accuracy gap to vector methods like QTIP at 2-3 bits per parameter while using symmetric scalar grids compatible with existing kernels.

RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning

cs.AI · 2026-04-17 · unverdicted · novelty 5.0

RankGuide uses tensor-rank analysis of consecutive hidden states to route between small and large reasoning models and steer generations, reducing latency up to 1.75x while maintaining competitive accuracy on reasoning benchmarks.

citing papers explorer

Showing 4 of 4 citing papers.

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
cs.CL · 2026-05-20 · unverdicted · none · ref 45

Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
cs.LG · 2026-05-12 · unverdicted · none · ref 36

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
cs.CL · 2026-04-20 · unverdicted · none · ref 35 · 2 links

RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning
cs.AI · 2026-04-17 · unverdicted · none · ref 10
