pith. machine review for the scientific record. sign in

hub

Livebench: A challenging, contamination-free llm benchmark

18 Pith papers cite this work. Polarity classification is still indexing.

18 Pith papers citing it

hub tools

years

2026 12 2025 6

clear filters

representative citing papers

You Don't Need Public Tests to Generate Correct Code

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

DryRUN lets LLMs create their own test inputs and run internal simulations for self-correcting code generation, matching the performance of test-dependent methods like CodeSIM on LiveCodeBench without public tests or external signals.

Process Reinforcement through Implicit Rewards

cs.LG · 2025-02-03 · conditional · novelty 6.0

PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

Qwen3 Technical Report

cs.CL · 2025-05-14 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.

citing papers explorer

Showing 6 of 6 citing papers after filters.