Title resolution pending

36 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

years

2026: 35 · 2025: 1

roles

background 1

polarities

background 1

representative citing papers

ABRA: Agent Benchmark for Radiology Applications

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

ABRA shows radiology agents excel at tool execution (89%+) but struggle with outcomes (0-25%), with oracle perception raising outcomes to 69-100%, identifying perception as the primary bottleneck.

CrackMeBench: Binary Reverse Engineering for Agents

cs.SE · 2026-05-11 · accept · novelty 7.0

CrackMeBench introduces 20 deterministic binary validation tasks and reports GPT-5.5 solving 11/12 generated ones at pass@3 while Claude and Kimi lag, especially on harder tasks.

Quantum Advantage for Coordinated Frequency Selection Against Distributed Jammers

quant-ph · 2026-04-22 · unverdicted · novelty 7.0

Entangled states of local dimension d enable strictly higher probability of agreeing on a common frequency band than the optimal classical strategy for sufficiently large safe bands d and spectrum size n, with an explicit 5.4% asymptotic advantage for d=2 using one Bell pair.

Design and Report Benchmarks for Knowledge Work

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

Proposes a three-step benchmark design method (define work activity, specify tested setting, score work product) derived from work studies and O*NET, demonstrated via three case analyses.

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

ICRL uses joint RL training of solver and critic with distribution-calibration re-weighting and role-wise advantage estimation to internalize critique into unassisted LLM performance, yielding 6.4-point gains on agentic tasks and 7.0 on math reasoning with Qwen3 models.

Muon Does Not Converge on Convex Lipschitz Functions

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

Evaluating AI Providers' Frontier Safety Frameworks

cs.CY · 2025-12-01 · unverdicted · novelty 6.0

Twelve frontier AI safety frameworks score between 8% and 34% on adapted risk-management criteria, with a median of 18%, leaving them too vague to serve as reliable external accountability mechanisms.
