pith. sign in

hub

S1: Simple test-time scaling

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

hub tools

citation-role summary

background 1

citation-polarity summary

years

2026 13 2025 1

roles

background 1

polarities

background 1

representative citing papers

ATLAS: Agentic Test-time Learning-to-Allocate Scaling

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

ATLAS introduces an LLM-orchestrated agentic framework for dynamic test-time scaling via extensible 'explore' actions, achieving higher accuracy with fewer API calls than fixed-workflow baselines on four benchmarks.

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

cs.AI · 2025-09-29 · conditional · novelty 7.0

ReasoningBank distills generalizable reasoning strategies from agent successes and failures to enable self-evolution, with memory-aware test-time scaling amplifying gains over raw-trajectory or success-only memory on web and software benchmarks.

Multilingual Fine-Tuning via Localized Gradient Conflict Resolution

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Bucket-Level MOO reformulates multilingual fine-tuning as localized multi-objective optimization and proves it enforces a tighter Pareto stationarity condition while improving cross-lingual performance on four LLMs.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.

Verifier-Guided Code Translation via Meta-Step Decoding

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

Decoding Time Verification (DTV) interleaves verifier calls at structural boundaries during autoregressive code generation for C-to-Rust and JavaScript-to-TypeScript translation, raising pass rates while using fewer tokens than post-hoc baselines.

Reliable Chain-of-Thought via Prefix Consistency

stat.ML · 2026-05-08 · unverdicted · novelty 6.0

Prefix consistency weights CoT answers by their regeneration frequency from truncated prefixes and reaches standard self-consistency accuracy at a median 4.6x fewer tokens across five models and four benchmarks.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.

citing papers explorer

Showing 14 of 14 citing papers.