s1: Simple test-time scaling

18 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citing papers by year: 2026 (17), 2025 (1)

representative citing papers

ATLAS: Agentic Test-time Learning-to-Allocate Scaling

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

ATLAS introduces an LLM-orchestrated agentic framework for dynamic test-time scaling via extensible 'explore' actions, achieving higher accuracy with fewer API calls than fixed-workflow baselines on four benchmarks.

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

cs.AI · 2025-09-29 · conditional · novelty 7.0

ReasoningBank distills generalizable reasoning strategies from agent successes and failures to enable self-evolution, with memory-aware test-time scaling amplifying gains over raw-trajectory or success-only memory on web and software benchmarks.

Multilingual Fine-Tuning via Localized Gradient Conflict Resolution

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Bucket-Level MOO reformulates multilingual fine-tuning as localized multi-objective optimization and proves it enforces a tighter Pareto stationarity condition while improving cross-lingual performance on four LLMs.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
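For context, the majority-voting baseline named in the RISC summary can be sketched as below. This is a minimal illustration of standard self-consistency voting, not code from any of the listed papers; the function name `majority_vote` and the sample answers are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent sampled answer and its share of the votes.

    `answers` is a list of final answers extracted from independently
    sampled reasoning chains (hypothetical input format).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) returns the (answer, count) pair with the highest count;
    # ties are broken by first appearance in `answers`.
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


if __name__ == "__main__":
    samples = ["42", "41", "42", "42", "17"]  # made-up sampled answers
    answer, share = majority_vote(samples)
    print(answer, share)
```

A learned ranker such as the one the summary describes would replace the raw vote count with a score computed from features of each candidate answer; that part is not sketched here.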

Verifier-Guided Code Translation via Meta-Step Decoding

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

Decoding Time Verification (DTV) interleaves verifier calls at structural boundaries during autoregressive code generation for C-to-Rust and JavaScript-to-TypeScript translation, raising pass rates while using fewer tokens than post-hoc baselines.

Reliable Chain-of-Thought via Prefix Consistency

stat.ML · 2026-05-08 · unverdicted · novelty 6.0

Prefix consistency weights CoT answers by their regeneration frequency from truncated prefixes and reaches standard self-consistency accuracy at a median 4.6x fewer tokens across five models and four benchmarks.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines, with examples including a 2x faster LASSO solver and new Erdős constructions.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • Reliable Chain-of-Thought via Prefix Consistency stat.ML · 2026-05-08 · unverdicted · none · ref 3
