arXiv preprint arXiv:2503.00735

Toby Simonds, Akira Yoshiyama · 2025 · arXiv 2503.00735

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Detecting and Mitigating the Correct-Answer Extinction Window in Test-Time Reinforcement Learning with Majority Voting

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

TTRL gains are reinterpreted as mostly sharpening rather than learning, with an identified extinction window causing net corruption; TTRL-Guard mitigates via FRS, MPS, and RCSU for improved pass@1.

Be Your Own Teacher: Steering Protein Language Models via Unsupervised Reward Optimization

cs.LG · 2026-06-17 · unverdicted · novelty 6.0

Unsupervised rewards combining model uncertainty and semantic consistency allow protein language models to self-steer via SRO and BRO algorithms, outperforming DPO and KTO on out-of-distribution prompts while approaching oracle performance.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

citing papers explorer

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · none · ref 264

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
