{"total":12,"items":[{"citing_arxiv_id":"2606.29194","ref_index":39,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AI Trading's Alpha Singularity: Emergent Market Reasoning through Agent-to-Agent Self-Evolution","primary_cat":"cs.AI","submitted_at":"2026-06-28T04:41:00+00:00","verdict":"REJECT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Multi-agent LLM system Agora under Sealed Joint Search conditions produces +1.87 holdout Sharpe on CSI 1000 over a 91-day sealed period, exceeding the best baseline at +1.334 under favorable seed.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01444","ref_index":65,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence","primary_cat":"cs.AI","submitted_at":"2026-05-31T20:29:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A category-theoretic model frames scientific discovery as verified regime transitions via left Kan extensions that preserve and compare artifacts across schema changes in agentic AI.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26448","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Constitutional Arms Races in the Public Goods Game: Co-Evolving LLM Constitutions Under Cooperation-Defection Pressure","primary_cat":"cs.MA","submitted_at":"2026-05-26T02:01:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Adversarial co-evolution of LLM constitutions in public goods games reaches near-parity equilibrium only when fitness is coupled across factions and evaluation uses at least five 
seeds per generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23372","ref_index":48,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Curriculum reinforcement learning with measurable task representation learning","primary_cat":"cs.LG","submitted_at":"2026-05-22T08:36:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A VAE-based latent task representation enables automatic curriculum generation in CRL for non-Euclidean navigation tasks, outperforming interpolation and GAN-based methods in experiments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14445","ref_index":34,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale","primary_cat":"cs.LG","submitted_at":"2026-05-14T06:39:42+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14392","ref_index":36,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis","primary_cat":"cs.AI","submitted_at":"2026-05-14T05:14:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EvoEnv lets a single policy synthesize, validate, and use Python environments with durable solve-verify asymmetry to improve reasoning performance on Qwen3-4B-Thinking from 72.4 to 74.8 while 
fixed-data baselines decline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14350","ref_index":287,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling","primary_cat":"cs.LG","submitted_at":"2026-05-14T04:22:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01358","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PACE: Parameter Change for Unsupervised Environment Design","primary_cat":"cs.LG","submitted_at":"2026-05-02T10:07:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PACE uses the squared L2 norm of policy parameter changes from a first-order approximation as an efficient proxy for environment value in UED, outperforming baselines with higher IQM and lower optimality gap on MiniGrid and Craftax OOD tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.01665","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement 
Learning","primary_cat":"cs.MA","submitted_at":"2026-02-02T05:34:38+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.18203","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SkillWrapper: Generative Predicate Invention for Task-level Robot Planning","primary_cat":"cs.RO","submitted_at":"2025-11-22T22:25:11+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2305.16291","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","primary_cat":"cs.AI","submitted_at":"2023-05-25T17:46:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more unique items and 15.3x faster milestone unlocks than prior methods while generalizing skills","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. arXiv preprint arXiv: Arxiv-2107.03374, 2021. [42] Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O. Stanley. 
Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv: Arxiv-1901.01753, 2019. [43] Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, and Pierre-Yves Oudeyer. Automatic curriculum learning for deep RL: A short survey."},{"citing_arxiv_id":"1911.01547","ref_index":96,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"On the Measure of Intelligence","primary_cat":"cs.AI","submitted_at":"2019-11-05T00:31:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Intelligence is skill-acquisition efficiency, and the ARC benchmark measures human-like general fluid intelligence by testing abstraction and reasoning with minimal, innate-like priors.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"given student (tasks should be new and challenging, while still being solvable by the student), while students would evolve to learn to solve increasingly difficult tasks. This setup is also favorable to curriculum optimization, as the teacher program may be configured to seek to optimize the learning efficiency of its students. This idea is similar to the \"anytime intelligence test\" proposed in [38] and to the POET system proposed in [96]. In order to make sure that the space of generated tasks retains sufficient complexity and novelty over time, the teacher program should draw information from an external source (assumed to feature incompressible complexity), such as the real world. This external source of complexity makes the setup truly open-ended. A teacher program that generates"}],"limit":50,"offset":0}