{"total":13,"items":[{"citing_arxiv_id":"2605.20690","ref_index":56,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Declarative Data Services: Structured Agentic Discovery for Composing Data Systems","primary_cat":"cs.AI","submitted_at":"2026-05-20T04:36:40+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20086","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"What Do Evolutionary Coding Agents Evolve?","primary_cat":"cs.NE","submitted_at":"2026-05-19T16:41:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19633","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"optimize_anything: A Universal API for Optimizing any Text Parameter","primary_cat":"cs.CL","submitted_at":"2026-05-19T10:18:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A universal LLM optimizer for text artifacts achieves SOTA results on six tasks including tripling ARC-AGI accuracy and cutting cloud costs by 40% via cross-task transfer and side information.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15026","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SemaTune: Semantic-Aware Online OS Tuning with Large Language 
Models","primary_cat":"cs.OS","submitted_at":"2026-05-14T16:25:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"I/O knobs as unrelated values, it reasons about them as parts of a joint system. This lets SemaTune remain effective in the regimes where current tuners struggle: when direct application metrics are unavailable, when the search space contains numerically valid but semantically dubious configurations, and when the control surface is large and highly coupled. Recent systems such as SchedCP [89], ADRS [18], DB-BERT [76], GPTuner [41, 57], λ-Tune [32] have shown that LLMs can help improve systems by structuring, pruning, or guiding search, mostly in offline or controlled settings. But bringing that same semantic reasoning into a live tuner is much harder. First, strong reasoning models are slow and expensive. 
In an online tuner, that matters twice: (1) they"},{"citing_arxiv_id":"2605.07039","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents","primary_cat":"cs.LG","submitted_at":"2026-05-07T23:38:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Preprint. arXiv:2605.07039v1 [cs.LG] 7 May 2026 worth revisiting, and which directions remain novel relative to the evolving frontier. In recommender-system design [56, 55], MoE load balancing [1, 21], and protein fitness extrapolation [41], candidate directions may range from architectural changes and optimization choices to routing strategies [8], feature interactions [43], and sequence-level transformations [14]. Many such directions can be justified by generic LLM reasoning, but only a few produce measurable improvement after evaluation [22]. 
A fixed policy can condition on this history through context, but it does not internalize the resulting search feedback into stable decision preferences [2, 57, 25]."},{"citing_arxiv_id":"2605.06544","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure","primary_cat":"cs.DC","submitted_at":"2026-05-07T16:40:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CCL-Bench packages traces and metadata to compute detailed compute, memory, and communication efficiency metrics, surfacing performance insights unavailable from end-to-end benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25083","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization","primary_cat":"cs.AI","submitted_at":"2026-04-28T00:31:55+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"Prefetching exposes a broader design space, requiring policies to balance coverage, timeliness, aggressiveness, and bandwidth. We seed evolution with VA/AMPM Lite [17], a compact and competitive prefetcher, and report final performance relative to a no-prefetch baseline. We additionally compare against several published policies, including IP Stride [12], Next Line, SMS [41], IPCP [33], Pythia [5], Berti [30], and SPP [23]. 
Because prefetching and replacement target the same memory-system bottleneck, we use the same composite score during search: score_pref. = IPC × 10000 − LLC_misses / 1000 (2) This objective rewards prefetchers that improve end-to-end performance while discouraging designs that increase traffic without"},{"citing_arxiv_id":"2604.11109","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search","primary_cat":"cs.DC","submitted_at":"2026-04-13T07:25:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"R^3 optimizes full scientific applications on GPUs better than tuning kernel parameters or compiler flags alone while running nearly an order of magnitude faster than modern evolutionary search methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"in language modeling capabilities via LLMs has made it possible to use evolutionary search over a text space. Recent works like FunSearch [10], AlphaEvolve [11], and ADRS [12] have shown the impressive capabilities of these systems in discovering new algorithms and systems optimizations. Many of these works, like AlphaEvolve [11] and its popular open-source reproduction OpenEvolve [13], are based on the MAP-Elites evolutionary algorithm [14]. Figure 1 presents an overview of the MAP-Elites evolutionary algorithm powered by LLMs for code optimization. In the MAP-Elites algorithm, members of the population are mapped to coordinate cells in a cartesian grid based on \"feature dimensions\", i.e. 
each dimension of the grid corresponds to a feature of"},{"citing_arxiv_id":"2604.07144","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics","primary_cat":"cs.DC","submitted_at":"2026-04-08T14:37:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[23] Asankhaya Sharma. Openevolve: An open-source evolutionary coding agent, 2025. [24] Gang Liao, Hongsen Qin, Ying Wang, Alicia Golden, Michael Kuchnik, Yavuz Yetim, Jia Jiunn Ang, Chunli Fu, Yihan He, Samuel Hsia, et al. Kernelevolve: Scaling agentic kernel coding for heterogeneous ai accelerators at meta. arXiv preprint arXiv:2512.23236, 2025. [25] Audrey Cheng, Shu Liu, Melissa Pan, Zhifei Li, Bowen Wang, Alex Krentsel, Tian Xia, Mert Cemri, Jongseok Park, Shuo Yang, et al. Barbarians at the gate: How ai is upending systems research. arXiv preprint arXiv:2510.06189, 2025. [26] Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 
Megatron-lm: Training multi-billion parameter language models using model parallelism."},{"citing_arxiv_id":"2604.06566","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AI-Driven Research for Databases","primary_cat":"cs.DB","submitted_at":"2026-04-08T01:34:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Co-evolving LLM-generated solutions with their evaluators enables discovery of novel database algorithms that outperform state-of-the-art baselines, including a query rewrite policy with up to 6.8x lower latency.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Bowen Wang, Alexander Krentsel, Tian Xia, Jongseok Park, Shuo Yang, Jeff Chen, Lakshya A Agrawal, Ashwin Naren, Shulu Li, Ruiying Ma, Aditya Desai, Jiarong Xing, Koushik Sen, Matei Zaharia, and Ion Stoica. 2025. Let the Barbarians In: How AI Can Accelerate Systems Performance Research. arXiv preprint arXiv:2512.14806 (2025). https://doi.org/10.48550/arXiv.2512.14806 [12] Audrey Cheng, Shu Liu, Melissa Pan, Zhifei Li, Bowen Wang, Alex Krentsel, Tian Xia, Mert Cemri, Jongseok Park, Shuo Yang, et al. 2025. Barbarians at the Gate: How AI is Upending Systems Research. arXiv preprint arXiv:2510.06189 (2025). 
[13] Audrey Cheng, Xiao Shi, Aaron Kabcenell, Shilpa Lawande, Hamza Qadeer, Jason Chan, Harrison Tin, Ryan Zhao, Peter Bailis, Mahesh Balakrishnan, Nathan"},{"citing_arxiv_id":"2604.01621","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72","primary_cat":"cs.DC","submitted_at":"2026-04-02T05:00:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DWDP distributes MoE weights across GPUs for independent execution without collective synchronization, improving output TPS/GPU by 8.8 percent on GB200 NVL72 for DeepSeek-R1 under 8K input and 1K output lengths.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, and Thomas L. Griffiths. When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1. arXiv preprint arXiv:2410.01792, 2024. [3] Yu Gao, Haoyuan Guo, Tuyen Hoang, et al. Seedance 1.0: Exploring the boundaries of video generation models. arXiv preprint arXiv:2506.09113, 2025. [4] Audrey Cheng, Shu Liu, Melissa Pan, et al. Barbarians at the gate: How AI is upending systems research. arXiv preprint arXiv:2510.06189, 2025. [5] DeepSeek-AI, Aixin Liu, Bei Feng, et al. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024. [6] Moonshot AI. Kimi-K2.5. https://huggingface.co/moonshotai/Kimi-K2.5, 2026. 
Accessed: March 30,"},{"citing_arxiv_id":"2604.03312","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World","primary_cat":"cs.AR","submitted_at":"2026-03-31T22:00:47+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Automated architectural discovery engines can outperform human design teams by exploring massive design spaces and compressing development cycles from months to weeks.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"that underpin hardware [8]. If AI can optimize the comparator circuits in a sorting network better than human heuristics, the leap to optimizing an ALU or NoC router is trivial. Above Architecture (Systems & Software): The systems research community has acknowledged the \"Barbarians at the Gate,\" conceding that the era of manually tuning heuristics for operating systems and datacenters is over [9]. If general-purpose search can optimize the messy, noise-filled environment of a cloud scheduler [9], it can certainly optimize the microarchitecture that runs it. Parallel to Architecture (Network Policy): In networking, the PolicySmith framework proves that complex design tasks requiring strict correctness can be synthesized by GenAI and verified formally, outperforming human manual"},{"citing_arxiv_id":"2602.21480","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Both Ends Count! 
Just How Good are LLM Agents at \"Text-to-Big SQL\"?","primary_cat":"cs.DB","submitted_at":"2026-02-25T01:12:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"New Text-to-Big SQL metrics show that LLM agents must balance accuracy with cost and speed at scale, where GPT-4o trades some accuracy for up to 12x speedup and GPT-5.2 proves more cost-effective than Gemini 3 Pro on large inputs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}