{"total":20,"items":[{"citing_arxiv_id":"2605.23783","ref_index":16,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Benchmarking LLMs for Community Governance Simulation with Life-history Narratives","primary_cat":"cs.CY","submitted_at":"2026-05-22T15:48:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Presents resident narrative dataset, benchmarks 18 LLMs on life-history prompting, proposes curriculum-LoRA for low-cost personalization matching high-fidelity baselines, and integrates into closed-loop policy evaluation system.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22721","ref_index":19,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Self-Evolving Multi-Agent Systems via Decentralized Memory","primary_cat":"cs.MA","submitted_at":"2026-05-21T16:55:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DecentMem is a decentralized dual-pool memory framework for self-evolving multi-agent systems that provides O(log T) regret guarantees and yields up to 23.8% accuracy gains over centralized baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18729","ref_index":24,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction","primary_cat":"cs.RO","submitted_at":"2026-05-18T17:52:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Robo-Cortex proposes a self-evolving embodied navigation agent using dual-grain cognitive memory and autonomous knowledge induction from trajectories, reporting SPL gains on IGNav, AR, AEQA and preliminary real-robot tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18652","ref_index":43,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents","primary_cat":"cs.CV","submitted_at":"2026-05-18T16:57:36+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MementoGUI introduces a modular memory-control framework with working and episodic memory operators that improves long-horizon GUI agent performance over history-replay and text-only baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18597","ref_index":26,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Latent Action Reparameterization for Efficient Agent Inference","primary_cat":"cs.AI","submitted_at":"2026-05-18T16:07:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LAR learns a compact latent action space from trajectories that shortens the effective decision horizon for LLM agents, reducing token count and inference time while preserving task success.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17169","ref_index":51,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Responsible Agentic AI Requires Explicit Provenance","primary_cat":"cs.AI","submitted_at":"2026-05-16T21:56:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15710","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory","primary_cat":"cs.CL","submitted_at":"2026-05-15T08:00:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14563","ref_index":35,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation","primary_cat":"cs.SE","submitted_at":"2026-05-14T08:35:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemDocAgent generates consistent hierarchical repository-level code documentation by combining dependency-aware traversal with memory-guided agent interactions that accumulate work traces.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Compass: Enhancing agent long-horizon reasoning with evolving context.arXiv preprint arXiv:2510.08790, 2025. [34] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1-22, 2023. [35] Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gon- zalez. Memgpt: Towards llms as operating systems.CoRR, abs/2310.08560, 2023. URL https://doi.org/10.48550/arXiv.2310.08560. [36] Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. V oyager: An open-ended embodied agent with large language models."},{"citing_arxiv_id":"2605.13438","ref_index":44,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"CogniFold: Always-On Proactive Memory via Cognitive Folding","primary_cat":"cs.AI","submitted_at":"2026-05-13T12:34:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CogniFold extends Complementary Learning Systems theory to three layers with a prefrontal intent layer and uses graph self-organization to build proactive agent memory from continuous event streams.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12147","ref_index":31,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior","primary_cat":"cs.CR","submitted_at":"2026-05-12T14:05:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Washington Law Review, 79:119, 2004. [29] Patricia A Norberg, Daniel R Horne, and David A Horne. The privacy paradox: Personal information disclosure intentions versus behaviors.Journal of consumer affairs, 41(1):100-126, 2007. [30] NVIDIA. Nvidia nemotron 3: Efficient and open intelligence, 2025. URL https://arxiv. org/abs/2512.20856. White Paper. [31] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1-22, 2023. [32] Joon Sung Park, Carolyn Q Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel"},{"citing_arxiv_id":"2605.09330","ref_index":28,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory","primary_cat":"cs.LG","submitted_at":"2026-05-10T05:04:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Econometric Theory, 19(4):675-685, 2003. [26] Jihwan Oh, Minchan Jeong, Jongwoo Ko, and Se-Young Yun. Understanding bias reinforcement in llm agents debate.arXiv preprint arXiv:2503.16814, 2025. [27] Charles Packer, Vivian Fang, Shishir_G Patil, Kevin Lin, Sarah Wooders, and Joseph_E Gonza- lez. Memgpt: towards llms as operating systems. 2023. 11 [28] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1-22, 2023. [29] Judea Pearl.The book of why: The new science of cause and effect."},{"citing_arxiv_id":"2605.09278","ref_index":46,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium","primary_cat":"cs.AI","submitted_at":"2026-05-10T03:04:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"multi-llm ensemble for accelerating drug repurposing in lung cancer via case report mining.npj Precision Oncology, 2026. [44] John F Nash Jr. Equilibrium points in n-person games.Proceedings of the national academy of sciences, 36(1):48-49, 1950. [45] Charles Packer, Vivian Fang, Shishir_G Patil, Kevin Lin, Sarah Wooders, and Joseph_E Gonza- lez. Memgpt: towards llms as operating systems. 2023. 12 [46] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1-22, 2023. [47] David Patterson, Joseph Gonzalez, Urs Hölzle, Quoc Le, Chen Liang, Lluis-Miquel Munguia,"},{"citing_arxiv_id":"2605.10990","ref_index":22,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries","primary_cat":"cs.SE","submitted_at":"2026-05-09T11:41:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SkillGuard extracts executable environment contracts from LLM skill documents to detect only relevant drifts, reporting zero false positives on 599 cases, 100% precision in known-drift tests, and raising one-round repair success from 10% to 78%.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[20] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback.Advances in neural information processing systems, 36:46534-46594, 2023. [21] Bertrand Meyer. Applying'design by contract'.Computer, 25(10):40-51, 2002. [22] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1-22, 2023. [23] Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton,"},{"citing_arxiv_id":"2605.08334","ref_index":29,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators","primary_cat":"cs.CL","submitted_at":"2026-05-08T17:59:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SalesSim benchmarks MLLMs as retail user simulators, finds gaps in persona adherence and over-persuasion, and introduces UserGRPO RL to raise decision alignment by 13.8%.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"incorporate fully expressive simulators [14, 4, 39, 42], these efforts primarily prioritize the assessment of the agentic systems themselves as opposed to the fidelity of the simulators. Recently, the field has taken note of the importance of fidelity of simulators, as highlighted by the corrections to Tau2-bench [5] during the Tau3-bench update (Sierra Engineering [29]), which fixed user persona underspecifications that introduced noise in Tau2-Bench. Research into user simulator fidelity is more established in domains outside of agentic evaluation. Naous et al. [23] evaluates and improves upon user simulators in the conversational assistant domain, while Gromada et al. [12] similarly evaluates human simulators in retail settings using LLM-as-a-"},{"citing_arxiv_id":"2604.26805","ref_index":42,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations","primary_cat":"cs.AI","submitted_at":"2026-04-29T15:35:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Bian Que is an agentic framework using a unified operational paradigm, flexible Skill Arrangement, and self-evolving mechanism to automate O&M tasks, achieving 75% alert reduction and over 50% MTTR cut in production deployment.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Our Flexible Skill mechanism addresses this by introducing an evolvable abstraction layer in which the data-routing logic itself is LLM-generated and incrementally updated through natural-language interfaces. On the knowledge side, prior work spans RAG [ 35] and its variants [ 36, 37], memory-augmented architectures [ 38, 39, 40, 41], and self-refinement methods [ 42, 43]. Recent work has further explored self-evolving memory systems that learn to update their own memory operations from feedback [44, 45, 46]. However, conventional RAG treats the knowledge base as static, and most self-evolving mechanisms operate along a single feedback pathway-improving either the knowledge store or the agent behavior, but not both."},{"citing_arxiv_id":"2604.14475","ref_index":16,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Evo-MedAgent: Beyond One-Shot Diagnosis with Agents That Remember, Reflect, and Improve","primary_cat":"cs.AI","submitted_at":"2026-04-15T23:12:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Evo-MedAgent adds three evolving memory stores to LLM agents for chest X-ray diagnosis, raising MCQ accuracy from 0.68 to 0.79 on GPT-5-mini and 0.76 to 0.87 on Gemini-3 Flash without any training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08362","ref_index":35,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces","primary_cat":"cs.CL","submitted_at":"2026-04-09T15:26:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces OmniBehavior benchmark from real-world data and shows LLMs exhibit hyper-activity, persona homogenization, and utopian bias in behavior simulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08206","ref_index":13,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"\"Theater of Mind\" for LLMs: A Cognitive Architecture Based on Global Workspace Theory","primary_cat":"cs.MA","submitted_at":"2026-04-09T13:06:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Global Workspace Agents (GWA) is proposed as an active, event-driven cognitive architecture for LLMs featuring an entropy-based intrinsic drive and dual-layer memory to enable sustained self-directed agency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02674","ref_index":48,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems","primary_cat":"cs.MA","submitted_at":"2026-04-03T03:08:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The influence of scaffolds on coordination scaling laws in LLM agents. InNeurIPS 2025 Workshop on Multi-Turn Interactions in Large Language Models (MTI-LLM), 2025. [46] C Packer, V Fang, SG Patil, K Lin, S Wooders, and J Gonzalez. Memgpt: Towards llms as operating systems. arxiv 2023.arXiv preprint arXiv:2310.08560. [47] Vilfredo Pareto.Cours d'économie politique, volume 1. Librairie Droz, 1964. [48] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1-22, 2023. [49] Thomas Piketty.Capital in the twenty-first century. Harvard University Press, 2014."},{"citing_arxiv_id":"2503.13657","ref_index":17,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Why Do Multi-Agent LLM Systems Fail?","primary_cat":"cs.AI","submitted_at":"2025-03-17T19:04:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"factuality and reasoning in language models through multiagent debate, 2023. URL https: //arxiv.org/abs/2305.14325. [16] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1-22, 2023. [17] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges.arXiv preprint arXiv:2402.01680, 2024. [18] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. Agentless: Demystifying llm-based software engineering agents, 2024."}],"limit":50,"offset":0}