Trajectory-informed memory generation for self-improving agent systems
8 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (8)
roles: background (1)
polarities: background (1)
representative citing papers
AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.
citing papers explorer
- Co-Evolving Skill Generation and Policy Optimization
  The framework estimates the context-dependent marginal utility of candidate skills from reward gaps between matched base and skill-augmented rollouts, then uses those estimates to filter skills and co-train the policy as a skill generator.
- RAG over Thinking Traces Can Improve Reasoning Tasks
  Retrieving structured thinking traces as a corpus improves reasoning performance on AIME, LiveCodeBench, and GPQA over standard RAG or no retrieval.
- Metis: Bridging Text and Code Memory for Self-Evolving Agents
  Metis combines text and code memory hierarchically for self-evolving agents, claiming up to 20.6% higher accuracy and 22.8% lower cost than ReAct on the AppWorld benchmark.
- Security Considerations for Multi-agent Systems
  No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
- ConMem: Structured Memory-Guided Adaptation in Training-Free Multi-Agent Systems
  ConMem distills agent trajectories into structured memory cards organized in a relation-aware graph to enable training-free, relation-coordinated adaptation in LLM-based multi-agent systems.
- From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws
  HarnessFix diagnoses harness flaws from agent traces via HTIR, maps them to repair operators, and improves benchmark performance by 6.3-18.4% over baselines.
- SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
  SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.