SkillSmith is a boundary-first compiler-runtime system that turns skill packages into minimal executable interfaces, cutting token usage 57%, thinking iterations 43%, and solve time 51% versus raw skill injection on SkillsBench.
hub Mixed citations
Toolformer: Language models can teach themselves to use tools.Advances in neural information processing systems, 36:68539– 68551
Mixed citation behavior. Most common role is background (62%).
hub tools
citation-role summary
citation-polarity summary
years
2026 13representative citing papers
LEAD uses online adaptive mechanisms including Potential-Scaled Instability and symmetric efficiency rewards based on correct rollouts to achieve higher accuracy-efficiency scores with substantially shorter reasoning outputs than base models on math benchmarks.
AgentForesight introduces an online auditor model that predicts decisive errors in multi-agent trajectories at the earliest step using a coarse-to-fine reinforcement learning recipe on a new curated dataset AFTraj-2K.
SkVM uses capability profiling and compiler-style techniques to make skills portable across LLMs and harnesses, raising task completion rates while cutting token use by up to 40% and delivering up to 3.2x speedup.
RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.
GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.
STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spatiotemporal reasoning.
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
WebGen-R1 uses end-to-end RL with scaffold-driven generation and cascaded rewards for structure, function, and aesthetics to transform a 7B model into a generator of deployable multi-page websites that rivals much larger models.
EligMeta automates trial discovery from registries and incorporates eligibility similarity into meta-analysis weighting to yield population-aligned pooled estimates, as shown by recovering all guideline trials in one case and shifting a risk ratio in another.
Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.
Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.
citing papers explorer
-
STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spatiotemporal reasoning.