SkillSmith is a boundary-first compiler-runtime system that turns skill packages into minimal executable interfaces, cutting token usage 57%, thinking iterations 43%, and solve time 51% versus raw skill injection on SkillsBench.
hub Mixed citations
Toolformer: Language models can teach themselves to use tools.Advances in neural information processing systems, 36:68539– 68551
Mixed citation behavior. Most common role is background (62%).
hub tools
citation-role summary
citation-polarity summary
years
2026 13representative citing papers
LEAD uses online adaptive mechanisms including Potential-Scaled Instability and symmetric efficiency rewards based on correct rollouts to achieve higher accuracy-efficiency scores with substantially shorter reasoning outputs than base models on math benchmarks.
AgentForesight introduces an online auditor model that predicts decisive errors in multi-agent trajectories at the earliest step using a coarse-to-fine reinforcement learning recipe on a new curated dataset AFTraj-2K.
SkVM uses capability profiling and compiler-style techniques to make skills portable across LLMs and harnesses, raising task completion rates while cutting token use by up to 40% and delivering up to 3.2x speedup.
RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.
GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.
STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spatiotemporal reasoning.
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
WebGen-R1 uses end-to-end RL with scaffold-driven generation and cascaded rewards for structure, function, and aesthetics to transform a 7B model into a generator of deployable multi-page websites that rivals much larger models.
EligMeta automates trial discovery from registries and incorporates eligibility similarity into meta-analysis weighting to yield population-aligned pooled estimates, as shown by recovering all guideline trials in one case and shifting a risk ratio in another.
Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.
Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.
citing papers explorer
No citing papers match the current filters.