Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E

2024 · DOI 10.52202/079017-4020

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Agents for Experiments, Experiments for Agents: A Design Grammar for AI-Enabled Experimental Science

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

SEED is a structural encoding framework using typed actor-flow graphs to describe, evaluate novelty of, and generate experimental designs for AI-enabled science under feasibility and governance constraints.

SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

SkillOps maintains LLM skill libraries via Skill Contracts and ecosystem graphs, raising ALFWorld task success to 79.5% as a standalone agent and improving retrieval baselines by up to 2.9 points with near-zero library-time LLM cost.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

MemoRepair formalizes the cascade update problem in agentic memory and solves it via a min-cut reduction that eliminates invalidated memory exposure to 0% while recovering 91-94% of valid successors at 57-76% of baseline repair cost.

Switchcraft: AI Model Router for Agentic Tool Calling

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

Switchcraft routes agentic tool-calling queries to the lowest-cost model that preserves correctness, reaching 82.9% accuracy and 84% cost reduction on five benchmarks.

BIM Information Extraction Through LLM-based Adaptive Exploration

cs.CL · 2026-05-03 · unverdicted · novelty 7.0

LLM adaptive exploration via runtime code execution outperforms static query generation for information extraction from heterogeneous BIM models on the new ifc-bench v2 benchmark.

Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.

Less Is More: Measuring How LLM Involvement affects Chatbot Accuracy in Static Analysis

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

A structured JSON intermediate representation for LLM-generated static analysis queries outperforms both direct generation and agentic tool use, with gains of 15-25 percentage points on large models.

Querying Structured Data Through Natural Language Using Language Models

cs.CL · 2026-04-03 · conditional · novelty 6.0

Fine-tuning an 8B LLM with synthetic data enables accurate natural language querying of structured datasets like accessibility services in Spain, generalizing to new locations.

Tracking Capabilities for Safer Agents

cs.AI · 2026-03-01 · unverdicted · novelty 6.0

AI agents can generate code in a capability-safe Scala dialect that statically prevents information leakage and malicious side effects while preserving task performance.

ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents

cs.AI · 2026-04-28 · unverdicted · novelty 5.0

ADEMA is a knowledge-state orchestration architecture for LLM agents that uses explicit epistemic bookkeeping, checkpoint-resumable persistence, and artifact-first assembly to support reliable long-horizon knowledge synthesis.

Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents

cs.AI · 2026-04-13 · unverdicted · novelty 5.0

Orchestrating one 8B model in three roles at inference time raises task completion on AppWorld from 5.4% to 8.9%, surpassing a 33B baseline.

citing papers explorer

Agents for Experiments, Experiments for Agents: A Design Grammar for AI-Enabled Experimental Science · cs.AI · 2026-05-18 · unverdicted · none · ref 41
SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems · cs.SE · 2026-05-13 · unverdicted · none · ref 40
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment · cs.CL · 2026-05-08 · unverdicted · none · ref 116
MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory · cs.AI · 2026-05-08 · unverdicted · none · ref 14
Switchcraft: AI Model Router for Agentic Tool Calling · cs.AI · 2026-05-08 · unverdicted · none · ref 25
BIM Information Extraction Through LLM-based Adaptive Exploration · cs.CL · 2026-05-03 · unverdicted · none · ref 41
Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries · cs.CL · 2026-05-07 · unverdicted · none · ref 3
Less Is More: Measuring How LLM Involvement affects Chatbot Accuracy in Static Analysis · cs.SE · 2026-04-23 · unverdicted · none · ref 20
Querying Structured Data Through Natural Language Using Language Models · cs.CL · 2026-04-03 · conditional · none · ref 13
Tracking Capabilities for Safer Agents · cs.AI · 2026-03-01 · unverdicted · none · ref 59
ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents · cs.AI · 2026-04-28 · unverdicted · none · ref 9
Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents · cs.AI · 2026-04-13 · unverdicted · none · ref 10
