Title resolution pending

2025

12 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

ABRA: Agent Benchmark for Radiology Applications

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

ABRA shows radiology agents excel at tool execution (89%+) but struggle with outcomes (0-25%), with oracle perception raising outcomes to 69-100%, identifying perception as the primary bottleneck.

Containment Verification: AI Safety Guarantees Independent of Alignment

cs.AI · 2026-05-09 · unverdicted · novelty 8.0

Containment verification proves that an agentic framework can enforce safety boundaries against any output from an unconstrained AI model by mechanized forward-simulation refinement in Dafny.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

ASIA: an Autonomous System Identification Agent

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

ASIA uses an LLM-based coding agent to autonomously perform system identification, tested empirically on two benchmarks while noting limitations in transparency and reproducibility.

Learning-Augmented Scalable Linear Assignment Problem Optimization via Neural Dual Warm-Starts

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

A lightweight neural dual predictor accelerates exact LAP solvers by over 2x on synthetic data and 1.25-1.5x on real MOT and LPT tasks while preserving full optimality and scaling to N=16384.

Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design

cs.MA · 2026-05-09 · unverdicted · novelty 7.0

External evolution beats internal deliberation in collective-action tasks with statistical significance but neither helps in trading, and deliberation never discovers punishment while evolution does.

Diagnosing Training Inference Mismatch in LLM Reinforcement Learning

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Training-inference mismatch in separated rollout and optimization stages of LLM RL can independently cause training collapse.

PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding

cs.DC · 2026-05-13 · unverdicted · novelty 6.0

PipeSD achieves 1.16x-2.16x speedup and 14.3%-25.3% lower energy use in cloud-edge LLM inference via token-batch pipeline scheduling optimized by dynamic programming and a Bayesian-optimized dual-threshold NAV trigger.

Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.

Metaphor Is Not All Attention Needs

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Poetic jailbreaks succeed because they induce distinct attention patterns in LLMs that are independent of harmful-content detection, not because models fail to recognize literary formatting.

An Executable Benchmarking Suite for Tool-Using Agents

cs.SE · 2026-05-10 · unverdicted · novelty 5.0

The paper delivers a unified executable benchmarking suite for tool-using agents that enforces a shared evidence-admission contract across web, code, and micro-task environments.

From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability

cs.SE · 2026-05-10 · conditional · novelty 5.0

Software engineering is transitioning from code-centric authorship to intent-centric supervision of human-agent systems, where specification, verification, security, and governance become central.

citing papers explorer

Showing 12 of 12 citing papers.

ABRA: Agent Benchmark for Radiology Applications · cs.CV · 2026-05-11 · unverdicted · none · ref 42
Containment Verification: AI Safety Guarantees Independent of Alignment · cs.AI · 2026-05-09 · unverdicted · partial · ref 18
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues · cs.CL · 2026-05-12 · unverdicted · none · ref 53
ASIA: an Autonomous System Identification Agent · cs.AI · 2026-05-11 · unverdicted · none · ref 12
Learning-Augmented Scalable Linear Assignment Problem Optimization via Neural Dual Warm-Starts · cs.LG · 2026-05-10 · unverdicted · none · ref 8
Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design · cs.MA · 2026-05-09 · unverdicted · none · ref 12
Diagnosing Training Inference Mismatch in LLM Reinforcement Learning · cs.LG · 2026-05-14 · unverdicted · none · ref 13
PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding · cs.DC · 2026-05-13 · unverdicted · none · ref 12
Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs · cs.LG · 2026-05-07 · unverdicted · none · ref 7
Metaphor Is Not All Attention Needs · cs.CL · 2026-05-12 · unverdicted · none · ref 44
An Executable Benchmarking Suite for Tool-Using Agents · cs.SE · 2026-05-10 · unverdicted · none · ref 14
From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability · cs.SE · 2026-05-10 · conditional · none · ref 34
