The Twelfth International Conference on Learning Representations

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: dataset (1)

citation-polarity summary

years: 2026 (6), 2024 (1)

roles: dataset (1)

polarities: use dataset (1)

representative citing papers

ABRA: Agent Benchmark for Radiology Applications

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

ABRA shows radiology agents excel at tool execution (89%+) but struggle with outcomes (0-25%), with oracle perception raising outcomes to 69-100%, identifying perception as the primary bottleneck.

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

HINT-SD improves long-horizon LLM agent training by using hindsight to target self-distillation on failure-relevant action spans, delivering up to 18.8% higher performance and 2.26x lower time per step than dense per-turn feedback.

Computer Use at the Edge of the Statistical Precipice

cs.SE · 2026-05-07 · unverdicted · novelty 6.0

A blind replay script matches frontier model performance on static CUA benchmarks due to non-principled environments and evaluation methods, prompting PRISM design principles and the DigiWorld benchmark with improved statistical aggregation.

Agent Workflow Memory

cs.CL · 2024-09-11 · unverdicted · novelty 6.0

AWM induces reusable workflows from agent experiences and provides them selectively to improve success rates by 24.6% on Mind2Web and 51.1% on WebArena while reducing steps taken.

citing papers explorer

Showing 7 of 7 citing papers.

  • ABRA: Agent Benchmark for Radiology Applications cs.CV · 2026-05-11 · unverdicted · none · ref 10

    ABRA shows radiology agents excel at tool execution (89%+) but struggle with outcomes (0-25%), with oracle perception raising outcomes to 69-100%, identifying perception as the primary bottleneck.

  • DocOS: Towards Proactive Document-Guided Actions in GUI Agents cs.AI · 2026-05-18 · unverdicted · none · ref 16

    Introduces DocOS benchmark to test GUI agents on proactively locating, comprehending, and executing instructions from online documentation in interactive web settings.

  • HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents cs.LG · 2026-05-18 · unverdicted · none · ref 3

    HINT-SD improves long-horizon LLM agent training by using hindsight to target self-distillation on failure-relevant action spans, delivering up to 18.8% higher performance and 2.26x lower time per step than dense per-turn feedback.

  • ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents cs.CR · 2026-05-17 · conditional · none · ref 11

    Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.

  • Computer Use at the Edge of the Statistical Precipice cs.SE · 2026-05-07 · unverdicted · none · ref 22

    A blind replay script matches frontier model performance on static CUA benchmarks due to non-principled environments and evaluation methods, prompting PRISM design principles and the DigiWorld benchmark with improved statistical aggregation.

  • Mango: Multi-Agent Web Navigation via Global-View Optimization cs.CL · 2026-04-20 · unverdicted · none · ref 30

    Mango raises web agent success rates to 63.6% on WebVoyager and 52.5% on WebWalkerQA by bandit-based starting-point selection and memory, beating baselines by 7.3% and 26.8%.

  • Agent Workflow Memory cs.CL · 2024-09-11 · unverdicted · none · ref 4

    AWM induces reusable workflows from agent experiences and provides them selectively to improve success rates by 24.6% on Mind2Web and 51.1% on WebArena while reducing steps taken.