pith. sign in

hub Canonical reference

Mind2web: Towards a generalist agent for the web.Advances in Neural Information Processing Systems, 36:28091–28114

Canonical reference. 71% of citing Pith papers cite this work as background.

17 Pith papers citing it
Background 71% of classified citations

hub tools

citation-role summary

background 5 dataset 2

citation-polarity summary

years

2026 16 2025 1

clear filters

representative citing papers

SGR-Bench: Benchmarking Search Agents on State-Gated Retrieval

cs.AI · 2026-05-21 · conditional · novelty 7.0

SGR-Bench evaluates agentic LLM systems on state-gated retrieval tasks where evidence is only accessible after configuring site-specific states, with the strongest system reaching 66.18% item-level F1 and failures dominated by retrieval-scope drift.

Learning Agentic Policy from Action Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

Web Agents Should Adopt the Plan-Then-Execute Paradigm

cs.CR · 2026-05-14 · unverdicted · novelty 6.0

Web agents should default to planning a complete task program before observing live web content to reduce prompt injection exposure, since WebArena tasks are compatible and 80% need no runtime LLM calls.

Why Does Agentic Safety Fail to Generalize Across Tasks?

cs.LG · 2026-05-07 · conditional · novelty 6.0

Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.

A-MEM: Agentic Memory for LLM Agents

cs.CL · 2025-02-17 · unverdicted · novelty 6.0

A-MEM is a dynamic memory system for LLM agents that builds and refines an interconnected network of notes with agent-driven linking and evolution, showing performance gains over prior memory methods on six models.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Covering Human Action Space for Computer Use: Data Synthesis and Benchmark cs.CV · 2026-05-12 · unverdicted · none · ref 38

    Presents CUActSpot benchmark and renderer-LLM data synthesis that lets a 4B model outperform larger open-source models on complex computer interactions.

  • Learning Agentic Policy from Action Guidance cs.CL · 2026-05-12 · unverdicted · none · ref 10

    ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

  • Why Does Agentic Safety Fail to Generalize Across Tasks? cs.LG · 2026-05-07 · conditional · none · ref 30

    Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.

  • A-MEM: Agentic Memory for LLM Agents cs.CL · 2025-02-17 · unverdicted · none · ref 7

    A-MEM is a dynamic memory system for LLM agents that builds and refines an interconnected network of notes with agent-driven linking and evolution, showing performance gains over prior memory methods on six models.

  • Agentic Reasoning for Large Language Models cs.AI · 2026-01-18 · unverdicted · none · ref 49

    The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.