hub Canonical reference

Multimodal web navigation with instruction-finetuned foundation models

Canonical reference. 80% of classified citations from Pith papers cite this work as background.

12 Pith papers citing it
Background 80% of classified citations

citation-role summary

background 4 · dataset 1

representative citing papers

Mitigating Coordinate Prediction Bias from Positional Encoding Failures

cs.CV · 2025-10-25 · unverdicted · novelty 6.0

VPSG corrects predictable directional coordinate biases in MLLMs by shuffling visual positional encodings to isolate unconditioned tendencies and steering digit decoding with a lightweight finite-state machine, yielding accuracy gains on ScreenSpot-Pro without retraining.

WebCanvas: Benchmarking Web Agents in Online Environments

cs.CL · 2024-06-18 · unverdicted · novelty 6.0

WebCanvas creates a dynamic benchmark for web agents with a noise-resistant evaluation metric, the Mind2Web-Live dataset of 542 tasks, and open-source tools and agent framework for ongoing online testing.

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

cs.HC · 2024-01-17 · unverdicted · novelty 6.0

SeeClick improves visual GUI agents via GUI grounding pre-training on automatically curated data and introduces the ScreenSpot benchmark, with results indicating that stronger grounding boosts downstream task performance.

AppAgent: Multimodal Agents as Smartphone Users

cs.CV · 2023-12-21 · unverdicted · novelty 5.0

AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.

citing papers explorer

Showing 12 of 12 citing papers.