pith.

Canonical reference

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Canonical reference. 100% of citing Pith papers cite this work as background.

10 Pith papers citing it
Background: 100% of classified citations


citation-role summary

background: 6

citation-polarity summary

roles: background 6
polarities: background 6


representative citing papers

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Open 4B and 8B visual web agents achieve state-of-the-art results on browser benchmarks by predicting actions from screenshots and instructions, outperforming comparable open models and some agents built on larger closed models; a full release of data and code is planned.

A Pattern Language for Resilient Visual Agents

cs.AI · 2026-04-30 · unverdicted · novelty 4.0

Proposes four architectural patterns (Hybrid Affordance Integration, Adaptive Visual Anchoring, Visual Hierarchy Synthesis, and Semantic Scene Graph) to reconcile the non-determinism and latency of foundation models with enterprise requirements for determinism and real-time performance.

citing papers explorer

Showing 8 of 8 citing papers after filters.