Less is More
2 Pith papers cite this work. Polarity classification is still indexing.

Citation summary
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: background (1)

Citing papers
- A Visually Impaired Assistance Benchmark for VLM-as-a-Judge Evaluation
  The VIABLE benchmark shows that existing VLM judges are unreliable for visually impaired assistance (VIA) tasks (GPT-5.4 reaches only 52.6% diagnostic accuracy while showing 94.2% self-preference) and proposes VIA-Judge-Agent to address these shortcomings.
- VisionClaw: Always-On AI Agents through Smart Glasses
  VisionClaw couples continuous egocentric vision on smart glasses with speech-driven AI agents to enable hands-free real-world tasks; lab and field studies show faster task completion and a shift toward opportunistic delegation.