Visual sketchpad: Sketching as a visual chain of thought for multimodal language models

26 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

representative citing papers

Latent Visual Reasoning

cs.CV · 2025-09-29 · unverdicted · novelty 7.0

Latent Visual Reasoning enables autoregressive generation of latent visual states that reconstruct critical image tokens, yielding gains on perception-heavy VQA benchmarks and reaching 71.67% on MMVP.

VESTA: Visual Exploration with Statistical Tool Agents

cs.AI · 2026-05-29 · unverdicted · novelty 6.0

VESTA introduces dynamic tool creation for VLMs that outperforms static-tool and no-tool baselines on distribution fitting, time series, and astronomy tasks in the new DAWN benchmark.

Self-Prophetic Decoding to Unlock Visual Search in LVLMs

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

SeProD is a plug-and-play self-prophetic decoding framework that combines pre- and post-training LVLM capabilities via probability-based sampling to improve coherent visual search and multi-step reasoning.

Mull-Tokens: Modality-Agnostic Latent Thinking

cs.CV · 2025-12-11 · unverdicted · novelty 6.0

Mull-Tokens are modality-agnostic latent tokens that enable free-form multimodal thinking and deliver up to 16% gains on spatial reasoning benchmarks.

Grounded Reinforcement Learning for Visual Reasoning

cs.CV · 2025-05-29 · unverdicted · novelty 6.0

ViGoRL introduces visually grounded RL that anchors reasoning steps to image coordinates and uses multi-turn zooming to outperform standard RL and supervised baselines on spatial and GUI reasoning benchmarks.

MAG-3D: Multi-Agent Grounded Reasoning for 3D Understanding

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

MAG-3D is a training-free multi-agent framework that coordinates planning, grounding, and coding agents with off-the-shelf VLMs to achieve grounded 3D reasoning and state-of-the-art benchmark results.
