Canonical reference

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Canonical reference. 86% of citing Pith papers cite this work as background.

11 Pith papers citing it
Background 86% of classified citations

citation-role summary

background 5 · dataset 1 · method 1

years

2026: 7 · 2025: 4

representative citing papers

Advancing Creative Physical Intelligence in Large Multimodal Models

cs.AI · 2026-05-25 · unverdicted · novelty 7.0

Introduces MM-CreativityBench for affordance-grounded creative tool use and shows that DPO-based alignment with an affordance knowledge base improves entity and part selection while cutting hallucination errors in LMMs.

QoS-QoE Translation with Large Language Model

cs.MM · 2026-04-09 · unverdicted · novelty 6.0

A new QoS-QoE Translation dataset is constructed from multimedia literature and fine-tuned LLMs demonstrate strong performance on bidirectional continuous and discrete QoS-QoE predictions.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of

citing papers explorer

Showing 5 of 5 citing papers after filters.