Title resolution pending

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models , author= · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs

cs.CV · 2026-05-08 · unverdicted · novelty 8.0

Sparse autoencoders inserted into VLMs and trained only for reconstruction can reliably detect adversarial attacks on images, including unseen domains and attack types.

Prefix-Adaptive Block Diffusion for Efficient Document Recognition

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.

PRISM: Perception Reasoning Interleaved for Sequential Decision Making

cs.AI · 2026-05-06 · unverdicted · novelty 5.0

PRISM interleaves VLM perception and LLM reasoning via a dynamic goal-oriented question-answer pipeline to produce sharper scene descriptions, outperforming prior image-based models on ALFWorld and Room-to-Room.

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · novelty 4.0

The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

citing papers explorer

Showing 4 of 4 citing papers.

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs cs.CV · 2026-05-08 · unverdicted · none · ref 36
Sparse autoencoders inserted into VLMs and trained only for reconstruction can reliably detect adversarial attacks on images, including unseen domains and attack types.
Prefix-Adaptive Block Diffusion for Efficient Document Recognition cs.CV · 2026-05-16 · unverdicted · none · ref 43
PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.
PRISM: Perception Reasoning Interleaved for Sequential Decision Making cs.AI · 2026-05-06 · unverdicted · none · ref 24
PRISM interleaves VLM perception and LLM reasoning via a dynamic goal-oriented question-answer pipeline to produce sharper scene descriptions, outperforming prior image-based models on ALFWorld and Room-to-Room.
Agent AI: Surveying the Horizons of Multimodal Interaction cs.AI · 2024-01-07 · unverdicted · none · ref 74
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer