Prompt-cam: A simpler interpretable transformer for fine-grained analysis

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis · 2025 · arXiv 2501.09333

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

cs.CV · 2025-06-10 · unverdicted · novelty 7.0

AVA-Bench evaluates vision foundation models by disentangling 14 atomic visual abilities with aligned training-test distributions to reveal precise ability fingerprints.

FOCUS: Fused Observation of Channels for Unveiling Spectra

cs.CV · 2025-07-20 · unverdicted · novelty 6.0

FOCUS enables reliable spatial-spectral interpretability for frozen ViTs in hyperspectral imaging with class-specific prompts and a [SINK] token that reduces attention collapse.

LARE: Low-Attention Region Encoding for Text-Image Retrieval

cs.CV · 2026-06-17 · unverdicted · novelty 5.0

LARE uses parallel encoding of full images and low-attention regions to improve text-image retrieval, shown on a new Dense-Set subset of COCO and Flickr30K with re-captioned overlooked areas.

citing papers explorer

Showing 3 of 3 citing papers.

AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
cs.CV · 2025-06-10 · unverdicted · none · ref 14
AVA-Bench evaluates vision foundation models by disentangling 14 atomic visual abilities with aligned training-test distributions to reveal precise ability fingerprints.

FOCUS: Fused Observation of Channels for Unveiling Spectra
cs.CV · 2025-07-20 · unverdicted · none · ref 8
FOCUS enables reliable spatial-spectral interpretability for frozen ViTs in hyperspectral imaging with class-specific prompts and a [SINK] token that reduces attention collapse.

LARE: Low-Attention Region Encoding for Text-Image Retrieval
cs.CV · 2026-06-17 · unverdicted · none · ref 60
LARE uses parallel encoding of full images and low-attention regions to improve text-image retrieval, shown on a new Dense-Set subset of COCO and Flickr30K with re-captioned overlooked areas.
