Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis
3 Pith papers cite this work.
Fields: cs.CV (3)
Verdicts: UNVERDICTED (3)
Roles: background (1)
Polarities: background (1)
Citing papers explorer

- AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
  AVA-Bench evaluates vision foundation models by disentangling 14 atomic visual abilities with aligned training-test distributions to reveal precise ability fingerprints.
- FOCUS: Fused Observation of Channels for Unveiling Spectra
  FOCUS enables reliable spatial-spectral interpretability for frozen ViTs in hyperspectral imaging with class-specific prompts and a [SINK] token that reduces attention collapse.
- LARE: Low-Attention Region Encoding for Text-Image Retrieval
  LARE uses parallel encoding of full images and low-attention regions to improve text-image retrieval, shown on a new Dense-Set subset of COCO and Flickr30K with re-captioned overlooked areas.