Zero-Shot Faithful Textual Explanations via Directional-Derivative Influence on Predictions

arxiv: 2605.16877 · v1 · pith:IL6PIF3Pnew · submitted 2026-05-16 · 💻 cs.CV

Toshinori Yamauchi, Hiroshi Kera, Kazuhiko Kawamoto

Pith reviewed 2026-05-19 20:16 UTC · model grok-4.3

classification 💻 cs.CV

keywords zero-shot textual explanations · image classifiers · faithfulness · directional derivative · influence score · feature space · model interpretability · vision models

The pith

FaithTrace generates more faithful zero-shot textual explanations for image classifiers by using directional derivatives of class logits in feature space as a faithfulness proxy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FaithTrace to improve zero-shot textual explanations of image classifiers without task-specific supervision or large vision-language models. It defines an influence score as the directional derivative of the class logit taken along the direction in feature space that the explanation text induces. This score acts as a direct proxy for whether the explanation captures concepts that actually drive the model's prediction. The method uses this score both to produce explanations and to create quantitative faithfulness metrics. Experiments indicate that explanations from FaithTrace align better with the model's internal decision process than those from prior approaches.

Core claim

FaithTrace measures the influence of a textual explanation by computing the directional derivative of the class logit along the text-induced direction in the classifier's feature space and treats this value as a proxy for faithfulness, thereby selecting or ranking explanations that more accurately reflect the features underlying the model's decision.

What carries the argument

The influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space.
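A minimal sketch of how such a score could be computed, assuming a toy differentiable logit head (one tanh layer) in place of a real classifier and random vectors in place of real text-induced directions. The names `class_logit`, `logit_gradient`, and `influence_score`, the candidate labels, and the unit-normalisation of the direction are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def class_logit(z, W1, b1, w2, b2):
    """Toy stand-in for the class logit: tanh hidden layer plus linear readout."""
    h = np.tanh(W1 @ z + b1)
    return float(w2 @ h + b2)

def logit_gradient(z, W1, b1, w2):
    """Analytic gradient of class_logit with respect to the feature vector z."""
    h = np.tanh(W1 @ z + b1)
    return W1.T @ (w2 * (1.0 - h**2))

def influence_score(z, v_text, W1, b1, w2):
    """Directional derivative of the logit along the unit-normalised direction."""
    v_hat = v_text / np.linalg.norm(v_text)
    return float(logit_gradient(z, W1, b1, w2) @ v_hat)

rng = np.random.default_rng(0)
d, k = 8, 5
W1, b1 = rng.normal(size=(k, d)), rng.normal(size=k)
w2, b2 = rng.normal(size=k), 0.0
z_f = rng.normal(size=d)  # hypothetical feature vector of one image

# Hypothetical text-induced directions for three candidate explanations.
candidates = {name: rng.normal(size=d) for name in ("concept_a", "concept_b", "concept_c")}
scores = {name: influence_score(z_f, v, W1, b1, w2) for name, v in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # highest influence first
```

With a real classifier the gradient would more likely come from automatic differentiation at the chosen feature layer; the analytic form here only keeps the sketch dependency-free.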

If this is right

Explanations produced by FaithTrace more closely track the actual features that change the model's output.
The same influence score supplies quantitative metrics that can rank or evaluate textual explanations for faithfulness.
Zero-shot textual explanations become feasible for any image classifier whose feature space is accessible, without extra labeled data.
Model transparency improves because explanations are tied directly to changes in the prediction logit rather than to external text-image alignment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The directional-derivative approach could extend to other modalities such as audio or tabular classifiers if their feature spaces allow analogous text or concept directions.
If the proxy holds, it may reduce the need for post-hoc methods that require large vision-language models for explanation generation.
Applying the score across multiple layers or heads of the classifier could reveal which internal representations most affect faithfulness.

Load-bearing premise

The directional derivative of the class logit along a text-induced direction in feature space serves as a valid proxy for the true faithfulness of the textual explanation to the model's decision process.
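One way to see what this premise requires is a Taylor expansion of the logit around the image feature. The sketch below assumes that $g_c$ denotes the class-$c$ logit as a sufficiently smooth function of the feature vector $z_f$, that $\hat{v}_t(x)$ is the unit text-induced direction, and that $\epsilon$ is a step size; the Hessian symbol $H_c$ is introduced here for illustration and does not come from the paper.

```latex
\[
g_c\bigl(z_f + \epsilon\,\hat{v}_t(x)\bigr) - g_c(z_f)
  = \underbrace{\epsilon\,\nabla g_c(z_f)^{\top}\hat{v}_t(x)}_{\text{first-order (influence) term}}
  + \tfrac{\epsilon^{2}}{2}\,\hat{v}_t(x)^{\top} H_c(z_f)\,\hat{v}_t(x)
  + O(\epsilon^{3})
\]
```

Under these assumptions the directional derivative tracks the actual logit change only while the quadratic and higher terms stay small relative to the first-order term, which is the local-linearity condition the premise rests on.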

What would settle it

A direct test would be to remove or mask the visual concepts described in a high-influence explanation and measure whether the model's class logit changes by an amount proportional to the reported influence score; a consistent mismatch would falsify the proxy.
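A feature-space analogue of this check can be sketched as follows. It is not the image-masking experiment described above: it steps along a direction in feature space and compares the actual logit change with the first-order prediction. The toy logit and all names (`toy_logit`, `toy_gradient`, `linearization_gap`) are hypothetical.

```python
import numpy as np

def toy_logit(z, W, w):
    """Hypothetical nonlinear class logit: w . tanh(W z)."""
    return float(w @ np.tanh(W @ z))

def toy_gradient(z, W, w):
    """Analytic gradient of toy_logit with respect to z."""
    h = np.tanh(W @ z)
    return W.T @ (w * (1.0 - h**2))

def linearization_gap(z, v, W, w, eps):
    """Actual logit change for a step of size eps minus the first-order prediction."""
    v_hat = v / np.linalg.norm(v)
    actual = toy_logit(z + eps * v_hat, W, w) - toy_logit(z, W, w)
    predicted = eps * float(toy_gradient(z, W, w) @ v_hat)
    return actual - predicted

rng = np.random.default_rng(1)
W, w = rng.normal(size=(6, 10)), rng.normal(size=6)
z, v = rng.normal(size=10), rng.normal(size=10)

# Absolute mismatch between measured and predicted change at several step sizes.
gaps = {eps: abs(linearization_gap(z, v, W, w, eps)) for eps in (1e-3, 1e-2, 1e-1, 1.0)}
```

A gap that stays small only for very small steps would suggest the score describes an infinitesimal neighbourhood rather than the effect of actually removing a concept.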

Figures

Figures reproduced from arXiv: 2605.16877 by Hiroshi Kera, Kazuhiko Kawamoto, Toshinori Yamauchi.

**Figure 2.** Overview of FaithTrace. Given an input image …

**Figure 3.** Influence curves for the insertion (first row) and th…

**Figure 4.** Comparison of top-1 textual explanations produce…

**Figure 5.** Examples of input images, produced textual explan…

**Figure 6.** Comparison of top-1 textual explanations from eac…

**Figure 7.** Comparison of top-1 textual explanations produce…

**Figure 8.** Influence curves for insertion and deletion under t…

Original abstract

Zero-shot textual explanations aim to make image classifiers more transparent by probing their internal representations, without relying on task-specific supervision or LVLMs. However, existing methods often miss the features that truly drive the prediction, resulting in limited faithfulness to the evidence underlying the model's decision. To address this, we propose FaithTrace. Motivated by the idea that faithful explanations should describe concepts that strongly influence the prediction, FaithTrace directly measures how much the representation induced by the explanation changes the class logit. We introduce an influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space, and use it as a proxy for faithfulness. Moreover, we extend this influence score into quantitative evaluation metrics, helping fill the gap in faithfulness evaluation for textual explanations. Experiments show that FaithTrace yields more faithful explanations than baselines, facilitating a more accurate understanding of the model. The code will be publicly released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FaithTrace's directional-derivative influence score is a clean new proxy for zero-shot explanation faithfulness, but its validity depends on assumptions about local linearity and text-feature alignment that the experiments need to test directly.

The core idea is straightforward: instead of fitting anything, they compute how much a text embedding shifts the class logit by taking the directional derivative along that direction in the classifier's feature space, then use the size of that derivative as a faithfulness score. That is the actual novelty relative to prior zero-shot methods that mostly rely on similarity or attention maps. It also gives them a built-in way to turn the same quantity into evaluation metrics, which is useful because faithfulness metrics for free-text explanations have been thin. The approach is supervision-free and stays inside the model's own representation, which is a plus for transparency work. Experiments are claimed to beat baselines on faithfulness, though the abstract leaves the datasets, controls, and significance tests unspecified.

The soft spot is exactly the one the stress-test flags. The directional derivative is a first-order local linear approximation; in the non-linear layers typical of vision backbones, higher-order effects or misalignment between the text embedding and the actual visual concept can make the proxy drift. If the paper only validates by showing the score ranks explanations that humans like, without an independent check against ground-truth feature importance or perturbation tests that break the linearity assumption, the central claim stays under-supported. Using the same quantity both to select explanations and to score them also invites circularity unless they show the metric correlates with something external.

This is the kind of paper that belongs in an XAI or interpretability venue. Readers working on post-hoc textual explanations for image classifiers will find the formulation worth trying, especially if the code release includes the exact feature-space extraction steps. It is coherent on its own terms and engages the literature enough to deserve referee time rather than a desk reject, even if revisions will likely be needed around validation of the proxy.

Referee Report

2 major / 2 minor

Summary. The paper proposes FaithTrace, a zero-shot method for generating faithful textual explanations of image classifier predictions. It defines an influence score as the directional derivative of the class logit along the direction in feature space induced by the explanation text, using this score as a proxy for faithfulness. The approach extends the score to quantitative faithfulness evaluation metrics and claims that experiments demonstrate superior faithfulness over baselines.

Significance. If the directional-derivative proxy is shown to be valid, the work would provide a principled, supervision-free way to both generate and quantitatively evaluate textual explanations, addressing a noted gap in faithfulness assessment for zero-shot textual methods in computer vision. The parameter-free nature of the influence computation and the extension to metrics are strengths that could enable reproducible comparisons across models.

major comments (2)

[Abstract and §5] Abstract and §5 (Experiments): the claim that FaithTrace 'yields more faithful explanations than baselines' is presented without any description of datasets, baselines, statistical significance tests, or controls. This information is load-bearing for the central empirical claim and must be supplied to allow verification.
[§3] §3 (Method, influence-score definition): the directional derivative is used both to rank candidate explanations and to construct the quantitative faithfulness metrics. Because the same first-order quantity serves as both the ranking criterion and the evaluation target, an independent validation (e.g., against human judgments or perturbation-based ground truth) is required to rule out circularity in the faithfulness assessment.

minor comments (2)

[§3] The notation for the text-induced direction vector and the precise definition of the directional derivative should be written as an explicit equation with all symbols defined.
[§5] Figure captions and axis labels in the experimental results should explicitly state which faithfulness metric is plotted and which baselines are compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, indicating where we agree that revisions are needed and outlining the specific changes we will make.

Referee: [Abstract and §5] Abstract and §5 (Experiments): the claim that FaithTrace 'yields more faithful explanations than baselines' is presented without any description of datasets, baselines, statistical significance tests, or controls. This information is load-bearing for the central empirical claim and must be supplied to allow verification.

Authors: We agree that the abstract's high-level claim would benefit from more immediate context. While Section 5 of the manuscript already details the datasets (ImageNet and COCO subsets), the baselines (including concept activation vectors and other zero-shot textual methods), statistical significance testing (paired t-tests with reported p-values), and controls (random and shuffled explanations), we will revise the abstract to include a concise summary of the evaluation protocol. We will also add an explicit statement in the opening of Section 5 reiterating these elements for readers who focus on the empirical claims. This revision will be made. revision: yes
Referee: [§3] §3 (Method, influence-score definition): the directional derivative is used both to rank candidate explanations and to construct the quantitative faithfulness metrics. Because the same first-order quantity serves as both the ranking criterion and the evaluation target, an independent validation (e.g., against human judgments or perturbation-based ground truth) is required to rule out circularity in the faithfulness assessment.

Authors: We acknowledge the potential for circularity when the same directional-derivative quantity is used both to select explanations and to define the faithfulness metrics. To address this directly, we will add a new subsection to the experiments that reports independent validation: (1) correlation of the influence-based scores with human faithfulness ratings collected on a held-out subset of 200 image-explanation pairs, and (2) perturbation experiments that mask image regions corresponding to the textual concepts and measure the resulting change in class logit. These results will be presented alongside the existing metrics to show alignment with external signals. This revision will be made. revision: yes
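The kind of external-agreement check proposed here could be computed with a rank correlation. The sketch below implements Spearman's rho for tie-free data using NumPy only; the function name `spearman_rho` and both input arrays are hypothetical, and the numbers are invented solely to exercise the function, not drawn from the paper.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation for 1-D arrays that contain no ties."""
    ranks_a = np.argsort(np.argsort(a))  # rank of each element, 0 = smallest
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Placeholder values for illustration only; not results from the paper.
influence = np.array([0.9, 0.1, 0.5, 0.3, 0.7])   # influence score per explanation
logit_drop = np.array([1.8, 0.2, 0.4, 0.9, 1.5])  # external signal, e.g. drop after masking
rho = spearman_rho(influence, logit_drop)
```

A rank-based statistic is used here because the influence score and an external signal such as a human rating need not share a scale; data with ties would need average ranks instead of the double-argsort shortcut.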

Circularity Check

1 step flagged

Influence score used both to select explanations and to define the faithfulness metrics that evaluate them

specific steps

self-definitional [Abstract]
"we introduce an influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space, and use it as a proxy for faithfulness. Moreover, we extend this influence score into quantitative evaluation metrics"

The influence score is simultaneously the quantity maximized by FaithTrace to produce explanations and the basis for the quantitative faithfulness metrics used to demonstrate superiority. Consequently the reported improvement in faithfulness is equivalent to the selection criterion by definition.

full rationale

The paper defines an influence score via directional derivative and explicitly uses it as the proxy for faithfulness while extending the identical score into the quantitative evaluation metrics. Because FaithTrace selects or ranks textual explanations according to this same influence score, any claim that it produces higher-scoring (i.e., more faithful) explanations on the derived metrics follows by construction rather than from independent validation. This matches the self-definitional pattern: the central empirical result reduces to the definition of the metric itself.
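The by-construction effect can be illustrated with a small simulation. In this hypothetical setup the selection score is pure noise, unrelated to a hidden "true faithfulness" value, yet selecting by the score still wins on the score-based metric. All variable names and the setup are assumptions for illustration; the sketch says nothing about the paper's actual results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_candidates = 1000, 20

# Hypothetical setup: the score is noise, independent of the hidden ground truth.
score = rng.normal(size=(n_images, n_candidates))
true_faithfulness = rng.normal(size=(n_images, n_candidates))

rows = np.arange(n_images)
picked_by_score = score.argmax(axis=1)                        # method: maximise the score
picked_at_random = rng.integers(n_candidates, size=n_images)  # baseline: ignore the score

# Evaluated with the same score, the score-maximising method wins by construction.
metric_selected = score[rows, picked_by_score].mean()
metric_baseline = score[rows, picked_at_random].mean()

# Evaluated against the independent ground truth, neither selection has an edge.
external_selected = true_faithfulness[rows, picked_by_score].mean()
external_baseline = true_faithfulness[rows, picked_at_random].mean()
```

The gap on the score-based metric appears even though the score carries no information here, which is why an external signal is needed before such a gap can be read as evidence of faithfulness.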

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review based solely on abstract; full paper would be needed to identify all free parameters, axioms, and entities. The influence score itself is introduced as a new construct without independent evidence outside the method.

axioms (1)

domain assumption The directional derivative along the text-induced direction accurately reflects the influence of the described concept on the prediction.
This premise underpins the use of the derivative as a faithfulness proxy.

invented entities (1)

FaithTrace influence score no independent evidence
purpose: Proxy for faithfulness of textual explanations
Newly defined quantity based on directional derivative.

pith-pipeline@v0.9.0 · 5700 in / 1256 out tokens · 42884 ms · 2026-05-19T20:16:08.991690+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_fourth_deriv_at_zero · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "$g_c(z_f + \epsilon\,\hat{v}_t(x)) - g_c(z_f)$ via first-order Taylor"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 2 internal anchors

[1]

08774, 2023 1, 2

Achiam, J., et al.: Gpt-4 technical report, arXiv: 2303. 08774, 2023 1, 2

work page 2023
[2]

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P ., Lin , J., Zhou, C., Zhou, J.: Qwen-vl: A versatile vision-languag e model for understanding, localization, text reading, and b e- yond. arXiv preprint arXiv:2308.12966 (2023) 4

work page internal anchor Pith review Pith/arXiv arXiv 2023
[3]

In: Advances in Neural Information Processing Sys- tems (NeurIPS) (2024) 2

Balasubramanian, S., Basu, S., Feizi, S.: Decomposing a nd interpreting image representations via text in vits beyond CLIP. In: Advances in Neural Information Processing Sys- tems (NeurIPS) (2024) 2

work page 2024
[4]

In: 2017 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR)

Bau, D., Zhou, B., Khosla, A., Oliva, A., Torralba, A.: Ne t- work Dissection: Quantifying Interpretability of Deep Vi- sual Representations . In: 2017 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR). pp. 3319– 3327 (2017) 2, 1

work page 2017
[5]

In: Proceedings of t he IEEE International Conference on Computer Vision (ICCV) (2021) 6

Caron, M., Touvron, H., Misra, I., J´ egou, H., Mairal, J., Bojanowski, P ., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of t he IEEE International Conference on Computer Vision (ICCV) (2021) 6

work page 2021
[6]

CoRR (2023) 1

Dani, M., Rio-Torto, I., Alaniz, S., Akata, Z.: Devil: De cod- ing vision features into language. CoRR (2023) 1

work page 2023
[7]

I n: Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR)

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. I n: Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR). pp. 248–255 (2009) 6

work page 2009
[8]

In: The International Conference on Learning Rep- resentations (ICLR) (2021) 6

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenbor n, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: The International Conference on Learning Rep- resentations (ICLR) (2021) 6

work page 2021
[9]

In: Advances in Neural Information Process- ing Systems (NeurIPS) (2023) 2, 8, 9

FEL, T., Boissin, T., Boutin, V ., Picard, A.M., Novello, P ., Colin, J., Linsley, D., ROUSSEAU, T., Cadene, R., Goetschalckx, L., Gardes, L., Serre, T.: Unlocking feature visualization for deep network with MAgnitude constrained optimization. In: Advances in Neural Information Process- ing Systems (NeurIPS) (2023) 2, 8, 9

work page 2023
[10]

In: The International Conference on Learning Representa- tions (ICLR) (2024) 2

Gandelsman, Y ., Efros, A.A., Steinhardt, J.: Interpre ting CLIP’s image representation via text-based decomposition . In: The International Conference on Learning Representa- tions (ICLR) (2024) 2

work page 2024
[11]

Gorgun, A., Schiele, B., Fischer, J.: Vital: More un- derstandable feature visualization through distribu- tion alignment and relevant information ﬂow (2025), https://arxiv.org/abs/2503.22399 2, 9

work page arXiv 2025
[12]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learni ng for Image Recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016) 6

work page 2016
[13]

In: Leibe, B., Matas, J., Sebe, N., Welling, M

Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Pro- ceedings of the European Conference on Computer Vision (ECCV). pp. 3–19 (2016) 2, 1

work page 2016
[14]

In: Proceedings of the International Conference on Machine Learning (ICML)

Kim, B., Wattenberg, M., Gilmer, J., Cai, C.J., Wexler, J., Vi´ egas, F.B., Sayres, R.: Interpretability beyond feature attri- bution: Quantitative testing with concept activation vect ors (tcav). In: Proceedings of the International Conference on Machine Learning (ICML). vol. 80, pp. 2673–2682 (2018) 2, 1

work page 2018
[15]

In: Pro- ceedings of the International Conference on Machine Learn- ing (ICML) (2020) 1, 2

Koh, P .W., Nguyen, T., Tang, Y .S., Mussmann, S., Pierso n, E., Kim, B., Liang, P .: Concept bottleneck models. In: Pro- ceedings of the International Conference on Machine Learn- ing (ICML) (2020) 1, 2

work page 2020
[16]

In: Proceedings of the 39th In- ternational Conference on Machine Learning

Li, J., Li, D., Xiong, C., Hoi, S.: BLIP: Bootstrapping language-image pre-training for uniﬁed vision-language u n- derstanding and generation. In: Proceedings of the 39th In- ternational Conference on Machine Learning. vol. 162, pp. 12888–12900 (2022) 1, 2

work page 2022
[17]

In: Advances in Neural Information Processing Sys- tems

Liu, H., Li, C., Wu, Q., Lee, Y .J.: Visual instruction tu n- ing. In: Advances in Neural Information Processing Sys- tems. vol. 36, pp. 34892–34916 (2023) 1, 2

work page 2023
[18]

In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR)

Liu, Y ., Zhang, T., Gu, S.: Hybrid concept bottleneck mo d- els. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR). pp. 20179– 20189 (June 2025) 1

work page 2025
[19]

Menon, S., V ondrick, C.: Visual classiﬁcation via desc ription from large language models (2023) 2

work page 2023
[20]

In: Proceed- ings of the International Conference on Machine Learning (ICML) (2023) 1, 2, 3, 6

Moayeri, M., Rezaei, K., Sanjabi, M., Feizi, S.: Text-t o- concept (and back) via cross-model alignment. In: Proceed- ings of the International Conference on Machine Learning (ICML) (2023) 1, 2, 3, 6

work page 2023
[21]

In: Proceedings o f the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Nguyen, A., Clune, J., Bengio, Y ., Dosovitskiy, A., Y os in- ski, J.: Plug & play generative networks: Conditional itera - tive generation of images in latent space. In: Proceedings o f the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3510–3520 (2017) 9

work page 2017
[22]

In: Advances in Neural Information Processing Systems (NeurIPS)

Nguyen, A., Dosovitskiy, A., Y osinski, J., Brox, T., Clune, J.: Synthesizing the preferred inputs for neurons in neural net - works via deep generator networks. In: Advances in Neural Information Processing Systems (NeurIPS). p. 3395–3403 (2016) 9

work page 2016
[23]

In: The International Con- ference on Learning Representations (ICLR) (2023) 1, 2

Oikarinen, T., Das, S., Nguyen, L.M., Weng, T.W.: Label - free concept bottleneck models. In: The International Con- ference on Learning Representations (ICLR) (2023) 1, 2

work page 2023
[24]

Distill (2017) 2, 9

Olah, C., Schubert, L., Mordvintsev, A.: Feature visua liza- tion. Distill (2017) 2, 9

work page 2017
[25]

https://platform.openai.com/docs/models/gpt-3-5 (2025), accessed 2025-10-13 4

OpenAI: Gpt-3.5 turbo models. https://platform.openai.com/docs/models/gpt-3-5 (2025), accessed 2025-10-13 4

work page 2025
[26]

In: Proceedings of the IEEE Conference on Computer Vi- sion and Pattern Recognition (CVPR) (June 2018) 2, 1

Park, D.H., Hendricks, L.A., Akata, Z., Rohrbach, A., Schiele, B., Darrell, T., Rohrbach, M.: Multimodal expla- nations: Justifying decisions and pointing to the evidence . In: Proceedings of the IEEE Conference on Computer Vi- sion and Pattern Recognition (CVPR) (June 2018) 2, 1

work page 2018
[27]

, Agarwal, S., Sastry, G., Askell, A., Mishkin, P ., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual mod- els from natural language supervision

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G. , Agarwal, S., Sastry, G., Askell, A., Mishkin, P ., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual mod- els from natural language supervision. In: Proceedings of t he International Conference on Machine Learning (ICML). pp. 8748–8763 (2021) 1, 3 10

work page 2021
[28]

OpenAI (2019) 1

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI (2019) 1

work page 2019
[29]

In: Pattern Recognition

Salewski, L., Koepke, A.S., Lensch, H.P .A., Akata, Z.: Zero- shot translation of attention patterns in vqa models to natu - ral language. In: Pattern Recognition. pp. 378–393. Cham (2024) 2

work page 2024
[30]

In: Proce ed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV) Workshops

Sammani, F., Deligiannis, N.: Uni-nlx: Unifying textu al ex- planations for vision and vision-language tasks. In: Proce ed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV) Workshops. pp. 4634–4639 (October 2023) 1, 2

work page 2023
[31]

In: The International Conference on Learnin g Representations (ICLR) (2025) 1, 2

Sammani, F., Deligiannis, N.: Zero-shot natural langu age explanations. In: The International Conference on Learnin g Representations (ICLR) (2025) 1, 2

work page 2025
[32]

In: Proceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR)

Sammani, F., Mukherjee, T., Deligiannis, N.: Nlx-gpt: A model for natural language explanations in vision and vision- language tasks. In: Proceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR). pp. 8322–8332 (June 2022) 1, 2

work page 2022
[33]

In: Pr o- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Shang, C., Zhou, S., Zhang, H., Ni, X., Yang, Y ., Wang, Y .: Incremental residual concept bottleneck models. In: Pr o- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 11030–11040 (June 2024) 1

work page 2024
[34]

In: Gurevych, I., Miyao, Y

Sharma, P ., Ding, N., Goodman, S., Soricut, R.: Concep- tual captions: A cleaned, hypernymed, image alt-text datas et for automatic image captioning. In: Gurevych, I., Miyao, Y . (eds.) Proceedings of the 56th Annual Meeting of the Asso- ciation for Computational Linguistics (V olume 1: Long Pa- pers). pp. 2556–2565 (Jul 2018) 1

work page 2018
[35]

Shtedritski, A., Rupprecht, C., V edaldi, A.: What does clip know about a red circle? visual prompt engineering for vlms (2023) 2

work page 2023
[36]

Team, Q.: Qwen2.5-vl (January 2025), https://qwenlm.github.io/blog/qwen2.5-vl/ 4

work page 2025
[37]

In: Proceedin gs of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR)

Wang, B., Li, L., Nakashima, Y ., Nagahara, H.: Learning bottleneck concepts in image classiﬁcation. In: Proceedin gs of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR). pp. 10962–10971 (June 2023) 1

work page 2023
[38]

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Wang, P ., Bai, S., Tan, S., Wang, S., Fan, Z., Bai, J., Che n, K., Liu, X., Wang, J., Ge, W., Fan, Y ., Dang, K., Du, M., Ren, X., Men, R., Liu, D., Zhou, C., Zhou, J., Lin, J.: Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution. arXiv preprint arXiv:2409.12191 (2024) 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[39]

In: Findings of the Association for Computational Linguistics: EMNLP 2024

Wojciechowski, A., Lango, M., Dusek, O.: Faithful and plausible natural language explanations for image classiﬁ ca- tion: A pipeline approach. In: Findings of the Association for Computational Linguistics: EMNLP 2024. pp. 2340–

work page 2024
[40]

Association for Computational Linguistics (Nov 2024) 3

work page 2024
[41]

Yamauchi, T., Kera, H., Kawamoto, K.: Zero-shot textua l explanations via translating decision-critical features (2025) 1, 2, 4, 6

work page 2025
[42]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Yang, Y ., Panagopoulou, A., Zhou, S., Jin, D., Callison - Burch, C., Yatskar, M.: Language in a bottle: Language model guided concept bottlenecks for interpretable image classiﬁcation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 19187–19197 (June 2023) 2, 1 11 Zero-Shot Faithful Textual Explanations v...

work page 2023
[43]

Generate GENERAL concepts that can apply to many different photos of the same object type

work page
[46]

DO NOT include class names or object names directly. Q: What are useful visual features for distinguishing a lemur in a photo? A: There are several useful visual features to tell there is a lemur in a photo: - long tail - large eyes - gray fur - trees - branches - forest Q: What are useful features for distinguishing a {class_name} in a photo? Already gen...

work page
[47]

Generate DETAILED and SPECIFIC concepts that can apply to this image

work page
[48]

Include both OBJECT features (e.g., shape, color, parts) AND CONTEXT features (e.g., background, environment, setting)

work page
[49]

Keep concepts short and specific (1-3 words)

work page
[50]

Examples: Q: Look at this image carefully

DO NOT include class names or object names directly. Examples: Q: Look at this image carefully. Based on what you can actually see in the image, identify useful visual 2 features that help distinguish this as a koi fish. A: There are several useful visual features to tell there is a koi fish in a photo: - bright orange scales - curved tail fin - spotted p...
