pith. sign in

arxiv: 2605.16877 · v1 · pith:IL6PIF3Pnew · submitted 2026-05-16 · 💻 cs.CV

Zero-Shot Faithful Textual Explanations via Directional-Derivative Influence on Predictions

Pith reviewed 2026-05-19 20:16 UTC · model grok-4.3

classification 💻 cs.CV
keywords zero-shot textual explanationsimage classifiersfaithfulnessdirectional derivativeinfluence scorefeature spacemodel interpretabilityvision models
0
0 comments X p. Extension
pith:IL6PIF3P Add to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{IL6PIF3P}

Prints a linked pith:IL6PIF3P badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

FaithTrace generates more faithful zero-shot textual explanations for image classifiers by using directional derivatives of class logits in feature space as a faithfulness proxy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FaithTrace to improve zero-shot textual explanations of image classifiers without task-specific supervision or large vision-language models. It defines an influence score as the directional derivative of the class logit taken along the direction in feature space that the explanation text induces. This score acts as a direct proxy for whether the explanation captures concepts that actually drive the model's prediction. The method uses this score both to produce explanations and to create quantitative faithfulness metrics. Experiments indicate that explanations from FaithTrace align better with the model's internal decision process than those from prior approaches.

Core claim

FaithTrace measures the influence of a textual explanation by computing the directional derivative of the class logit along the text-induced direction in the classifier's feature space and treats this value as a proxy for faithfulness, thereby selecting or ranking explanations that more accurately reflect the features underlying the model's decision.

What carries the argument

The influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space.

If this is right

  • Explanations produced by FaithTrace more closely track the actual features that change the model's output.
  • The same influence score supplies quantitative metrics that can rank or evaluate textual explanations for faithfulness.
  • Zero-shot textual explanations become feasible for any image classifier whose feature space is accessible, without extra labeled data.
  • Model transparency improves because explanations are tied directly to changes in the prediction logit rather than to external text-image alignment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The directional-derivative approach could extend to other modalities such as audio or tabular classifiers if their feature spaces allow analogous text or concept directions.
  • If the proxy holds, it may reduce the need for post-hoc methods that require large vision-language models for explanation generation.
  • Applying the score across multiple layers or heads of the classifier could reveal which internal representations most affect faithfulness.

Load-bearing premise

The directional derivative of the class logit along a text-induced direction in feature space serves as a valid proxy for the true faithfulness of the textual explanation to the model's decision process.

What would settle it

A direct test would be to remove or mask the visual concepts described in a high-influence explanation and measure whether the model's class logit changes by an amount proportional to the reported influence score; a consistent mismatch would falsify the proxy.

Figures

Figures reproduced from arXiv: 2605.16877 by Hiroshi Kera, Kazuhiko Kawamoto, Toshinori Yamauchi.

Figure 1
Figure 1. Figure 1: Comparison of explanations produced by Text-To-C [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of FaithTrace. Given an input image [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Influence curves for the insertion (first row) and th [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of top-1 textual explanations produce [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Examples of input images, produced textual explan [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of top-1 textual explanations from eac [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of top-1 textual explanations produce [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Influence curves for insertion and deletion under t [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
read the original abstract

Zero-shot textual explanations aim to make image classifiers more transparent by probing their internal representations, without relying on task-specific supervision or LVLMs. However, existing methods often miss the features that truly drive the prediction, resulting in limited \textit{faithfulness} to the evidence underlying the model's decision. To address this, we propose FaithTrace. Motivated by the idea that faithful explanations should describe concepts that strongly influence the prediction, FaithTrace directly measures how much the representation induced by the explanation changes the class logit. We introduce an influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space, and use it as a proxy for faithfulness. Moreover, we extend this influence score into quantitative evaluation metrics, helping fill the gap in faithfulness evaluation for textual explanations. Experiments show that FaithTrace yields more faithful explanations than baselines, facilitating a more accurate understanding of the model. The code will be publicly released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes FaithTrace, a zero-shot method for generating faithful textual explanations of image classifier predictions. It defines an influence score as the directional derivative of the class logit along the direction in feature space induced by the explanation text, using this score as a proxy for faithfulness. The approach extends the score to quantitative faithfulness evaluation metrics and claims that experiments demonstrate superior faithfulness over baselines.

Significance. If the directional-derivative proxy is shown to be valid, the work would provide a principled, supervision-free way to both generate and quantitatively evaluate textual explanations, addressing a noted gap in faithfulness assessment for zero-shot textual methods in computer vision. The parameter-free nature of the influence computation and the extension to metrics are strengths that could enable reproducible comparisons across models.

major comments (2)
  1. [Abstract and §5] Abstract and §5 (Experiments): the claim that FaithTrace 'yields more faithful explanations than baselines' is presented without any description of datasets, baselines, statistical significance tests, or controls. This information is load-bearing for the central empirical claim and must be supplied to allow verification.
  2. [§3] §3 (Method, influence-score definition): the directional derivative is used both to rank candidate explanations and to construct the quantitative faithfulness metrics. Because the same first-order quantity serves as both the ranking criterion and the evaluation target, an independent validation (e.g., against human judgments or perturbation-based ground truth) is required to rule out circularity in the faithfulness assessment.
minor comments (2)
  1. [§3] The notation for the text-induced direction vector and the precise definition of the directional derivative should be written as an explicit equation with all symbols defined.
  2. [§5] Figure captions and axis labels in the experimental results should explicitly state which faithfulness metric is plotted and which baselines are compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, indicating where we agree that revisions are needed and outlining the specific changes we will make.

read point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (Experiments): the claim that FaithTrace 'yields more faithful explanations than baselines' is presented without any description of datasets, baselines, statistical significance tests, or controls. This information is load-bearing for the central empirical claim and must be supplied to allow verification.

    Authors: We agree that the abstract's high-level claim would benefit from more immediate context. While Section 5 of the manuscript already details the datasets (ImageNet and COCO subsets), the baselines (including concept activation vectors and other zero-shot textual methods), statistical significance testing (paired t-tests with reported p-values), and controls (random and shuffled explanations), we will revise the abstract to include a concise summary of the evaluation protocol. We will also add an explicit statement in the opening of Section 5 reiterating these elements for readers who focus on the empirical claims. This revision will be made. revision: yes

  2. Referee: [§3] §3 (Method, influence-score definition): the directional derivative is used both to rank candidate explanations and to construct the quantitative faithfulness metrics. Because the same first-order quantity serves as both the ranking criterion and the evaluation target, an independent validation (e.g., against human judgments or perturbation-based ground truth) is required to rule out circularity in the faithfulness assessment.

    Authors: We acknowledge the potential for circularity when the same directional-derivative quantity is used both to select explanations and to define the faithfulness metrics. To address this directly, we will add a new subsection to the experiments that reports independent validation: (1) correlation of the influence-based scores with human faithfulness ratings collected on a held-out subset of 200 image-explanation pairs, and (2) perturbation experiments that mask image regions corresponding to the textual concepts and measure the resulting change in class logit. These results will be presented alongside the existing metrics to show alignment with external signals. This revision will be made. revision: yes

Circularity Check

1 steps flagged

Influence score used both to select explanations and to define the faithfulness metrics that evaluate them

specific steps
  1. self definitional [Abstract]
    "we introduce an influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space, and use it as a proxy for faithfulness. Moreover, we extend this influence score into quantitative evaluation metrics"

    The influence score is simultaneously the quantity maximized by FaithTrace to produce explanations and the basis for the quantitative faithfulness metrics used to demonstrate superiority. Consequently the reported improvement in faithfulness is equivalent to the selection criterion by definition.

full rationale

The paper defines an influence score via directional derivative and explicitly uses it as the proxy for faithfulness while extending the identical score into the quantitative evaluation metrics. Because FaithTrace selects or ranks textual explanations according to this same influence score, any claim that it produces higher-scoring (i.e., more faithful) explanations on the derived metrics follows by construction rather than from independent validation. This matches the self-definitional pattern: the central empirical result reduces to the definition of the metric itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review based solely on abstract; full paper would be needed to identify all free parameters, axioms, and entities. The influence score itself is introduced as a new construct without independent evidence outside the method.

axioms (1)
  • domain assumption The directional derivative along the text-induced direction accurately reflects the influence of the described concept on the prediction.
    This premise underpins the use of the derivative as a faithfulness proxy.
invented entities (1)
  • FaithTrace influence score no independent evidence
    purpose: Proxy for faithfulness of textual explanations
    Newly defined quantity based on directional derivative.

pith-pipeline@v0.9.0 · 5700 in / 1256 out tokens · 42884 ms · 2026-05-19T20:16:08.991690+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 2 internal anchors

  1. [1]

    08774, 2023 1, 2

    Achiam, J., et al.: Gpt-4 technical report, arXiv: 2303. 08774, 2023 1, 2

  2. [2]

    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

    Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P ., Lin , J., Zhou, C., Zhou, J.: Qwen-vl: A versatile vision-languag e model for understanding, localization, text reading, and b e- yond. arXiv preprint arXiv:2308.12966 (2023) 4

  3. [3]

    In: Advances in Neural Information Processing Sys- tems (NeurIPS) (2024) 2

    Balasubramanian, S., Basu, S., Feizi, S.: Decomposing a nd interpreting image representations via text in vits beyond CLIP. In: Advances in Neural Information Processing Sys- tems (NeurIPS) (2024) 2

  4. [4]

    In: 2017 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR)

    Bau, D., Zhou, B., Khosla, A., Oliva, A., Torralba, A.: Ne t- work Dissection: Quantifying Interpretability of Deep Vi- sual Representations . In: 2017 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR). pp. 3319– 3327 (2017) 2, 1

  5. [5]

    In: Proceedings of t he IEEE International Conference on Computer Vision (ICCV) (2021) 6

    Caron, M., Touvron, H., Misra, I., J´ egou, H., Mairal, J., Bojanowski, P ., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of t he IEEE International Conference on Computer Vision (ICCV) (2021) 6

  6. [6]

    CoRR (2023) 1

    Dani, M., Rio-Torto, I., Alaniz, S., Akata, Z.: Devil: De cod- ing vision features into language. CoRR (2023) 1

  7. [7]

    I n: Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR)

    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. I n: Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR). pp. 248–255 (2009) 6

  8. [8]

    In: The International Conference on Learning Rep- resentations (ICLR) (2021) 6

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenbor n, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: The International Conference on Learning Rep- resentations (ICLR) (2021) 6

  9. [9]

    In: Advances in Neural Information Process- ing Systems (NeurIPS) (2023) 2, 8, 9

    FEL, T., Boissin, T., Boutin, V ., Picard, A.M., Novello, P ., Colin, J., Linsley, D., ROUSSEAU, T., Cadene, R., Goetschalckx, L., Gardes, L., Serre, T.: Unlocking feature visualization for deep network with MAgnitude constrained optimization. In: Advances in Neural Information Process- ing Systems (NeurIPS) (2023) 2, 8, 9

  10. [10]

    In: The International Conference on Learning Representa- tions (ICLR) (2024) 2

    Gandelsman, Y ., Efros, A.A., Steinhardt, J.: Interpre ting CLIP’s image representation via text-based decomposition . In: The International Conference on Learning Representa- tions (ICLR) (2024) 2

  11. [11]

    Gorgun, A., Schiele, B., Fischer, J.: Vital: More un- derstandable feature visualization through distribu- tion alignment and relevant information flow (2025), https://arxiv.org/abs/2503.22399 2, 9

  12. [12]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learni ng for Image Recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016) 6

  13. [13]

    In: Leibe, B., Matas, J., Sebe, N., Welling, M

    Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Pro- ceedings of the European Conference on Computer Vision (ECCV). pp. 3–19 (2016) 2, 1

  14. [14]

    In: Proceedings of the International Conference on Machine Learning (ICML)

    Kim, B., Wattenberg, M., Gilmer, J., Cai, C.J., Wexler, J., Vi´ egas, F.B., Sayres, R.: Interpretability beyond feature attri- bution: Quantitative testing with concept activation vect ors (tcav). In: Proceedings of the International Conference on Machine Learning (ICML). vol. 80, pp. 2673–2682 (2018) 2, 1

  15. [15]

    In: Pro- ceedings of the International Conference on Machine Learn- ing (ICML) (2020) 1, 2

    Koh, P .W., Nguyen, T., Tang, Y .S., Mussmann, S., Pierso n, E., Kim, B., Liang, P .: Concept bottleneck models. In: Pro- ceedings of the International Conference on Machine Learn- ing (ICML) (2020) 1, 2

  16. [16]

    In: Proceedings of the 39th In- ternational Conference on Machine Learning

    Li, J., Li, D., Xiong, C., Hoi, S.: BLIP: Bootstrapping language-image pre-training for unified vision-language u n- derstanding and generation. In: Proceedings of the 39th In- ternational Conference on Machine Learning. vol. 162, pp. 12888–12900 (2022) 1, 2

  17. [17]

    In: Advances in Neural Information Processing Sys- tems

    Liu, H., Li, C., Wu, Q., Lee, Y .J.: Visual instruction tu n- ing. In: Advances in Neural Information Processing Sys- tems. vol. 36, pp. 34892–34916 (2023) 1, 2

  18. [18]

    In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR)

    Liu, Y ., Zhang, T., Gu, S.: Hybrid concept bottleneck mo d- els. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR). pp. 20179– 20189 (June 2025) 1

  19. [19]

    Menon, S., V ondrick, C.: Visual classification via desc ription from large language models (2023) 2

  20. [20]

    In: Proceed- ings of the International Conference on Machine Learning (ICML) (2023) 1, 2, 3, 6

    Moayeri, M., Rezaei, K., Sanjabi, M., Feizi, S.: Text-t o- concept (and back) via cross-model alignment. In: Proceed- ings of the International Conference on Machine Learning (ICML) (2023) 1, 2, 3, 6

  21. [21]

    In: Proceedings o f the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Nguyen, A., Clune, J., Bengio, Y ., Dosovitskiy, A., Y os in- ski, J.: Plug & play generative networks: Conditional itera - tive generation of images in latent space. In: Proceedings o f the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3510–3520 (2017) 9

  22. [22]

    In: Advances in Neural Information Processing Systems (NeurIPS)

    Nguyen, A., Dosovitskiy, A., Y osinski, J., Brox, T., Clune, J.: Synthesizing the preferred inputs for neurons in neural net - works via deep generator networks. In: Advances in Neural Information Processing Systems (NeurIPS). p. 3395–3403 (2016) 9

  23. [23]

    In: The International Con- ference on Learning Representations (ICLR) (2023) 1, 2

    Oikarinen, T., Das, S., Nguyen, L.M., Weng, T.W.: Label - free concept bottleneck models. In: The International Con- ference on Learning Representations (ICLR) (2023) 1, 2

  24. [24]

    Distill (2017) 2, 9

    Olah, C., Schubert, L., Mordvintsev, A.: Feature visua liza- tion. Distill (2017) 2, 9

  25. [25]

    https://platform.openai.com/docs/models/gpt-3-5 (2025), accessed 2025-10-13 4

    OpenAI: Gpt-3.5 turbo models. https://platform.openai.com/docs/models/gpt-3-5 (2025), accessed 2025-10-13 4

  26. [26]

    In: Proceedings of the IEEE Conference on Computer Vi- sion and Pattern Recognition (CVPR) (June 2018) 2, 1

    Park, D.H., Hendricks, L.A., Akata, Z., Rohrbach, A., Schiele, B., Darrell, T., Rohrbach, M.: Multimodal expla- nations: Justifying decisions and pointing to the evidence . In: Proceedings of the IEEE Conference on Computer Vi- sion and Pattern Recognition (CVPR) (June 2018) 2, 1

  27. [27]

    , Agarwal, S., Sastry, G., Askell, A., Mishkin, P ., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual mod- els from natural language supervision

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G. , Agarwal, S., Sastry, G., Askell, A., Mishkin, P ., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual mod- els from natural language supervision. In: Proceedings of t he International Conference on Machine Learning (ICML). pp. 8748–8763 (2021) 1, 3 10

  28. [28]

    OpenAI (2019) 1

    Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI (2019) 1

  29. [29]

    In: Pattern Recognition

    Salewski, L., Koepke, A.S., Lensch, H.P .A., Akata, Z.: Zero- shot translation of attention patterns in vqa models to natu - ral language. In: Pattern Recognition. pp. 378–393. Cham (2024) 2

  30. [30]

    In: Proce ed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV) Workshops

    Sammani, F., Deligiannis, N.: Uni-nlx: Unifying textu al ex- planations for vision and vision-language tasks. In: Proce ed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV) Workshops. pp. 4634–4639 (October 2023) 1, 2

  31. [31]

    In: The International Conference on Learnin g Representations (ICLR) (2025) 1, 2

    Sammani, F., Deligiannis, N.: Zero-shot natural langu age explanations. In: The International Conference on Learnin g Representations (ICLR) (2025) 1, 2

  32. [32]

    In: Proceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR)

    Sammani, F., Mukherjee, T., Deligiannis, N.: Nlx-gpt: A model for natural language explanations in vision and vision- language tasks. In: Proceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR). pp. 8322–8332 (June 2022) 1, 2

  33. [33]

    In: Pr o- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Shang, C., Zhou, S., Zhang, H., Ni, X., Yang, Y ., Wang, Y .: Incremental residual concept bottleneck models. In: Pr o- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 11030–11040 (June 2024) 1

  34. [34]

    In: Gurevych, I., Miyao, Y

    Sharma, P ., Ding, N., Goodman, S., Soricut, R.: Concep- tual captions: A cleaned, hypernymed, image alt-text datas et for automatic image captioning. In: Gurevych, I., Miyao, Y . (eds.) Proceedings of the 56th Annual Meeting of the Asso- ciation for Computational Linguistics (V olume 1: Long Pa- pers). pp. 2556–2565 (Jul 2018) 1

  35. [35]

    Shtedritski, A., Rupprecht, C., V edaldi, A.: What does clip know about a red circle? visual prompt engineering for vlms (2023) 2

  36. [36]

    Team, Q.: Qwen2.5-vl (January 2025), https://qwenlm.github.io/blog/qwen2.5-vl/ 4

  37. [37]

    In: Proceedin gs of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR)

    Wang, B., Li, L., Nakashima, Y ., Nagahara, H.: Learning bottleneck concepts in image classification. In: Proceedin gs of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR). pp. 10962–10971 (June 2023) 1

  38. [38]

    Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

    Wang, P ., Bai, S., Tan, S., Wang, S., Fan, Z., Bai, J., Che n, K., Liu, X., Wang, J., Ge, W., Fan, Y ., Dang, K., Du, M., Ren, X., Men, R., Liu, D., Zhou, C., Zhou, J., Lin, J.: Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution. arXiv preprint arXiv:2409.12191 (2024) 4

  39. [39]

    In: Findings of the Association for Computational Linguistics: EMNLP 2024

    Wojciechowski, A., Lango, M., Dusek, O.: Faithful and plausible natural language explanations for image classifi ca- tion: A pipeline approach. In: Findings of the Association for Computational Linguistics: EMNLP 2024. pp. 2340–

  40. [40]

    Association for Computational Linguistics (Nov 2024) 3

  41. [41]

    Yamauchi, T., Kera, H., Kawamoto, K.: Zero-shot textua l explanations via translating decision-critical features (2025) 1, 2, 4, 6

  42. [42]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Yang, Y ., Panagopoulou, A., Zhou, S., Jin, D., Callison - Burch, C., Yatskar, M.: Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 19187–19197 (June 2023) 2, 1 11 Zero-Shot Faithful Textual Explanations v...

  43. [43]

    Generate GENERAL concepts that can apply to many different photos of the same object type

  44. [46]

    DO NOT include class names or object names directly. Q: What are useful visual features for distinguishing a lemur in a photo? A: There are several useful visual features to tell there is a lemur in a photo: - long tail - large eyes - gray fur - trees - branches - forest Q: What are useful features for distinguishing a {class_name} in a photo? Already gen...

  45. [47]

    Generate DETAILED and SPECIFIC concepts that can apply to this image

  46. [48]

    Include both OBJECT features (e.g., shape, color, parts) AND CONTEXT features (e.g., background, environment, setting)

  47. [49]

    Keep concepts short and specific (1-3 words)

  48. [50]

    Examples: Q: Look at this image carefully

    DO NOT include class names or object names directly. Examples: Q: Look at this image carefully. Based on what you can actually see in the image, identify useful visual 2 features that help distinguish this as a koi fish. A: There are several useful visual features to tell there is a koi fish in a photo: - bright orange scales - curved tail fin - spotted p...